How to Fetch Data from a Fast API in Your Spare Time

Ilya Novak
6 min read · Mar 5, 2021

Introduction

I am a data science student at Lambda School, an online vocational training program focused on web development and data science. The final course in Lambda’s data science curriculum, called “Labs,” centers on a group project in which I worked with a dozen fellow students to develop an online application titled “CitySpire.”

The goal of Labs was to develop the CitySpire application, which provides its users with useful information about any city in the United States to which they are considering moving. Users log in to the application with their Okta credentials and select a city in the United States, and the application returns useful city data like the following:

  1. Population and density
  2. Monthly rents going back several years, and a forecast of future rents
  3. Violent and property crime
  4. A “walkability” score
  5. A cost of living index

Our development team was split into three groups: the Web team was responsible for developing the application’s browser component, the iOS team was responsible for developing the application’s iOS component, and my data science team was responsible for providing the first two teams with city data.

My initial fear was that I had no previous experience working with a team on a collaborative project of this kind. I had neither a sense of my capacity to work with others nor confidence that my knowledge of the technology was up to the challenge.

Initial Challenges

Our first task was to set up an AWS Postgres database. However, none of us on the data science team had any experience with Amazon Web Services, so each step of the process was a significant challenge: accessing the AWS account with the proper credentials, creating the database, selecting its proper configuration, and accessing it from an external source.
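For illustration, here is a minimal sketch of the kind of external connection we eventually needed to make from Python to the Postgres instance. It assumes SQLAlchemy, and the environment variable names and host are placeholders rather than our actual configuration.

```python
# Minimal sketch (not the exact CitySpire code) of connecting to an AWS RDS
# Postgres instance from Python with SQLAlchemy. Credentials and host are
# placeholders read from environment variables.
import os

from sqlalchemy import create_engine, text

DATABASE_URL = "postgresql://{user}:{pw}@{host}:5432/{db}".format(
    user=os.environ["DB_USER"],
    pw=os.environ["DB_PASSWORD"],
    host=os.environ["DB_HOST"],  # e.g. something.us-east-1.rds.amazonaws.com
    db=os.environ["DB_NAME"],
)

engine = create_engine(DATABASE_URL)

# Sanity check that the database is reachable from an external machine.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```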

Once the database was successfully set up, we next had to find useful city data to populate it. All such data is conveniently provided by professional services that specialize in data collation, but our team had no funds with which to purchase those services and was therefore forced to gather data from scattered sources across the internet. The main challenge was thus not merely to find datasets available to the public free of charge, but to find datasets with a common foreign key (e.g., city name, county name, ZIP code) that could be used to map city data records across the datasets.

Team Responsibilities

Once the database was set up and populated with city data, my data science teammates and I then split up our assignments as follows.

  • Toby took responsibility for developing a Prophet model to forecast monthly rental prices per city twelve months into the future, given several years of past rental data; these monthly forecasts were added to the database (a rough sketch of this approach follows the list)
  • Stepan took responsibility for deploying a FastAPI route onto AWS Elastic Beanstalk to be used by Web and iOS to fetch city data from our database
  • I (Ilya) took responsibility for developing the route’s function that, given a city identifier, would scour the database for all qualifying city data, combine it, and return it as convenient JSON
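As a rough sketch of the forecasting step (this is not Toby’s actual code; the dataframe and column names are assumptions), Prophet can turn several years of monthly rent history into a twelve-month forecast:

```python
# Illustrative Prophet rent forecast, assuming a dataframe of historical
# monthly rents for one city with columns "month" and "rent".
import pandas as pd
from prophet import Prophet  # packaged as "fbprophet" in older releases

def forecast_rent(history: pd.DataFrame, months_ahead: int = 12) -> pd.DataFrame:
    """Forecast monthly rent `months_ahead` months into the future."""
    # Prophet expects the columns to be named "ds" (date) and "y" (value).
    df = history.rename(columns={"month": "ds", "rent": "y"})
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=months_ahead, freq="MS")
    forecast = model.predict(future)
    # Keep only the forecasted months and their predicted rents.
    return forecast[["ds", "yhat"]].tail(months_ahead)
```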

Fetching Data

The primary challenge was using foreign keys to map across datasets. My original plan had been for Web and iOS to provide my function with only a city ZIP code, which would act as a convenient foreign key. However, the limitations of our scattered free datasets made this impossible because different datasets identified cities with different markers. I therefore required more information per city to perform the mapping successfully, and I convinced Web and iOS to provide the following set of information for each city a user searches:

  • City
  • State
  • ZIP Code
  • Latitude
  • Longitude

I used each of the above as a foreign key to map to one or more of our datasets. This forced me to write a distinct function for each dataset in our database. For example, the function that fetches data from our ‘population_size’ table has to try multiple combinations of City, State, and ZIP code before it finds a unique record for the city.
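Below is a sketch of that kind of function. The table and column names are placeholders rather than our exact schema, and it reuses the SQLAlchemy engine shown earlier.

```python
# Illustrative sketch (not the exact CitySpire code) of fetching a single
# record from a population_size table by trying progressively looser
# combinations of foreign keys.
from sqlalchemy import text
from sqlalchemy.engine import Engine

def fetch_population(engine: Engine, city: str, state: str, zipcode: str):
    """Return population data for a city, or None if no unique match exists."""
    # Each query is tried in order; the first one that returns exactly one
    # row wins. Column names are placeholders for the real schema.
    queries = [
        ("SELECT population, density FROM population_size "
         "WHERE city = :city AND state = :state AND zip = :zip",
         {"city": city, "state": state, "zip": zipcode}),
        ("SELECT population, density FROM population_size "
         "WHERE city = :city AND state = :state",
         {"city": city, "state": state}),
        ("SELECT population, density FROM population_size "
         "WHERE zip = :zip",
         {"zip": zipcode}),
    ]
    with engine.connect() as conn:
        for sql, params in queries:
            rows = conn.execute(text(sql), params).fetchall()
            if len(rows) == 1:  # unique match found
                return {"population": rows[0][0], "density": rows[0][1]}
    return None  # no unique record for this city
```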

I perform a similar procedure on each table and return their aggregate as JSON. If I am unable to find any data for a city, I return None.
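Putting the per-table helpers together looks roughly like the sketch below. The helper names other than fetch_population are hypothetical stand-ins for the similar functions just described.

```python
# Rough sketch of aggregating the per-table helpers into one JSON-serializable
# dict. Each helper returns a dict for its slice of the data, or None if the
# city cannot be matched in that table.
def get_city_data(engine, city, state, zipcode, lat, lon):
    result = {
        "population": fetch_population(engine, city, state, zipcode),
        "rental_forecast": fetch_rental_forecast(engine, city, state),  # hypothetical helper
        "crime": fetch_crime(engine, city, state),                      # hypothetical helper
        "walkability": fetch_walkability(engine, lat, lon),             # hypothetical helper
        "cost_of_living": fetch_cost_of_living(engine, city, state),    # hypothetical helper
    }
    # If no table yields anything for this city, signal "not found".
    if all(value is None for value in result.values()):
        return None
    return result
```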

Here is a screenshot of the Swagger UI for Stepan’s route, which uses my function to fetch data from our database. The user enters some information about the requested city, in this case Houston.

Executing this POST request returns the JSON with city data generated by my function. This information is then passed on to Web and iOS, who display it in their applications for users.
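For context, a hedged sketch of what such a route could look like follows. The path, model fields, and the Houston payload are illustrative, not the actual CitySpire endpoint, and it reuses the get_city_data sketch from above.

```python
# Illustrative FastAPI route (not the exact CitySpire endpoint) that accepts
# the city identifiers and returns the aggregated city data as JSON.
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class CityRequest(BaseModel):
    city: str
    state: str
    zipcode: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None

@app.post("/api/get_data")
def get_data(request: CityRequest):
    # get_city_data and engine come from the sketches above.
    data = get_city_data(
        engine,
        request.city,
        request.state,
        request.zipcode,
        request.latitude,
        request.longitude,
    )
    if data is None:
        raise HTTPException(status_code=404, detail="City not found")
    return data

# Example request body, similar to the Houston search shown in the Swagger UI:
# {"city": "Houston", "state": "TX", "zipcode": "77001",
#  "latitude": 29.76, "longitude": -95.36}
```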

The End!

Unfortunately, the AWS database in which our city data was stored has been shut down. Users are still able to sign up (with Okta), log in, search for cities, and save cities as favorites, but the application will fail to return any city data because our database is down.

If our data science team decides to return to this product to fix these issues, we should make the following improvements, in order of priority:

  1. Set up our own database again and deploy it to AWS or Heroku
  2. Add a composite “city score” for each city that standardizes different measurements (population, crime, walkability, etc.) into a single numeric value (a rough sketch of one approach follows the list)
  3. Add additional data points per city, such as air pollution, commute times, and school quality
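For the second item, here is a minimal sketch of one way such a score could be computed; the metric names, sign conventions, and equal weighting are all assumptions.

```python
# Rough sketch of a composite "city score": z-score each metric across all
# cities, flip the sign for metrics where lower is better, then average.
import pandas as pd

def city_scores(metrics: pd.DataFrame) -> pd.Series:
    """`metrics` has one row per city and numeric columns such as
    population, crime, walkability, and cost_of_living."""
    lower_is_better = {"crime", "cost_of_living"}  # assumption
    z = (metrics - metrics.mean()) / metrics.std()
    for col in z.columns:
        if col in lower_is_better:
            z[col] = -z[col]
    # Equal weights are an assumption; real weights would need tuning.
    return z.mean(axis=1)
```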

But the ideal long-term solution is to scrap our existing data and switch to purchasing data services from professional firms that specialize in data collation. This would benefit us by providing information on a greater number of cities, higher-quality (i.e., more precise) information per city, a common foreign key across all datasets, and more types of data that are not freely available. It would probably involve using these firms’ APIs, thereby negating the need to set up our own database and populate it with data.

Switching to such APIs would have the benefit of no longer needing to deal with the hassle of managing a database, but it would involve two technical challenges. First, my function for fetching data would have to be scrapped and completely rewritten to pull data from multiple APIs. Second, the Web and iOS teams would have to change the endpoints to which they send their POST requests.

But when I reflect on the past few months, my “Labs” experience had less to do with learning new technologies than with learning how to work with people. The year-long Lambda curriculum did an excellent job of introducing me to a wide variety of technologies and principles related to data science, but it involved little collaborative work on common projects. In Labs I learned the following:

  • Clearly demarcating each team member’s responsibilities
  • Maintaining constant communication with my fellow data science team members and rapidly helping to solve each other’s troubles
  • Maintaining constant communication between the data science and Web/iOS teams, in particular establishing clear expectations about what kind of data each side expects to receive, in what format, and through which endpoints
  • Learning not merely how to use the technologies offered by Amazon Web Services, but how to use them effectively in collaboration with my fellow team members
  • Frankly, learning how to solve crises under pressure without panicking

But these lessons are not the end; they are the beginning!
