A data analytics portfolio is a powerful tool for landing the interview. But how can you build one effectively?
Start with a data analytics project and build your portfolio around it. A data analytics project involves taking a dataset and analyzing it in a specific way to showcase results. Not only do they help you build your portfolio, but analytics projects also help you:
Python is a powerful tool for data analysis projects. Whether you’re scraping data from sites like the New York Times and Craigslist or conducting exploratory data analysis (EDA) on Uber trips, here are three Python data analytics project ideas to try:
Todd W. Schneider’s Wedding Crunchers is a great example of a data analysis project using Python. Essentially, Todd scraped wedding announcements from the New York Times, and performed analysis on the data, finding interesting tidbits like:
Using the data and his analysis, Schneider created a lot of cool visuals, like this:
How You Can Do It: Follow the example of Wedding Crunchers. Choose a news or media source, scrape titles and text, and analyze the data for trends. Here’s a tutorial for scraping news APIs with Python.
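Once the titles are scraped, the trend analysis can start very small. The sketch below counts how often each term shows up in headlines per year. It uses only the standard library, and the `HEADLINES` records, the `STOPWORDS` set, and both function names are made up for illustration; real scraped data would replace them.

```python
import re
from collections import Counter

# Hypothetical scraped records as (year, headline) pairs. Made-up sample data.
HEADLINES = [
    (2019, "City council approves new housing plan"),
    (2019, "Housing prices climb as supply shrinks"),
    (2020, "Schools reopen with new safety rules"),
    (2020, "Housing market cools after record year"),
]

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "as", "with", "after", "new", "of", "in", "to"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def term_counts_by_year(records):
    """Map each year to a Counter of the terms used in that year's headlines."""
    counts = {}
    for year, headline in records:
        counts.setdefault(year, Counter()).update(tokenize(headline))
    return counts


counts = term_counts_by_year(HEADLINES)
for year in sorted(counts):
    print(year, counts[year].most_common(3))
```

Comparing a term’s count from one year to the next is a simple first pass at spotting trends before you move on to charts.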
Craigslist is a great data source for an analytics project, and there is a wide range of things you can analyze. Apartments are among the most common listings.
Riley Predum created a handy tutorial that walks you through using Python and Beautiful Soup to scrape apartment listings. He then did some pretty cool analysis of pricing by neighborhood and price distributions. When graphed, his analysis looked like this:
How You Can Do It: Follow the tutorial to learn how to scrape the data using Python. Some analysis ideas: Look at apartment listings for another area, analyze used car prices for your market, or check out what used items sell on Craigslist.
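After scraping, pricing by neighborhood mostly comes down to cleaning the price text and grouping. Here is a minimal sketch, assuming the scraper has already produced one dictionary per listing. The `LISTINGS` values and the field names `neighborhood` and `price` are invented for this example and are not taken from the tutorial.

```python
from collections import defaultdict
from statistics import median

# Hypothetical scraped listings; all values are made up.
LISTINGS = [
    {"neighborhood": "Downtown", "price": "$2,400"},
    {"neighborhood": "Downtown", "price": "$2,000"},
    {"neighborhood": "Downtown", "price": "$3,100"},
    {"neighborhood": "Eastside", "price": "$1,500"},
    {"neighborhood": "Eastside", "price": "$1,700"},
    {"neighborhood": "Eastside", "price": ""},  # listing with no price
]


def parse_price(text):
    """Turn a string like '$2,400' into 2400; return None if it is not a number."""
    digits = text.replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def median_price_by_neighborhood(listings):
    """Group valid prices by neighborhood and take the median of each group."""
    prices = defaultdict(list)
    for row in listings:
        price = parse_price(row["price"])
        if price is not None:
            prices[row["neighborhood"]].append(price)
    return {hood: median(values) for hood, values in prices.items()}


medians = median_price_by_neighborhood(LISTINGS)
for hood, value in sorted(medians.items()):
    print(hood, value)
```

The median is used here rather than the mean because a handful of luxury listings can skew an average.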
Here’s an interesting project from Aman Kharwal: An analysis of Uber trip data from NYC. The project used this Kaggle dataset from FiveThirtyEight, containing nearly 20 million Uber pickups. There are many angles from which to analyze this dataset, such as popular pickup times or the busiest days of the week.
Here’s a data visualization on pickup times by hour of day from Aman:
How You Can Do It: This is a good data analysis project idea if you’re prepping for a case study interview. You can emulate this one using the dataset on Kaggle, or you can use these similar taxi and Uber datasets on data.world, including one for Austin, TX.
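Counting pickups by hour or by weekday is a good first cut at this kind of data. The sketch below assumes a CSV with a `Date/Time` column formatted like `4/1/2014 0:11:00`. Both the column layout and the sample rows are assumptions for illustration, so check them against the file you download.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample rows in an assumed column layout.
SAMPLE_CSV = """Date/Time,Lat,Lon,Base
4/1/2014 0:11:00,40.7690,-73.9549,B02512
4/1/2014 17:45:00,40.7267,-74.0345,B02512
4/2/2014 17:05:00,40.7316,-73.9873,B02512
4/5/2014 8:30:00,40.7588,-73.9776,B02512
"""

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def pickups_by(rows, key):
    """Count pickups after mapping each row's timestamp through `key`."""
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row["Date/Time"], "%m/%d/%Y %H:%M:%S")
        counts[key(ts)] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_hour = pickups_by(rows, lambda ts: ts.hour)
by_weekday = pickups_by(rows, lambda ts: WEEKDAYS[ts.weekday()])

print(sorted(by_hour.items()))
print(by_weekday.most_common())
```

For the full multi-million-row dataset, a library such as pandas would be the more typical choice; the grouping logic stays the same.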
There’s a ton of accessible housing data online from sites like Zillow and Airbnb, and these datasets are perfect for analytics and EDA projects. If you’re interested in housing price trends or market predictions, or just want to analyze the average home prices for a specific city or state, jump into these projects:
Check out Zillow’s free datasets. The Zillow Home Value Index (ZHVI) is a smoothed, seasonally adjusted measure of typical home values by region and housing type. There are also datasets on rentals, housing inventories, and price forecasts.
Here’s an analytics project based in R that might give you some direction. The author analyzes Zillow data for Seattle, looking at things like the age of inventory (days since listing), % of homes that sell for a loss or gain, and list price vs. sale price for homes in the region:
How You Can Do It: There are a ton of different ways you can use the Zillow dataset. Examine listings by region, explore individual list price vs. sale price, or take a look at the average sale price over average list price by city.
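The last idea, average sale price over average list price by city, reduces to one ratio per city. Here is a minimal sketch under the assumption that you have one record per sold home. The `SALES` records and their field names are invented for this example and do not mirror Zillow’s actual file layout.

```python
from collections import defaultdict

# Hypothetical per-home records; all values are made up.
SALES = [
    {"city": "Seattle", "list_price": 700_000, "sale_price": 735_000},
    {"city": "Seattle", "list_price": 500_000, "sale_price": 490_000},
    {"city": "Tacoma", "list_price": 400_000, "sale_price": 400_000},
]


def sale_to_list_ratio_by_city(sales):
    """Return average sale price divided by average list price for each city.

    Both averages cover the same homes, so the ratio equals
    total sale price divided by total list price.
    """
    totals = defaultdict(lambda: [0, 0])  # city -> [sale total, list total]
    for row in sales:
        totals[row["city"]][0] += row["sale_price"]
        totals[row["city"]][1] += row["list_price"]
    return {city: sale / lst for city, (sale, lst) in totals.items()}


ratios = sale_to_list_ratio_by_city(SALES)
for city, ratio in sorted(ratios.items()):
    print(f"{city}: {ratio:.3f}")
```

A ratio above 1 suggests homes in that city tend to sell over asking; below 1 suggests the opposite.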
On Inside Airbnb, you’ll find data from Airbnb that has been analyzed, cleaned, and aggregated. You’ll find data for dozens of cities around the world, including number of listings, calendars for listings, and reviews for listings.
Here’s a look at a project from Agratama Arfiano examining Airbnb data for Singapore. There are many different analyses you can do, including finding the number of listings by host or listings by neighborhood. Arfiano has produced some really great visualizations for this project, like the following:
How You Can Do It: Download the data from Inside Airbnb, then choose a city for analysis. You can look at the price, listings by area, listings by host, the average number of days a listing is rented, and much more.
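Listings by host is a quick one to prototype. The sketch below counts listings per host and then works out what share of all listings belongs to hosts with more than one. The `host_id` and `neighbourhood` field names are assumptions about the download, and the rows are made up.

```python
from collections import Counter

# Made-up rows; field names are assumed, so verify them against your file.
LISTINGS = [
    {"id": 1, "host_id": 10, "neighbourhood": "Kallang"},
    {"id": 2, "host_id": 10, "neighbourhood": "Kallang"},
    {"id": 3, "host_id": 10, "neighbourhood": "Geylang"},
    {"id": 4, "host_id": 11, "neighbourhood": "Geylang"},
    {"id": 5, "host_id": 12, "neighbourhood": "Bedok"},
]


def listings_per_host(listings):
    """Count how many listings each host has."""
    return Counter(row["host_id"] for row in listings)


def multi_listing_share(listings):
    """Fraction of all listings owned by hosts with more than one listing."""
    per_host = listings_per_host(listings)
    multi = sum(n for n in per_host.values() if n > 1)
    return multi / len(listings)


per_host = listings_per_host(LISTINGS)
by_area = Counter(row["neighbourhood"] for row in LISTINGS)
share = multi_listing_share(LISTINGS)

print(per_host.most_common(1))
print(by_area.most_common())
print(f"{share:.0%} of listings come from multi-listing hosts")
```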
Have you ever wondered which cars are the most rented? Curious how fares change by make and model? Check out the Cornell Car Rental Dataset on Kaggle. Kushlesh Kumar created the dataset, which features records on 6,000+ rental cars. There are a lot of interesting questions you can answer with this dataset: Fares by make and model, fares by city, inventory by city, and much more. Here’s a cool visualization from Kushlesh:
How You Can Do It: Using the dataset, you could analyze rental cars by make and model, by location, or by manufacturer. Another option: Try a similar project with these datasets: Cash for Clunkers cars, Carvana sales data or used cars on eBay.
Sporting data is great fodder for analytics projects. There are so many free datasets available, and they are updated all the time. You might start with a project similar to the NBA data analytics project outlined first below, and further down we have other sports analytics projects to try:
Check out this NBA data analytics project from Jay at Interview Query. Jay analyzed data from Basketball Reference (a great source, by the way) to determine the impact of the 2-for-1 play in the NBA. The idea: In basketball, the 2-for-1 play refers to the strategy in which, at the end of a quarter, a team aims to shoot the ball with between 25 and 36 seconds on the clock. That way the team that shoots first has time for an additional play while the opposing team only gets one response. (You can see the source code on GitHub.)
The main metric he was looking for was the differential gain between the score just before the 2-for-1 shot and the score at the end of the quarter. Here’s a look at differential gain:
How You Can Do It: Read this tutorial on scraping Basketball Reference data. You can analyze in-game statistics, player career statistics, playoff performance, and much more. One option would be to analyze a player’s high school ranking vs. their success in the NBA. Or you could visualize a player’s career.
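To make the differential gain metric from the 2-for-1 project concrete, here is one way it could be computed. This is a sketch of the idea as described above, not Jay’s actual code. The records, the field names, and the choice to express scores as a margin (own points minus opponent’s points) are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical end-of-quarter situations. Margins are from the point of view
# of the team holding the ball; all numbers are made up.
SITUATIONS = [
    {"attempted_2_for_1": True, "margin_before": -2, "margin_end": 1},
    {"attempted_2_for_1": True, "margin_before": 4, "margin_end": 4},
    {"attempted_2_for_1": False, "margin_before": 0, "margin_end": -2},
    {"attempted_2_for_1": False, "margin_before": 3, "margin_end": 4},
]


def differential_gain(record):
    """Change in score margin from just before the shot to the end of the quarter."""
    return record["margin_end"] - record["margin_before"]


def mean_gain(records, attempted):
    """Average differential gain for situations with or without a 2-for-1 attempt."""
    gains = [differential_gain(r) for r in records
             if r["attempted_2_for_1"] == attempted]
    return mean(gains)


with_play = mean_gain(SITUATIONS, attempted=True)
without_play = mean_gain(SITUATIONS, attempted=False)
print(f"2-for-1 attempted: {with_play:+.2f}")
print(f"not attempted:     {without_play:+.2f}")
```

Comparing the two averages is only a starting point; a real analysis would need many more quarters and some control for team strength.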
This is a great dataset for a sports analytics project. Featuring 35,000 medals awarded since 1896, there’s plenty of data to analyze, and it’s great for identifying performance trends by country and sport. Here’s an interesting visualization from Didem Erkan:
How You Can Do It: Check out the Olympics medals dataset. Angles you might take for analysis include: Medal count by country (as in this visualization), medal trends by country, e.g. how U.S. performance evolved during the 1900s, or even grouping countries by region to see how fortunes have risen or faded over time.
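Both the medal count and the trend-over-time angles can be prototyped with a few lines of counting. The sketch below assumes each medal has been reduced to a `(year, country, medal)` tuple. That layout and the sample rows are made up for the example.

```python
from collections import Counter

# Made-up sample rows as (year, country code, medal) tuples.
MEDALS = [
    (1996, "USA", "Gold"),
    (1996, "USA", "Silver"),
    (2004, "CHN", "Gold"),
    (2008, "CHN", "Gold"),
    (2008, "USA", "Bronze"),
]


def medals_by_country(medals):
    """Total medal count for each country."""
    return Counter(country for _, country, _ in medals)


def medals_by_decade(medals, country):
    """Medal count per decade for one country, keyed by the decade's first year."""
    return Counter((year // 10) * 10
                   for year, c, _ in medals if c == country)


totals = medals_by_country(MEDALS)
usa_trend = medals_by_decade(MEDALS, "USA")

print(totals.most_common())
print(sorted(usa_trend.items()))
```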
FiveThirtyEight is a wonderful source of sports data; they have NBA datasets, as well as data for the NFL and NHL. The site uses its Soccer Power Index (SPI) ratings for predictions and forecasts, but it’s also a good source for analysis and analytics projects. To get started, check out Gideon Karasek’s breakdown of working with the SPI data.
How You Can Do It: Check out the SPI data. Questions you might try to answer include: How has a team’s SPI changed over time? How does SPI compare across soccer leagues? How do goals scored compare with goals predicted?
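For the goals scored vs. goals predicted question, one simple measure is each team’s average difference between actual and projected goals. The sketch below assumes match rows with `team1`, `team2`, `score1`, `score2`, `proj_score1`, and `proj_score2` fields. Treat those names as assumptions to verify against the file; the matches themselves are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up matches in an assumed column layout.
MATCHES = [
    {"team1": "A", "team2": "B", "proj_score1": 1.5, "proj_score2": 1.0,
     "score1": 2, "score2": 1},
    {"team1": "B", "team2": "A", "proj_score1": 1.25, "proj_score2": 1.75,
     "score1": 0, "score2": 1},
]


def goals_vs_projection(matches):
    """Average (actual goals - projected goals) per team across all its matches."""
    diffs = defaultdict(list)
    for m in matches:
        diffs[m["team1"]].append(m["score1"] - m["proj_score1"])
        diffs[m["team2"]].append(m["score2"] - m["proj_score2"])
    return {team: mean(values) for team, values in diffs.items()}


over_under = goals_vs_projection(MATCHES)
for team, diff in sorted(over_under.items()):
    print(f"{team}: {diff:+.3f} goals per match vs. projection")
```

A positive number means a team has been scoring more than projected; a negative number means it has been falling short.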
All of the datasets we’ve mentioned would make for amazing data visualization projects. To cap things off, we’re highlighting three more ideas to use as inspiration, each of which could draw on your own experiences or interests!
Books are full of data, and you can create some really amazing visualizations using the patterns from them. Take a look at this project by Hanna Piotrowska, turning an Italo Calvino book into cool visualizations. The project features visualizations of word distributions, themes and motifs by chapter, and a visualization of the distribution of themes throughout the book:
How You Can Do It: This Shakespeare dataset, which features all of the lines from his plays, would be great for recreating this type of project. Another option: Create a visualization of your favorite Star Wars script.
This project by Jamie Kettle visualizes plastic pollution by country, and it does a scarily good job of showing just how much plastic waste enters the ocean each year. Take a look for inspiration:
How You Can Do It: There are dozens of pollution datasets on data.world. Choose one and create a visualization that shows the true impact of pollution on our natural environments.
There are plenty of great movie and media datasets on Kaggle: The Movie Database 5000, Netflix Movies and TV Shows, Box Office Mojo data, etc. And just like their big-screen debuts, movie data makes for great visualizations. Take a look at this visualization of the Top 100 movies by Katie Silver, which features top movies based on box office gross and the Oscars each received:
How You Can Do It: Take a Kaggle movie dataset, and create a visualization that shows: Gross earnings vs. average IMDb rating, Netflix shows by rating, or top movies by studio.
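Before plotting gross earnings against ratings, it can help to check how strongly the two move together. Here is a small sketch that computes the Pearson correlation by hand with the standard library. The `MOVIES` records and their field names are made up and do not come from any particular Kaggle dataset.

```python
from math import sqrt

# Hypothetical movies; gross is in millions of dollars. All values are made up.
MOVIES = [
    {"title": "Film A", "gross_musd": 100, "rating": 6.0},
    {"title": "Film B", "gross_musd": 200, "rating": 7.0},
    {"title": "Film C", "gross_musd": 300, "rating": 8.0},
    {"title": "Film D", "gross_musd": 400, "rating": 7.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


gross = [m["gross_musd"] for m in MOVIES]
ratings = [m["rating"] for m in MOVIES]
r = pearson(gross, ratings)
print(f"correlation between gross and rating: {r:.3f}")
```

A value near 1 or -1 points to a strong linear relationship worth visualizing with a scatter plot; a value near 0 suggests the chart may need a different angle.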
If you are still looking for inspiration, see our compiled list of free datasets which features sites to search for free data, datasets for EDA projects and visualizations, as well as datasets for machine learning projects.