Interview Query

12 Data Analytics Project Ideas and Datasets

Overview

A data analytics portfolio is a powerful tool for landing the interview. But how can you build one effectively?

Start with a data analytics project and build your portfolio around it. A data analytics project involves taking a dataset and analyzing it in a specific way to showcase results. Beyond building your portfolio, analytics projects also help you:

  • Learn new tools and techniques
  • Work with complex datasets
  • Practice packaging your work and results
  • Prep for case study and take-home interviews
  • Attract inbound interviews from hiring managers who have read your blog post!

Python Data Analytics Projects

Python is a powerful tool for data analysis projects. Whether you’re scraping data from sites like the New York Times and Craigslist or conducting exploratory data analysis (EDA) on Uber trips, here are three Python data analytics project ideas to try:

1. Wedding Crunchers

Todd W. Schneider’s Wedding Crunchers is a great example of a data analysis project using Python. Schneider scraped wedding announcements from the New York Times and analyzed the data, finding interesting tidbits like:

  • Distribution of common phrases
  • Average age trends of brides and grooms
  • Demographic trends

Using the data and his analysis, Schneider created a lot of cool visuals, like this:

 NYT The Wedding Frequency

How You Can Do It: Follow the example of Wedding Crunchers. Choose a news or media source, scrape titles and text, and analyze the data for trends. Here’s a tutorial for scraping news APIs with Python.
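
Once the text is scraped, the trend analysis can start small. The sketch below is one possible approach, not Schneider's actual method: it computes the share of articles per year that mention a given phrase. The function name and the sample records are made up for illustration.

```python
import re
from collections import Counter

def phrase_share_by_year(records, phrase):
    """Return {year: fraction of that year's texts containing `phrase`}.

    `records` is an iterable of (year, text) pairs (a hypothetical layout).
    """
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    totals = Counter()
    hits = Counter()
    for year, text in records:
        totals[year] += 1
        if pattern.search(text.lower()):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Made-up sample records, for illustration only.
sample = [
    (2010, "The bride graduated from Harvard."),
    (2010, "The groom works in finance."),
    (2011, "The couple met at Harvard."),
]
shares = phrase_share_by_year(sample, "harvard")
```

Plotting `shares` for several phrases on one chart gives a simple view of how often each phrase appears over time.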

2. Scraping Craigslist

Craigslist is a great data source for an analytics project, and there is a wide range of things you can analyze. Apartment listings are among the most common.

Riley Predum created a handy tutorial that walks you through using Python and Beautiful Soup to scrape apartment listings. He then did some pretty cool analysis of pricing by neighborhood and of price distributions. When graphed, his analysis looked like this:

Scraping Craigslist

How You Can Do It: Follow the tutorial to learn how to scrape the data using Python. Some analysis ideas: Look at apartment listings for another area, analyze used car prices for your market, or check out what used items sell on Craigslist.
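
After scraping, pricing by neighborhood comes down to grouping and summarizing. The sketch below skips the scraping step and assumes the listings are already parsed into dictionaries. The `price` and `neighborhood` keys are hypothetical, and the median is used here because a few extreme listings can distort an average.

```python
from collections import defaultdict
from statistics import median

def median_price_by_neighborhood(listings):
    """Group listings by normalized neighborhood and return median prices."""
    groups = defaultdict(list)
    for listing in listings:
        price = listing.get("price")
        hood = listing.get("neighborhood")
        if price is None or not hood:
            continue  # skip listings with missing fields
        groups[hood.strip().lower()].append(price)
    return {hood: median(prices) for hood, prices in groups.items()}

# Made-up listings, for illustration only.
sample = [
    {"neighborhood": "Mission", "price": 3000},
    {"neighborhood": "mission ", "price": 3400},
    {"neighborhood": "Sunset", "price": 2500},
    {"neighborhood": "Sunset", "price": None},
]
medians = median_price_by_neighborhood(sample)
```

Normalizing the neighborhood text matters with scraped data, since the same area is often typed several different ways.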

3. Uber Trip Analysis

Here’s an interesting project from Aman Kharwal: An analysis of Uber trip data from NYC. The project used this Kaggle dataset from FiveThirtyEight, containing nearly 20 million Uber pickups. There are a lot of angles to analyze this dataset, like popular pickup times or the busiest days of the week.

Here’s a data visualization on pickup times by hour of day from Aman:

Uber Trip Analysis

How You Can Do It: This is a good data analysis project idea if you’re prepping for a case study interview. You can emulate this one using the dataset on Kaggle, or you can use similar taxi and Uber datasets on data.world, including one for Austin, TX.
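
Pickup times by hour and by day of the week can be counted with the standard library alone. This is a minimal sketch, not Kharwal's code. The timestamp format is an assumption about how the file stores pickup times, so check it against your copy of the data.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format, e.g. "4/1/2014 0:11:00"; adjust to match your file.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

def pickups_by_hour(timestamps, fmt=TIMESTAMP_FORMAT):
    """Return a 24-item list of pickup counts, indexed by hour of day."""
    counts = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def pickups_by_weekday(timestamps, fmt=TIMESTAMP_FORMAT):
    """Return a 7-item list of pickup counts, Monday (0) through Sunday (6)."""
    counts = Counter(datetime.strptime(ts, fmt).weekday() for ts in timestamps)
    return [counts.get(day, 0) for day in range(7)]

# Made-up timestamps, for illustration only.
sample = ["4/1/2014 0:11:00", "4/1/2014 17:05:00", "4/2/2014 17:45:00"]
hourly = pickups_by_hour(sample)
daily = pickups_by_weekday(sample)
```

For millions of rows, a library such as pandas would likely be faster, but the logic stays the same.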

Rental and Housing Data Analytics Project Ideas

There’s a ton of accessible housing data online from sites like Zillow and Airbnb, and these datasets are perfect for analytics and EDA projects. If you’re interested in price trends in housing or market predictions, or you just want to analyze the average home prices for a specific city or state, jump into these projects:

1. Zillow Housing Prices

Check out Zillow’s free datasets. The Zillow Home Value Index (ZHVI) is a smoothed, seasonally adjusted measure of typical home values by region and housing type. There are also datasets on rentals, housing inventories, and price forecasts.

Here’s an analytics project based in R that might give you some direction. The author analyzes Zillow data for Seattle, looking at things like the age of inventory (days since listing), the percentage of homes that sell for a loss or gain, and list price vs. sale price for homes in the region:

Zillow Housing Prices

How You Can Do It: There are a ton of different ways you can use the Zillow dataset. Examine listings by region, explore individual list price vs. sale price, or take a look at the average sale price over average list price by city.
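
The last idea, average sale price over average list price by city, can be sketched as follows. The input layout of (city, list price, sale price) tuples is an assumption for illustration, not the format of Zillow's files.

```python
from collections import defaultdict

def sale_to_list_ratio_by_city(sales):
    """Return {city: average sale price / average list price}.

    The ratio of the two averages equals the ratio of the two sums,
    so only running totals are kept per city.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # city -> [sale sum, list sum]
    for city, list_price, sale_price in sales:
        totals[city][0] += sale_price
        totals[city][1] += list_price
    return {
        city: sale_sum / list_sum
        for city, (sale_sum, list_sum) in totals.items()
        if list_sum > 0
    }

# Made-up sales records, for illustration only.
sample = [
    ("Seattle", 500_000, 525_000),
    ("Seattle", 700_000, 675_000),
    ("Tacoma", 400_000, 380_000),
]
ratios = sale_to_list_ratio_by_city(sample)
```

A ratio above 1 suggests homes in that city tend to sell over asking, and a ratio below 1 suggests the opposite.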

2. Inside Airbnb

On Inside Airbnb, you’ll find data from Airbnb that has been analyzed, cleaned, and aggregated. You’ll find data for dozens of cities around the world, including number of listings, calendars for listings, and reviews for listings.

Here’s a look at a project from Agratama Arfiano examining Airbnb data for Singapore. There are many different analyses you can do, including finding the number of listings by host or listings by neighborhood. Arfiano has produced some really great visualizations for this project, like the following:

Inside Airbnb

How You Can Do It: Download the data from Inside Airbnb, then choose a city for analysis. You can look at the price, listings by area, listings by host, the average number of days a listing is rented, and much more.
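
Counting listings by host is a quick first cut once the file is downloaded. The sketch below reads CSV rows with the standard library. The `host_id` column name is an assumption, so confirm it against the header of the file you download.

```python
import csv
import io
from collections import Counter

def listings_per_host(csv_file, host_column="host_id"):
    """Count listings per host from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[host_column] for row in reader)

# Made-up CSV content standing in for a downloaded listings file.
sample_csv = io.StringIO(
    "id,host_id,neighbourhood\n"
    "1,h1,Kallang\n"
    "2,h1,Kallang\n"
    "3,h2,Bedok\n"
)
counts = listings_per_host(sample_csv)
top_hosts = counts.most_common(1)
```

With a real file, replace `sample_csv` with `open("listings.csv", newline="", encoding="utf-8")`, where the file name is a placeholder for wherever you saved the download. Passing a different `host_column` gives listings by neighborhood instead.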

3. Car Rentals

Have you ever wondered which cars are the most rented? Curious how fares change by make and model? Check out the Cornell Car Rental Dataset on Kaggle. Kushlesh Kumar created the dataset, which features records on 6,000+ rental cars. There are a lot of interesting questions you can answer with this dataset: Fares by make and model, fares by city, inventory by city, and much more. Here’s a cool visualization from Kushlesh:

Car Rentals

How You Can Do It: Using the dataset, you could analyze rental cars by make and model, by location, or by manufacturer. Another option: Try a similar project with these datasets: Cash for Clunkers cars, Carvana sales data, or used cars on eBay.

Sports and NBA Data Analytics Projects

Sporting data is great fodder for analytics projects. There are so many free datasets available, and they are updated all the time. You might start with a project similar to the NBA data analytics project outlined first below, and further down we have other sports analytics projects to try:

1. NBA Data Analytics Project

Check out this NBA data analytics project from Jay at Interview Query. Jay analyzed data from Basketball Reference (a great source, by the way) to determine the impact of the 2-for-1 play in the NBA. The idea: In basketball, the 2-for-1 play refers to the strategy in which, at the end of a quarter, a team aims to shoot the ball with between 25 and 36 seconds on the clock. That way the team that shoots first has time for an additional play, while the opposing team only gets one response. (You can see the source code on GitHub.)

The main metric he was looking for was the differential gain between the score just before the 2-for-1 shot and the score at the end of the quarter. Here’s a look at differential gain:

NBA Data Analytics Project

How You Can Do It: Read this tutorial on scraping Basketball Reference data. You can analyze in-game statistics, player career statistics, playoff performance, and much more. One option would be to analyze a player’s high school ranking vs. their success in the NBA. Or you could visualize a player’s career.
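
The differential gain metric described above can be sketched in a few lines. This is one reading of the metric, not Jay's actual code: it takes the change in score margin between the moment just before the 2-for-1 shot and the end of the quarter. The function names and the score tuples are hypothetical.

```python
def differential_gain(before, end):
    """Change in score margin from just before the shot to quarter end.

    `before` and `end` are (team_score, opponent_score) tuples.
    """
    margin_before = before[0] - before[1]
    margin_end = end[0] - end[1]
    return margin_end - margin_before

def average_gain(situations):
    """Average differential gain over many (before, end) pairs."""
    gains = [differential_gain(before, end) for before, end in situations]
    return sum(gains) / len(gains) if gains else 0.0

# Made-up end-of-quarter situations, for illustration only.
sample = [
    ((50, 48), (55, 50)),  # margin goes from +2 to +5
    ((30, 30), (32, 33)),  # margin goes from 0 to -1
]
avg = average_gain(sample)
```

Comparing this average for quarters where a team attempted the 2-for-1 against quarters where it did not is one way to estimate the play's impact.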

2. Olympic Medals Analysis

This is a great dataset for a sports analytics project. With 35,000 medals awarded since 1896, there’s plenty of data to analyze, and it’s great for identifying performance trends by country and sport. Here’s an interesting visualization from Didem Erkan:

Olympic Medals Analysis

How You Can Do It: Check out the Olympics medals dataset. Angles you might take for analysis include: Medal count by country (as in this visualization), medal trends by country, e.g. how U.S. performance evolved during the 1900s, or even grouping countries by region to see how fortunes have risen or faded over time.
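
Medal trends by country can be explored by bucketing medals into decades. The sketch below assumes each medal has been reduced to a (year, country code) pair, which is a hypothetical layout rather than the dataset's actual columns.

```python
from collections import Counter

def medals_by_decade(medals, country):
    """Return {decade: medal count} for one country, sorted by decade."""
    counts = Counter()
    for year, medal_country in medals:
        if medal_country == country:
            counts[year // 10 * 10] += 1  # 1908 -> 1900, 1912 -> 1910
    return dict(sorted(counts.items()))

# Made-up medal records, for illustration only.
sample = [(1904, "USA"), (1908, "USA"), (1912, "USA"), (1912, "SWE")]
usa_trend = medals_by_decade(sample, "USA")
```

Running this for several countries and plotting the results side by side shows how their fortunes have changed over time.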

3. Soccer Power Rankings

FiveThirtyEight is a wonderful source of sports data; they have NBA datasets, as well as data for the NFL and NHL. The site uses its Soccer Power Index (SPI) ratings for predictions and forecasts, but it’s also a good source for analysis and analytics projects. To get started, check out Gideon Karasek’s breakdown of working with the SPI data.

Soccer Power Rankings

How You Can Do It: Check out the SPI data. Questions you might try to answer include: How has a team’s SPI changed over time? How does SPI compare across soccer leagues? How do goals scored compare with goals predicted?
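
Comparing goals scored with goals predicted can be sketched as an average per-match difference for each team. The field names below (`team1`, `score1`, `proj_score1`, and so on) are assumptions about the match file's columns, so check them against the data you download.

```python
from collections import defaultdict

def goal_overperformance(matches):
    """Return {team: mean of (actual goals - projected goals) per match}."""
    diffs = defaultdict(list)
    for match in matches:
        diffs[match["team1"]].append(match["score1"] - match["proj_score1"])
        diffs[match["team2"]].append(match["score2"] - match["proj_score2"])
    return {team: sum(values) / len(values) for team, values in diffs.items()}

# Made-up matches, for illustration only.
sample = [
    {"team1": "A", "team2": "B", "score1": 2, "score2": 0,
     "proj_score1": 1.5, "proj_score2": 1.0},
    {"team1": "B", "team2": "A", "score1": 1, "score2": 1,
     "proj_score1": 1.0, "proj_score2": 1.5},
]
overperformance = goal_overperformance(sample)
```

A positive value means a team scored more than projected on average, and a negative value means it scored less.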

Data Visualization Projects

All of the datasets we’ve mentioned would make for amazing data visualization projects. To cap things off, here are three more ideas to use as inspiration, each of which can draw on your own experiences or interests!

1. Visualize Your Favorite Book

Books are full of data, and you can create some really amazing visualizations using the patterns from them. Take a look at this project by Hanna Piotrowska, turning an Italo Calvino book into cool visualizations. The project features visualizations of word distributions, themes and motifs by chapter, and the distribution of themes throughout the book:

Visualize Your Favorite Book

How You Can Do It: This Shakespeare dataset, which features all of the lines from his plays, would be great for recreating this type of project. Another option: Create a visualization of your favorite Star Wars script.
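
Before drawing anything, you need the numbers behind the picture. The sketch below is one simple way to measure where a motif word appears across a text, not Piotrowska's method: it splits the text into equal segments by word position and counts occurrences in each. The function name and sample sentence are made up for illustration.

```python
import re

def motif_distribution(text, motif, segments=10):
    """Count occurrences of `motif` in each of `segments` equal text slices."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    target = motif.lower()
    for index, word in enumerate(words):
        if word == target:
            # Map the word's position to a segment number from 0 to segments - 1.
            counts[index * segments // len(words)] += 1
    return counts

# Made-up sample text, for illustration only.
sample = "blood will have blood they say the night is long"
distribution = motif_distribution(sample, "blood", segments=5)
```

The resulting list can feed a bar chart or heat strip, with one row per motif.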

2. Visualizing Pollution

This project by Jamie Kettle visualizes plastic pollution by country, and it does a scarily good job of showing just how much plastic waste enters the ocean each year. Take a look for inspiration:

Visualizing Pollution

How You Can Do It: There are dozens of pollution datasets on data.world. Choose one and create a visualization that shows the true impact of pollution on our natural environments.

3. Visualizing Top Movies

There are a ton of great movie and media datasets on Kaggle: The Movie Database 5000, Netflix Movies and TV Shows, Box Office Mojo data, etc. And just like their big-screen debuts, movie data makes for great visualizations. Take a look at this visualization of the Top 100 movies by Katie Silver, which features top movies based on box office gross and the number of Oscars each received:

Visualizing Top Movies

How You Can Do It: Take a Kaggle movie dataset, and create a visualization that shows gross earnings vs. average IMDb rating, Netflix shows by rating, or top movies by studio.
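
For the gross earnings vs. rating idea, a correlation coefficient is a useful number to put next to the scatter plot. The sketch below computes the Pearson correlation by hand with the standard library. The sample ratings and gross figures are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)

# Made-up ratings and gross earnings (in millions), for illustration only.
ratings = [6.0, 7.0, 8.0, 9.0]
gross = [100.0, 250.0, 200.0, 400.0]
r = pearson(ratings, gross)
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one. Correlation alone does not show that better ratings cause higher earnings.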

More Analytics Project Resources

If you are still looking for inspiration, see our compiled list of free datasets, which features sites to search for free data, datasets for EDA projects and visualizations, and datasets for machine learning projects.