26 Data Analytics Project Ideas and Datasets (2022)

Overview

Data analytics projects help you build your portfolio and land interviews. However, it’s not enough to just do an interesting analytics project. You also have to market your project to ensure it gets found.

The first step in starting any data analytics project is to come up with an interesting problem to investigate. Then, you need to find a dataset to analyze the problem. The project ideas in this article fall into five categories: Python, rental and housing, sports and NBA, data visualization, and beginner projects.

A data analytics portfolio is a powerful tool for landing an interview. But how can you build one effectively?

Start with a data analytics project and build your portfolio around it. A data analytics project involves taking a dataset and analyzing it in a specific way to showcase results. Not only do they help you build your portfolio, but analytics projects also help you:

  • Learn new tools and techniques
  • Work with complex datasets
  • Practice packaging your work and results
  • Prep for case study and take-home interviews
  • Attract inbound interviews from hiring managers who have read your blog post!

Python Data Analytics Projects

Python is a powerful tool for data analysis projects. Whether you’re scraping data from sites like the New York Times and Craigslist or conducting exploratory data analysis (EDA) on Uber trips, here are six Python data analytics project ideas to try:

1. Enigma Transforming CSV file Take-Home

Enigma Take-Home Challenge

This take-home challenge, which takes 1 to 2.5 hours to complete, is a Python scripting task. You’re asked to write a script that transforms input CSV data into the desired output CSV data. A take-home like this is good practice for the type of Python take-homes given to data analysts, data scientists, and data engineers.

As you work through this practice challenge, focus specifically on the grading criteria, which include:

  • How well you solve the problems
  • The logic and approach you take to solving them
  • Your ability to produce, document, and comment on code
  • Ultimately, your ability to write clear and clean scripts for data preparation
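To get a feel for this kind of task, here is a minimal sketch of a CSV-to-CSV transformation script using only the standard library. The column names (`name`, `signup_date`, `amount`) and the cleaning rules are hypothetical stand-ins, since the real challenge defines its own input and output formats.

```python
import csv
import io

# Hypothetical input; the real take-home supplies its own columns and rules.
RAW_CSV = """name,signup_date,amount
 Alice ,2021-03-05,10.5
Bob,2021-04-17,
"""


def transform_rows(reader):
    """Yield cleaned rows: trimmed names, renamed columns, blank amounts as 0."""
    for row in reader:
        yield {
            "customer": row["name"].strip(),
            "signup_date": row["signup_date"].strip(),
            "amount_usd": f"{float(row['amount'] or 0):.2f}",
        }


def transform_csv(text):
    """Read CSV text and return the transformed CSV as a string."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["customer", "signup_date", "amount_usd"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(transform_rows(reader))
    return out.getvalue()


print(transform_csv(RAW_CSV))
```

Keeping the row-level logic in its own small function makes the script easier to document and test, which lines up with the grading criteria above.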

2. Wedding Crunchers

Todd W. Schneider’s Wedding Crunchers is a great example of a data analysis project using Python. Essentially, Todd scraped wedding announcements from the New York Times, and performed analysis on the data, finding interesting tidbits like:

  • Distribution of common phrases
  • Average age trends of brides and grooms
  • Demographic trends

Using the data and his analysis, Schneider created a lot of cool visuals, like this:

Wedding Crunchers

How you can do it: Follow the example of Wedding Crunchers. Choose a news or media source, scrape titles and text, and analyze the data for trends. Here’s a tutorial for scraping news APIs with Python.
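Once you have scraped text, counting common phrases by year is a simple first analysis. The sketch below assumes you already have (year, headline) pairs; the sample headlines are made up for illustration, and two-word phrases are just one possible choice of unit.

```python
import re
from collections import Counter

# Hypothetical scraped headlines standing in for text from a news source.
HEADLINES = [
    (2020, "Data science jobs surge as remote work expands"),
    (2021, "Remote work reshapes data science hiring"),
    (2021, "Data science bootcamps see record enrollment"),
]


def bigrams(text):
    """Lowercase the text and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def phrase_counts_by_year(records):
    """Count how often each two-word phrase appears in each year."""
    counts = {}
    for year, text in records:
        counts.setdefault(year, Counter()).update(bigrams(text))
    return counts


counts = phrase_counts_by_year(HEADLINES)
for year in sorted(counts):
    print(year, counts[year].most_common(2))
```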

3. Scraping Craigslist

Craigslist is a great data source for an analytics project, and there is a wide range of things you can analyze. One of the most common listings is for apartments.

Riley Predum created a handy tutorial that walks you through using Python and Beautiful Soup to scrape apartment listings. He then did some pretty cool analysis of pricing by neighborhood and price distributions. When graphed, his analysis looked like this:

Scraping Craigslist

How you can do it: Follow the tutorial to learn how to scrape the data using Python. Some analysis ideas: Look at apartment listings for another area, analyze used car prices for your market, or check out what used items sell on Craigslist.

4. Uber Trip Analysis

Here’s an interesting project from Aman Kharwal: An analysis of Uber trip data from NYC. The project used this Kaggle dataset from FiveThirtyEight, containing nearly 20 million Uber pickups. There are a lot of angles to analyze this dataset, like popular pickup times or the busiest days of the week.

Here’s a data visualization on pickup times by hour of the day from Aman:

Uber Trip Analysis

How you can do it: This is a good data analysis project idea if you’re prepping for a case study interview. You can emulate this one using the dataset on Kaggle, or you can use these similar taxi and Uber datasets on data.world, including one for Austin, TX.
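Popular pickup times and busiest days both come down to parsing timestamps and counting. Here is a minimal sketch; the sample timestamps and the `%m/%d/%Y %H:%M:%S` format are assumptions, so check the actual column format in whichever dataset you download.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical pickup timestamps; the format is an assumption about the CSV.
PICKUPS = [
    "4/1/2014 0:11:00",
    "4/1/2014 17:45:00",
    "4/2/2014 17:05:00",
    "4/5/2014 23:30:00",
]


def pickup_profile(timestamps, fmt="%m/%d/%Y %H:%M:%S"):
    """Return (pickups per hour of day, pickups per weekday name)."""
    by_hour, by_weekday = Counter(), Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, fmt)
        by_hour[ts.hour] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1
    return by_hour, by_weekday


by_hour, by_weekday = pickup_profile(PICKUPS)
print(by_hour.most_common(1), by_weekday.most_common(1))
```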

5. Twitter Sentiment Analysis

Twitter is the perfect data source for an analytics project, and you can perform a wide range of analyses based on Twitter datasets. Sentiment analysis projects are great for practicing beginner NLP techniques.

One option would be to measure sentiment in your dataset over time like this:

Twitter Sentiment Analysis data set

How you can do it: This tutorial from Natassha Selvaraj provides step-by-step instructions for doing sentiment analysis on Twitter data. Or see this tutorial from the Twitter developer forum. For data, you can scrape your own or pull some from these free datasets.
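To see the shape of a sentiment-over-time analysis before committing to a full NLP library, here is a toy sketch. The word lists and tweets are invented for illustration; a real project would swap in an established lexicon or a trained model, as the tutorials above do.

```python
from collections import defaultdict

# Tiny hand-made lexicon, for illustration only.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "bad", "awful"}

# Hypothetical (date, text) pairs standing in for a tweet dataset.
TWEETS = [
    ("2022-01-01", "I love this great product"),
    ("2022-01-01", "Shipping was bad"),
    ("2022-01-02", "Awful support, I hate waiting"),
]


def score(text):
    """Positive word count minus negative word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(tweets):
    """Average sentiment score per day."""
    scores = defaultdict(list)
    for day, text in tweets:
        scores[day].append(score(text))
    return {day: sum(s) / len(s) for day, s in scores.items()}


print(daily_sentiment(TWEETS))
```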

6. Home Pricing Predictions

This project was featured in our list of Python data science projects. With this project, you can take the classic California Census dataset, and use it to predict home prices by region, zip code, or details about the house.

Python can be used to produce some great visualizations, like this heat map of price by location:

Home Pricing Predictions

How you can do it: Because this dataset is so well known, there are a lot of helpful tutorials to learn how to predict price in Python. Then, once you’ve learned the technique, you can start practicing it on a variety of datasets like stock prices, used car prices, or airfare.

Rental and Housing Data Analytics Project Ideas

There’s a ton of accessible housing data online from sites like Zillow and Airbnb, and these datasets are perfect for analytics and EDA projects.

If you’re interested in price trends in housing, market predictions, or just want to analyze the average home prices for a specific city or state, jump into these projects:

7. Airbnb Data Analytics Take-Home Assignment

Airbnb Data Analytics Take-Home

  • Overview: Analyze the provided data and make product recommendations to help increase bookings in Rio de Janeiro.
  • Time Required: 6 hours
  • Skills Tested: Analytics, EDA, growth marketing, data visualization
  • Deliverable: Summarize your recommendations in response to the questions above in a Jupyter Notebook intended for the Head of Product and VP of Operations (who is not technical).

This take-home is a classic product case study. You have booking data for Rio de Janeiro, and you must define metrics for analyzing matching performance and make recommendations to help increase the number of bookings.

This take-home includes grading criteria, which can help direct your work. Assignments are judged on the following:

  • Analytical approach and clarity of visualizations
  • Your data sense and decision-making, as well as the reproducibility of the analysis
  • Strength of your recommendations
  • Your ability to communicate insights in your presentation
  • Your ability to follow directions

8. Zillow Housing Prices

Check out Zillow’s free datasets. The Zillow Home Value Index (ZHVI) is a smoothed, seasonally adjusted measure of typical home values by region and housing type. There are also datasets on rentals, housing inventories, and price forecasts.

Here’s an analytics project based in R that might give you some direction. The author analyzes Zillow data for Seattle, looking at things like the age of inventory (days since listing), % of homes that sell for a loss or gain, and list price vs. sale price for homes in the region:

Zillow Housing Prices

How you can do it: There are a ton of different ways you can use the Zillow dataset. Examine listings by region, explore individual list price vs. sale price, or take a look at the average sale price over the average list price by city.
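The last idea, average sale price over average list price by city, can be sketched in a few lines. The rows below are invented for illustration and are not Zillow data; the (city, list price, sale price) layout is also an assumption about how you would shape your extract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listings: (city, list price, sale price). Not real Zillow data.
SALES = [
    ("Seattle", 500_000, 525_000),
    ("Seattle", 400_000, 380_000),
    ("Tacoma", 300_000, 300_000),
]


def sale_to_list_by_city(rows):
    """Average sale price divided by average list price, per city."""
    list_prices, sale_prices = defaultdict(list), defaultdict(list)
    for city, list_price, sale_price in rows:
        list_prices[city].append(list_price)
        sale_prices[city].append(sale_price)
    return {
        city: mean(sale_prices[city]) / mean(list_prices[city])
        for city in list_prices
    }


ratios = sale_to_list_by_city(SALES)
print(ratios)
```

A ratio above 1 means homes in that city sold, on average, for more than they were listed.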

9. Inside Airbnb

On Inside Airbnb, you’ll find data from Airbnb that has been analyzed, cleaned, and aggregated. You’ll find data for dozens of cities around the world, including number of listings, calendars for listings, and reviews for listings.

Here’s a look at a project from Agratama Arfiano examining Airbnb data for Singapore. There are a lot of different analyses you can do, including finding the number of listings by host or listings by neighborhood. Arfiano has produced some really great visualizations for this project, like the following:

Inside Airbnb

How you can do it: Download the data from Inside Airbnb, then choose a city for analysis. You can look at price, listings by area, listings by host, the average number of days a listing is rented, and much more.

10. Car Rentals

Have you ever wondered which cars are the most rented? Curious how fares change by make and model? Check out the Cornell Car Rental Dataset on Kaggle. Kushlesh Kumar created the dataset, which features records on 6,000+ rental cars. There are a lot of interesting questions you can answer with this dataset: Fares by make and model, fares by city, inventory by city, and much more. Here’s a cool visualization from Kushlesh:

Car Rentals

How you can do it: Using the dataset, you could analyze rental cars by make and model, by location, or by manufacturer. Another option: Try a similar project with these datasets: Cash for Clunkers cars, Carvana sales data, or used cars on eBay.

11. Analyzing NYC Property Sales

This real estate dataset shows every property that sold in New York City between September 2016 and September 2017. You can use this data (or a similar dataset you create) for a number of projects, including EDA, price predictions, regression analysis, and data cleaning.

A beginner analytics project you can try with this data would be a missing values analysis like:

NYC real estate dataset

How you can do it: There are a ton of helpful Kaggle notebooks you can browse to learn how to: perform price predictions, do data cleaning tasks, or do some interesting EDA with this dataset.

Sports and NBA Data Analytics Projects

Sports data analytics projects are fun if you’re a fan, and there are numerous free data sources available, like Pro-Football-Reference and Basketball-Reference. These sources allow you to pull a wide range of statistics and build your own unique dataset to investigate a problem.

12. NBA Data Analytics Project

Check out this NBA data analytics project from Jay at Interview Query. Jay analyzed data from Basketball Reference (a great source, by the way) to determine the impact of the 2-for-1 play in the NBA. The idea: In basketball, the 2-for-1 play refers to the strategy in which, at the end of a quarter, a team aims to shoot the ball with between 25 and 36 seconds left on the clock. That way, the team that shoots first has time for an additional play while the opposing team only gets one response. (You can see the source code on GitHub.)

The main metric he was looking for was the differential gain between the score just before the 2-for-1 shot and the score at the end of the quarter. Here’s a look at a differential gain:

NBA Data Analytics Project

How you can do it: Read this tutorial on scraping Basketball Reference data. You can analyze in-game statistics, player career statistics, playoff performance, and much more. One option would be to analyze a player’s high school ranking vs. their success in the NBA. Or you could visualize a player’s career.
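If you want to reproduce a metric like the differential gain described above, one possible reading is the change in a team’s point margin between the moment just before the shot and the end of the quarter. The sketch below uses that interpretation and made-up scores; it is not taken from Jay’s source code.

```python
from statistics import mean

# Hypothetical end-of-quarter records:
# (attempted 2-for-1?, team score before, opponent score before,
#  team score at end of quarter, opponent score at end of quarter)
QUARTERS = [
    (True, 50, 48, 55, 50),
    (True, 60, 61, 62, 64),
    (False, 45, 45, 47, 47),
]


def differential_gain(team_before, opp_before, team_end, opp_end):
    """Change in point margin between the shot and the end of the quarter."""
    return (team_end - opp_end) - (team_before - opp_before)


def average_gain(records, attempted):
    """Average differential gain for quarters with or without a 2-for-1."""
    gains = [differential_gain(*r[1:]) for r in records if r[0] == attempted]
    return mean(gains)


print(average_gain(QUARTERS, True), average_gain(QUARTERS, False))
```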

13. Olympic Medals Analysis

This is a great dataset for a sports analytics project. Featuring 35,000 medals awarded since 1896, there’s plenty of data to analyze, and it’s great for identifying performance trends by country and sport. Here’s an interesting visualization from Didem Erkan:

Olympic Medals Analysis

How you can do it: Check out the Olympics medals dataset. Angles you might take for analysis include: Medal count by country (as in this visualization), medal trends by country, e.g. how U.S. performance evolved during the 1900s, or even grouping countries by region to see how fortunes have risen or faded over time.

14. Soccer Power Rankings

FiveThirtyEight is a wonderful source of sports data; they have NBA datasets, as well as data for the NFL and NHL. The site uses its Soccer Power Index (SPI) ratings for predictions and forecasts, but it’s also a good source for analysis and analytics projects. To get started, check out Gideon Karasek’s breakdown of working with the SPI data.

Soccer Power Rankings

How you can do it: Check out the SPI data. Questions you might try to answer include: How has a team’s SPI changed over time? How does SPI compare across soccer leagues? How do goals scored compare with goals predicted?

15. Home Field Advantage Analysis

Does home-field advantage matter in the NFL? Can you quantify how much it matters? First, gather data from Pro-Football-Reference.com. Then you can fit a simple linear regression model to measure the impact.

Home Field Advantage Analysis

There are a ton of projects you can do with NFL data. One would be to determine WR rankings, based on season performance.

How you can do it: See this GitHub repository on performing a linear regression to quantify home field advantage.
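As a rough illustration of the regression idea, the sketch below fits point margin against a 0/1 home indicator using ordinary least squares written out by hand. The game results are invented, and this single-variable setup is a simplifying assumption; the linked repository may model things differently.

```python
from statistics import mean

# Hypothetical results: (1 if the team played at home else 0, point margin).
GAMES = [(1, 7), (1, 3), (1, -2), (0, -3), (0, 4), (0, -4)]


def ols(x, y):
    """Simple least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


is_home = [g[0] for g in GAMES]
margin = [g[1] for g in GAMES]
slope, intercept = ols(is_home, margin)
print(f"home-field effect: {slope:.2f} points, away baseline: {intercept:.2f}")
```

With a single 0/1 predictor, the slope is just the difference between the average home margin and the average away margin, which makes the result easy to sanity-check by hand.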

16. Daily Fantasy Sports

Creating a model to perform in daily fantasy sports requires you to:

  • Predict which players will perform best based on matchups, locations, and other indicators
  • Build a roster based on a “salary cap” budget
  • Determine which players will have the top ROI during the given week

If you’re interested in fantasy football, basketball, or baseball, this would be a great project.

Daily Fantasy Sports

How you can do it: Check out the Daily Fantasy Data Science course, if you want a step-by-step look.
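The roster-building step above is a budget-constrained selection problem. One simple way to sketch it, assuming you already have projected points per player and ignoring position requirements, is a 0/1 knapsack. The player pool and salaries below are made up.

```python
# Hypothetical player pool: (name, salary in $1,000s, projected points).
PLAYERS = [
    ("QB A", 8, 22.0),
    ("RB B", 6, 15.0),
    ("WR C", 5, 14.0),
    ("TE D", 3, 7.0),
]


def best_roster(players, cap):
    """0/1 knapsack: maximize projected points without exceeding the cap."""
    # best[c] holds (points, roster) for the best lineup costing at most c.
    best = [(0.0, [])] * (cap + 1)
    for name, salary, points in players:
        # Walk budgets downward so each player is picked at most once.
        for c in range(cap, salary - 1, -1):
            candidate = best[c - salary][0] + points
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - salary][1] + [name])
    return best[cap]


points, roster = best_roster(PLAYERS, cap=14)
print(points, roster)
```

Real daily fantasy contests also require a fixed number of players per position, so a full solution would add those constraints, for example with an integer programming solver.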

Data Visualization Projects

All of the datasets we’ve mentioned would make for amazing data visualization projects. Here are five more ideas to use as inspiration, which may draw on your own experiences or interests!

17. Supercell Data Scientist Pre-Test

Supercell Take-Home Challenge

This is a classic SQL/data analytics take-home. You’re asked to explore, analyze, visualize and model Supercell’s revenue data. Specifically, the dataset contains user data and transactions tied to user accounts.

You must answer questions about the data, like which countries produce the most revenue. Then, you’re asked to create a visualization of the data, as well as apply machine learning techniques to it.
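A question like "which countries produce the most revenue" is a join plus a grouped sum. Here is a minimal sketch using Python’s built-in `sqlite3` module; the table names, column names, and rows are hypothetical, so adapt them to the schema the take-home provides.

```python
import sqlite3

# Hypothetical schema and rows; the real take-home supplies its own tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, country_code TEXT);
    CREATE TABLE iap_purchase (account_id INTEGER, amount_usd REAL);
    INSERT INTO account VALUES (1, 'FI'), (2, 'US'), (3, 'US');
    INSERT INTO iap_purchase VALUES (1, 4.99), (2, 9.99), (3, 1.99), (2, 4.99);
    """
)

REVENUE_BY_COUNTRY = """
    SELECT a.country_code, ROUND(SUM(p.amount_usd), 2) AS revenue
    FROM iap_purchase AS p
    JOIN account AS a ON a.account_id = p.account_id
    GROUP BY a.country_code
    ORDER BY revenue DESC
"""

rows = conn.execute(REVENUE_BY_COUNTRY).fetchall()
for country, revenue in rows:
    print(country, revenue)
```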

18. Visualize Your Favorite Book

Books are full of data, and you can create some really amazing visualizations using the patterns from them. Take a look at this project by Hanna Piotrowska, which turns an Italo Calvino book into cool visualizations. The project features visualizations of word distributions, themes and motifs by chapter, and the distribution of themes throughout the book:

Visualize Your Favorite Book

How you can do it: This Shakespeare dataset, which features all of the lines from his plays, would be great for recreating this type of project. Another option: Create a visualization of your favorite Star Wars script.

19. Visualizing Pollution

This project by Jamie Kettle visualizes plastic pollution by country, and it does a scarily good job of showing just how much plastic waste enters the ocean each year. Take a look for inspiration:

Visualizing Pollution

How you can do it: There are dozens of pollution datasets on data.world. Choose one and create a visualization that shows the true impact of pollution on our natural environments.

20. Visualizing Top Movies

There are a ton of great movie and media datasets on Kaggle: The Movie Database 5000, Netflix Movies and TV Shows, Box Office Mojo data, etc. And just like their big-screen debuts, movie data makes for great visualizations.

Take a look at this visualization of the Top 100 movies by Katie Silver, which features top movies based on box office gross and the Oscars each received:

Visualizing Top Movies

How you can do it: Take a Kaggle movie dataset, and create a visualization that shows gross earnings vs. average IMDb rating, Netflix shows by rating, or top movies by studio.

21. Gender Pay Gap Analysis

Salary is a subject everyone is interested in and it makes a great subject for visualization. One idea: Take this dataset from the U.S. Bureau of Labor Statistics, and create a visualization looking at the gap in pay by industry.

You can see an example of a gender pay gap visualization on InformationIsBeautiful.net:

Gender Pay Gap Analysis

How you can do it: You can re-create the gender pay visualization and add your own spin. Or use salary data to visualize fields with the fastest-growing salaries, salary differences by city, or data science salaries by company.

Beginner Data Analytics Projects

Projects are one of the best ways for beginners to practice data science skills, including visualization, data cleaning, and working with tools like Python and pandas.

22. Relax Predicting User Adoption Take-Home

Relax Take-Home Assignment

This data analytics take-home assignment, which has been given to data analysts and data scientists at Relax Inc., asks you to dig into user engagement data. Specifically, you’re asked to determine who an “adopted user” is, which is a user who has logged into the product on three separate days in at least one seven-day period.

Once you’ve identified adopted users, you’re asked to surface factors that predict future user adoption.

How you can do it: Jump into the Relax take-home data. This is an intensive data analytics take-home challenge, which the company suggests you spend 12 hours on (although you’re welcome to spend more or less). This is a great project for practicing your data analytics EDA skills, as well as surfacing predictive insights from a dataset.
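The adopted-user definition translates directly into a sliding check over each user’s distinct login days. The sketch below assumes a simple mapping from user ID to login dates; the real data ships in its own format, so the structure and sample values here are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical login log: user id -> login dates.
LOGINS = {
    "u1": [date(2014, 5, 1), date(2014, 5, 3), date(2014, 5, 6)],
    "u2": [date(2014, 5, 1), date(2014, 5, 1), date(2014, 5, 20)],
    "u3": [date(2014, 5, 1), date(2014, 5, 5), date(2014, 5, 9)],
}


def is_adopted(login_dates, window_days=7, min_days=3):
    """True if min_days distinct login days fall within one window_days period."""
    days = sorted(set(login_dates))
    for i in range(len(days) - min_days + 1):
        # days[i] through days[i + min_days - 1] are min_days distinct days.
        if days[i + min_days - 1] - days[i] < timedelta(days=window_days):
            return True
    return False


adopted = {user for user, dates in LOGINS.items() if is_adopted(dates)}
print(adopted)
```

Deduplicating dates first matters: two logins on the same day count as one day, which is why `u2` is not adopted in this sample.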

23. Data Cleaning Practice

This Kaggle Challenge asks you to perform a variety of data cleaning tasks. It’s a great beginner data analytics project that provides hands-on experience with techniques like handling missing values, scaling and normalization, and parsing dates.

Data Cleaning Practice

How you can do it: You can work through this Kaggle Challenge, which includes data. Another option, however, would be to choose your own dataset that needs to be cleaned, and then work through the challenge and adapt the techniques to your own dataset.
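Here is a small sketch of three of those techniques on a toy dataset: filling missing values with the median, min-max scaling, and parsing dates that arrive in mixed formats. The records, column names, and the two date formats are assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical messy records; column names are illustrative.
ROWS = [
    {"date": "01/15/2021", "price": "10"},
    {"date": "2021-02-01", "price": ""},
    {"date": "03/10/2021", "price": "30"},
]

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date(text):
    """Try each known format; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    """Fill missing prices with the median and add a 0-1 scaled price."""
    prices = [float(r["price"]) for r in rows if r["price"]]
    fill = median(prices)
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all prices match
    cleaned = []
    for r in rows:
        price = float(r["price"]) if r["price"] else fill
        cleaned.append({
            "date": parse_date(r["date"]),
            "price": price,
            "price_scaled": (price - lo) / span,
        })
    return cleaned


cleaned = clean(ROWS)
for row in cleaned:
    print(row)
```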

24. SkilledUp Messy Product Data Analysis Take-Home

SkilledUp Take-Home Challenge

This data analytics take-home from SkilledUp asks participants to perform analysis on a dataset of product details that is formatted inconveniently. This challenge provides an opportunity to show your data cleaning skills, as well as your ability to perform EDA and surface insights from an unfamiliar dataset. Specifically, the assignment asks you to consider one product group, named Books.

Each product in the group is associated with categories. Of course, there are tradeoffs to categorization, and you’re asked to consider these questions:

  • Is there redundancy in the categorization?
  • How can redundancy be identified and removed?
  • Is it possible to reduce the number of categories dramatically by sacrificing relatively few category entries?

How you can do it: You can access this EDA take-home on Interview Query. Open the dataset and perform some EDA to familiarize yourself with the categories. Then, you can begin to consider the questions that are posed.
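One way to approach the last question is to ask how few categories you need to keep in order to cover most category entries. The sketch below uses made-up category labels and a 95% coverage target, both of which are assumptions for illustration.

```python
from collections import Counter

# Hypothetical category labels attached to products in the Books group.
ENTRIES = (
    ["Fiction"] * 50 + ["History"] * 30 + ["Cooking"] * 15
    + ["Maps"] * 3 + ["Misc"] * 2
)


def categories_needed(entries, coverage=0.95):
    """Keep the most common categories until they cover the target share."""
    counts = Counter(entries)
    total = sum(counts.values())
    kept, covered = [], 0
    for category, n in counts.most_common():
        if covered / total >= coverage:
            break
        kept.append(category)
        covered += n
    return kept, covered / total


kept, share = categories_needed(ENTRIES)
print(f"{len(kept)} of {len(set(ENTRIES))} categories cover {share:.0%} of entries")
```

If a handful of categories covers nearly all entries, the long tail is a good candidate for merging or dropping.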

25. Marketing Analytics Exploratory Data Analysis

This marketing analytics dataset on Kaggle includes customer profiles, campaign successes and failures, channel performance, and product preferences. It’s a great tool for diving into marketing analytics, and there are a number of questions you can answer from the data like:

  • What factors are significantly related to the number of store purchases?
  • Is there a significant relationship between geographical region and the success of a campaign?
  • How does the US compare to the rest of the world in terms of total purchases?

marketing analytics dataset

How you can do it: This Kaggle Notebook from user Jennifer Crockett is a great place to start. It includes a lot of great visualizations and analyses (like the one above).

If you want to take it a step further, there’s a lot of statistical analysis you can perform as well.

26. UFO Sightings Data Analysis

The UFO Sightings dataset is a fun one to dive into, and it contains data from more than 80,000 sightings over the last 100 years. This is a great source for a beginner EDA project, and you can draw out a lot of insights, like where sightings are reported most frequently, how sightings in the US compare with the rest of the world, and more.

UFO Sightings Data Analysis

How you can do it: Jump into the dataset on Kaggle. There are a number of notebooks you can check out with helpful code snippets. If you’re looking for a challenge, one user created an interactive map with sighting data.

More Analytics Project Resources

If you are still looking for inspiration, see our compiled list of free datasets which features sites to search for free data, datasets for EDA projects and visualizations, as well as datasets for machine learning projects.

You should also read our guide on the data analyst career path, our guide on how to build a data science project from scratch, and our list of 30 data science project ideas.