Data analytics projects help you build your portfolio and land interviews. However, it’s not enough to just do an interesting analytics project. You also have to market your project to ensure it gets found.
The first step in starting any data analytics project is to come up with an interesting problem to investigate. Then, you need to find a dataset to analyze the problem. Some of the best categories for data analytics project ideas include:
A data analytics portfolio is a powerful tool for landing an interview. But how can you build one effectively?
Start with a data analytics project and build your portfolio around it. A data analytics project involves taking a dataset and analyzing it in a specific way to showcase results. Not only do they help you build your portfolio, but analytics projects also help you:
Python is a powerful tool for data analysis projects. Whether you’re scraping data from sites like the New York Times and Craigslist, or conducting Exploratory Data Analysis (EDA) on Uber trips, here are several Python data analytics project ideas to try:
This take-home challenge, which takes 1-2.5 hours to complete, is a Python script-writing task: you’re asked to write a script that transforms input CSV data into a desired output CSV. It’s good practice for the kind of Python take-homes given to data analysts, data scientists, and data engineers.
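If you want to warm up first, here is a minimal sketch of what a CSV-transformation script can look like. The file names, column names, and the aggregation itself are placeholders, not the actual take-home specification.

```python
# Minimal sketch of a CSV-transformation script.
# "input.csv", "output.csv", and the user_id/amount columns are placeholders,
# not the actual take-home data.
import csv
from collections import defaultdict

def transform(input_path: str, output_path: str) -> None:
    # Example transformation: total an "amount" column per "user_id".
    totals = defaultdict(float)
    with open(input_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["user_id"]] += float(row["amount"])

    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "total_amount"])
        for user_id, total in sorted(totals.items()):
            writer.writerow([user_id, total])

if __name__ == "__main__":
    transform("input.csv", "output.csv")
```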
As you work through this practice challenge, focus specifically on the grading criteria, which include:
Todd W. Schneider’s Wedding Crunchers is a great example of a data analysis project using Python. Essentially, Todd scraped wedding announcements from the New York Times and analyzed the data, finding interesting tidbits like:
Using the data and his analysis, Schneider created a lot of cool visuals, like this:
How you can do it: Follow the example of Wedding Crunchers. Choose a news or media source, scrape titles and text, and analyze the data for trends. Here’s a tutorial for scraping news APIs with Python.
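As a rough starting point, a scraper can be as short as the sketch below. The URL and the CSS selector are placeholders; inspect your chosen site (and its robots.txt and terms of use) before pointing this at real pages.

```python
# Hedged sketch: pull headline text from a news page with requests + BeautifulSoup.
# The URL and the "h2 a" selector are placeholders for whatever site you choose.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/news"  # hypothetical news page
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

headlines = [tag.get_text(strip=True) for tag in soup.select("h2 a")]
for title in headlines:
    print(title)
```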
Craigslist is a great data source for an analytics project, and there is a wide range of things you can analyze. One of the most common listings is for apartments.
Riley Predum created a handy tutorial that walks you through using Python and Beautiful Soup to scrape apartment listings, and then does some pretty cool analysis of pricing by neighborhood and price distributions. When graphed, his analysis looked like this:
How you can do it: Follow the tutorial to learn how to scrape the data using Python. Some analysis ideas: Look at apartment listings for another area, analyze used car prices for your market, or check out what used items sell on Craigslist.
Here’s an interesting project from Aman Kharwal: An analysis of Uber trip data from NYC. The project used this Kaggle dataset from FiveThirtyEight, containing nearly 20 million Uber pickups. There are a lot of angles to analyze this dataset, like popular pickup times or the busiest days of the week.
Here’s a data visualization on pickup times by hour of the day from Aman:
How you can do it: This is a good data analysis project idea if you’re prepping for a case study interview. You can emulate this one using the dataset on Kaggle, or you can use these similar taxi and Uber datasets on data.world, including one for Austin, TX.
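To reproduce an hour-of-day breakdown like Aman’s, a pandas groupby is enough. The file name and the "Date/Time" column below follow the FiveThirtyEight Uber CSVs, but verify them against the copy you download.

```python
# Count Uber pickups by hour of day and plot them as a bar chart.
# File name and "Date/Time" column assume the FiveThirtyEight raw-data CSVs.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("uber-raw-data-apr14.csv")
df["Date/Time"] = pd.to_datetime(df["Date/Time"])
df["hour"] = df["Date/Time"].dt.hour

pickups_by_hour = df.groupby("hour").size()
pickups_by_hour.plot(kind="bar", title="Uber pickups by hour of day")
plt.xlabel("Hour of day")
plt.ylabel("Number of pickups")
plt.show()
```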
Twitter is the perfect data source for an analytics project, and you can perform a wide range of analyses based on Twitter datasets. Sentiment analysis projects are great for practicing beginner NLP techniques.
One option would be to measure sentiment in your dataset over time like this:
How you can do it: This tutorial from Natassha Selvaraj provides step-by-step instructions for doing sentiment analysis on Twitter data. Or see this tutorial from the Twitter developer forum. For data, you can scrape your own or pull some from these free datasets.
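If you would rather start from a CSV of tweets, here is a minimal sentiment-scoring sketch using NLTK’s VADER analyzer. The file name and the "text" and "created_at" columns are assumptions about whatever Twitter dataset you end up using.

```python
# Score tweet sentiment with NLTK's VADER and average it by day.
# "tweets.csv" and its text/created_at columns are hypothetical.
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

tweets = pd.read_csv("tweets.csv", parse_dates=["created_at"])
tweets["sentiment"] = tweets["text"].apply(
    lambda t: sia.polarity_scores(str(t))["compound"]
)

# Average sentiment per day, which you could then plot over time.
daily_sentiment = tweets.set_index("created_at")["sentiment"].resample("D").mean()
print(daily_sentiment.head())
```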
This project was featured in our list of Python data science projects. With this project, you can take the classic California Census dataset, and use it to predict home prices by region, zip code, or details about the house.
Python can be used to produce some great visualizations, like this heat map of price by location:
How you can do it: Because this dataset is so well known, there are a lot of helpful tutorials to learn how to predict price in Python. Then, once you’ve learned the technique, you can start practicing it on a variety of datasets like stock prices, used car prices, or airfare.
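As a quick baseline, scikit-learn ships its own copy of the California housing data, so you can sketch a model in a few lines before switching to the Kaggle CSV. This is one possible setup, not the exact approach used in any particular tutorial.

```python
# Baseline home-value model on scikit-learn's built-in California housing data.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target  # target is median house value, in units of $100k

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```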
There’s a ton of accessible housing data online from sites like Zillow and Airbnb, and these datasets are perfect for analytics and EDA projects.
If you’re interested in price trends in housing, market predictions, or just want to analyze the average home prices for a specific city or state, jump into these projects:
This take-home is a classic product case study. You have booking data for Rio de Janeiro, and you must define metrics for analyzing matching performance and make recommendations to help increase the number of bookings.
This take-home includes grading criteria, which can help direct your work. Assignments are judged on the following:
Check out Zillow’s free datasets. The Zillow Home Value Index (ZHVI) is a smoothed, seasonally adjusted average of housing market values by region and housing type. There are also datasets on rentals, housing inventories, and price forecasts.
Here’s an analytics project based in R that might give you some direction. The author analyzes Zillow data for Seattle, looking at things like the age of inventory (days since listing), % of homes that sell for a loss or gain, and list price vs. sale price for homes in the region:
How you can do it: There are a ton of different ways you can use the Zillow dataset. Examine listings by region, explore individual list price vs. sale price, or take a look at the average sale price over the average list price by city.
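If you work in Python rather than R, the first step is usually reshaping the ZHVI download, which typically arrives wide with one column per month. The file name and identifier columns in this sketch are assumptions, so check them against the file you actually download.

```python
# Reshape a wide ZHVI export into long (region, month, value) format with pandas.
# "zhvi_by_city.csv" and the identifier columns are assumptions about the download.
import pandas as pd

zhvi = pd.read_csv("zhvi_by_city.csv")

id_cols = ["RegionName", "StateName"]  # assumed identifier columns
long_zhvi = zhvi.melt(id_vars=id_cols, var_name="month", value_name="zhvi")

# Monthly columns parse as dates; any metadata columns become NaT and are dropped.
long_zhvi["month"] = pd.to_datetime(long_zhvi["month"], errors="coerce")
long_zhvi = long_zhvi.dropna(subset=["month"])
long_zhvi["zhvi"] = pd.to_numeric(long_zhvi["zhvi"], errors="coerce")

# Example question: average home value by state over time.
by_state = long_zhvi.groupby(["StateName", "month"])["zhvi"].mean()
print(by_state.head())
```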
On Inside Airbnb, you’ll find Airbnb data that has been analyzed, cleaned, and aggregated for dozens of cities around the world, including the number of listings, calendars for listings, and reviews for listings.
Here’s a look at a project from Agratama Arfiano examining Airbnb data for Singapore. There are a lot of different analyses you can do, including finding the number of listings by host or listings by neighborhood. Arfiano has produced some really great visualizations for this project, like the following:
How you can do it: Download the data from Inside Airbnb, then choose a city to analyze. You can look at price, listings by area, listings by host, the average number of days a listing is rented, and much more.
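A minimal pandas pass over a listings file might look like the sketch below. The "neighbourhood" and "price" columns match the typical Inside Airbnb listings.csv layout, but confirm them for the city you pick; in some exports, price arrives as a formatted string.

```python
# Listings count and median price by neighbourhood for an Inside Airbnb file.
# Column names follow the usual listings.csv layout; verify them for your city.
import pandas as pd

listings = pd.read_csv("listings.csv")

# Price sometimes arrives as a string like "$1,200.00"; strip symbols first.
listings["price"] = (
    listings["price"]
    .astype(str)
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)

summary = listings.groupby("neighbourhood").agg(
    n_listings=("price", "size"),
    median_price=("price", "median"),
).sort_values("n_listings", ascending=False)
print(summary.head(10))
```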
Have you ever wondered which cars are the most rented? Curious how fares change by make and model? Check out the Cornell Car Rental Dataset on Kaggle. Kushlesh Kumar created the dataset, which features records on 6,000+ rental cars. There are a lot of interesting questions you can answer with this dataset: Fares by make and model, fares by city, inventory by city, and much more. Here’s a cool visualization from Kushlesh:
How you can do it: Using the dataset, you could analyze rental cars by make and model, focus on a specific location, or dig into specific car manufacturers. Another option: Try a similar project with these datasets: Cash for Clunkers cars, Carvana sales data, or used cars on eBay.
This real estate dataset shows every property that sold in New York City between September 2016 and September 2017. You can use this data (or a similar dataset you create) for a number of projects, including EDA, price predictions, regression analysis, and data cleaning.
A beginner analytics project you can try with this data is a missing-values analysis, like this:
How you can do it: There are a ton of helpful Kaggle notebooks you can browse to learn how to: perform price predictions, do data cleaning tasks, or do some interesting EDA with this dataset.
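A missing-values audit takes only a few lines of pandas, as in the sketch below. The file name comes from the Kaggle dataset; note that some datasets encode missing values as placeholder strings rather than true NaNs, so you may need read_csv’s na_values argument.

```python
# Count and rank missing values per column.
# "nyc-rolling-sales.csv" is the Kaggle file name; pass na_values=[" -  "] or
# similar if the file uses placeholder strings for missing entries.
import pandas as pd

df = pd.read_csv("nyc-rolling-sales.csv")

missing = pd.DataFrame({
    "missing_count": df.isnull().sum(),
    "missing_pct": (df.isnull().mean() * 100).round(2),
}).sort_values("missing_pct", ascending=False)

print(missing[missing["missing_count"] > 0])
```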
Sports data analytics projects are fun if you’re a fan, and there are numerous free data sources available, like Pro-Football-Reference and Basketball-Reference. These sources allow you to pull a wealth of statistics and build your own unique dataset to investigate a problem.
Check out this NBA data analytics project from Jay at Interview Query. Jay analyzed data from Basketball Reference (a great source, by the way) to determine the impact of the 2-for-1 play in the NBA. The idea: In basketball, the 2-for-1 play refers to the strategy in which, at the end of a quarter, a team aims to shoot the ball with between 25 and 36 seconds left on the clock. That way the team that shoots first has time for an additional play while the opposing team only gets one response. (You can see the source code on GitHub).
The main metric he was looking for was the differential gain between the score just before the 2-for-1 shot and the score at the end of the quarter. Here’s a look at a differential gain:
How you can do it: Read this tutorial on scraping Basketball Reference data. You can analyze in-game statistics, player career statistics, playoff performance, and much more. One option would be to analyze a player’s high school ranking vs. their success in the NBA. Or you could visualize a player’s career.
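pandas can often read Basketball-Reference tables directly with read_html, as in the rough sketch below. The URL follows the site’s per-game stats page pattern, but confirm the table layout and column names on the page you target, and be aware that scripted requests may be rate-limited.

```python
# Pull a Basketball-Reference per-game stats table and list the top scorers.
# The URL pattern and column names are assumptions; check the live page.
import pandas as pd

url = "https://www.basketball-reference.com/leagues/NBA_2023_per_game.html"
tables = pd.read_html(url)      # returns every HTML table on the page
per_game = tables[0]

per_game = per_game[per_game["Player"] != "Player"]  # drop repeated header rows
per_game["PTS"] = pd.to_numeric(per_game["PTS"], errors="coerce")
print(per_game.nlargest(10, "PTS")[["Player", "PTS"]])
```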
This is a great dataset for a sports analytics project. Featuring 35,000 medals awarded since 1896, there’s plenty of data to analyze, and it’s great for identifying performance trends by country and sport. Here’s an interesting visualization from Didem Erkan:
How you can do it: Check out the Olympics medals dataset. Angles you might take for analysis include: Medal count by country (as in this visualization), medal trends by country, e.g. how U.S. performance evolved during the 1900s, or even grouping countries by region to see how fortunes have risen or faded over time.
FiveThirtyEight is a wonderful source of sports data; they have NBA datasets, as well as data for the NFL and NHL. The site uses its Soccer Power Index (SPI) ratings for predictions and forecasts, but it’s also a good source for analysis and analytics projects. To get started, check out Gideon Karasek’s breakdown of working with the SPI data.
How you can do it: Check out the SPI data. Questions you might try to answer include: How has a team’s SPI changed over time? How does SPI compare across different soccer leagues? How do goals scored compare with goals predicted?
Does home-field advantage matter in the NFL? Can you quantify how much it matters? First, gather data from Pro-Football-Reference.com. Then you can fit a simple linear regression model to measure the impact.
There are a ton of projects you can do with NFL data. One would be to determine WR rankings based on season performance.
How you can do it: See this GitHub repository on performing a linear regression to quantify home-field advantage.
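Here is a hedged sketch of what that regression could look like. The game-level CSV and its column names are hypothetical stand-ins for whatever you scrape from Pro-Football-Reference.

```python
# Estimate home-field advantage with a simple OLS regression.
# "nfl_games.csv" and its home_/away_ columns are hypothetical.
import pandas as pd
import statsmodels.api as sm

games = pd.read_csv("nfl_games.csv")

# Stack each game into two rows (one per team) with an is_home indicator.
home = games[["home_team", "home_score"]].rename(
    columns={"home_team": "team", "home_score": "points"})
home["is_home"] = 1
away = games[["away_team", "away_score"]].rename(
    columns={"away_team": "team", "away_score": "points"})
away["is_home"] = 0
long_games = pd.concat([home, away], ignore_index=True)

# The is_home coefficient estimates the home-field edge in points per game.
model = sm.OLS(long_games["points"], sm.add_constant(long_games["is_home"])).fit()
print(model.summary())
```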
Creating a model to perform in daily fantasy sports requires you to:
If you’re interested in fantasy football, basketball, or baseball, this would be a great project.
How you can do it: Check out the Daily Fantasy Data Science course, if you want a step-by-step look.
All of the datasets we’ve mentioned would make for amazing data visualization projects. To cap things off, we’re highlighting a few more ideas to use as inspiration, ones that potentially draw from your own experiences or interests!
This is a classic SQL/data analytics take-home. You’re asked to explore, analyze, visualize and model Supercell’s revenue data. Specifically, the dataset contains user data and transactions tied to user accounts.
You must answer questions about the data, like which countries produce the most revenue. Then, you’re asked to create a visualization of the data, as well as apply machine learning techniques to it.
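For the revenue-by-country question, the query itself is short. The sketch below runs it through sqlite3 against a hypothetical local copy of the data; the table and column names (users, transactions, country, amount_usd) are assumptions, not Supercell’s actual schema.

```python
# Revenue by country via a SQL query against a hypothetical local SQLite copy.
# Table and column names are assumptions, not the real take-home schema.
import sqlite3
import pandas as pd

conn = sqlite3.connect("supercell_takehome.db")

query = """
SELECT u.country,
       SUM(t.amount_usd) AS total_revenue
FROM transactions t
JOIN users u ON u.user_id = t.user_id
GROUP BY u.country
ORDER BY total_revenue DESC;
"""

revenue_by_country = pd.read_sql_query(query, conn)
print(revenue_by_country.head(10))
```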
Books are full of data, and you can create some really amazing visualizations using the patterns from them. Take a look at this project by Hanna Piotrowska, which turns an Italo Calvino book into cool visualizations. The project features word distributions, themes and motifs by chapter, and the distribution of themes throughout the book:
How you can do it: This Shakespeare dataset, which features all of the lines from his plays, would be great for recreating this type of project. Another option: Create a visualization of your favorite Star Wars script.
This project by Jamie Kettle visualizes plastic pollution by country, and it does a scarily good job of showing just how much plastic waste enters the ocean each year. Take a look for inspiration:
How you can do it: There are dozens of pollution datasets on data.world. Choose one and create a visualization that shows the true impact of pollution on our natural environments.
There are a ton of great movie and media datasets on Kaggle: The Movie Database 5000, Netflix Movies and TV Shows, Box Office Mojo data, etc. And just like their big-screen debuts, movie data makes for great visualizations.
Take a look at this visualization of the Top 100 movies by Katie Silver, which features top movies based on box office gross and the number of Oscars each received:
How you can do it: Take a Kaggle movie dataset and create a visualization that shows gross earnings vs. average IMDB rating, Netflix shows by rating, or top movies by studio.
Salary is a subject everyone is interested in, and it makes a great topic for visualization. One idea: Take this dataset from the U.S. Bureau of Labor Statistics, and create a visualization looking at the gap in pay by industry.
You can see an example of a gender pay gap visualization on InformationIsBeautiful.net:
How you can do it: You can re-create the gender pay visualization and add your own spin. Or use salary data to visualize fields with the fastest-growing salaries, salary differences by city, or data science salaries by company.
Projects are one of the best ways for beginners to practice data science skills, including visualization, data cleaning, and working with tools like Python and pandas.
This data analytics take-home assignment, which has been given to data analysts and data scientists at Relax Inc., asks you to dig into user engagement data. Specifically, you’re asked to identify “adopted users”: users who have logged into the product on three separate days in at least one seven-day period.
Once you’ve identified adopted users, you’re asked to surface factors that predict future user adoption.
How you can do it: Jump into the Relax take-home data. This is an intensive data analytics take-home challenge, which the company suggests you spend 12 hours on (although you’re welcome to spend more or less). This is a great project for practicing your data analytics EDA skills, as well as surfacing predictive insights from a dataset.
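The adopted-user flag itself comes down to a short pandas groupby, as in the sketch below. The file and column names mirror the engagement file commonly shipped with this take-home, but check them against your copy; the exact seven-day window convention is an interpretation choice worth stating in your write-up.

```python
# Flag "adopted users": three distinct login days within any seven-day window.
# File and column names are assumptions about the take-home's engagement data.
import pandas as pd

logins = pd.read_csv("takehome_user_engagement.csv", parse_dates=["time_stamp"])
logins["login_day"] = logins["time_stamp"].dt.normalize()

def is_adopted(days: pd.Series) -> bool:
    days = days.drop_duplicates().sort_values().reset_index(drop=True)
    if len(days) < 3:
        return False
    # A seven-calendar-day window spans at most six days between first and last.
    span = (days.shift(-2) - days).dropna()
    return bool((span <= pd.Timedelta(days=6)).any())

adopted = logins.groupby("user_id")["login_day"].apply(is_adopted)
print(f"{adopted.mean():.1%} of users meet the adopted-user definition")
```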
This Kaggle Challenge asks you to perform a variety of data cleaning tasks. It’s a great beginner data analytics project that will give you hands-on experience with techniques like handling missing values, scaling and normalization, and parsing dates.
How you can do it: You can work through this Kaggle Challenge, which includes data. Another option would be to choose your own dataset that needs cleaning, then work through the challenge and adapt its techniques to your data.
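For reference, the core techniques the challenge covers fit into a few lines of pandas and scikit-learn. The DataFrame and column names below are hypothetical, so adapt them to whichever dataset you clean.

```python
# Three common cleaning steps: missing values, scaling, and date parsing.
# "messy_data.csv" and the amount/date columns are hypothetical.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("messy_data.csv")

# 1. Handle missing values: inspect first, then drop or fill.
print(df.isnull().mean().sort_values(ascending=False).head())
df["amount"] = df["amount"].fillna(df["amount"].median())

# 2. Scale a numeric column to the 0-1 range.
df["amount_scaled"] = MinMaxScaler().fit_transform(df[["amount"]]).ravel()

# 3. Parse dates stored as strings, coercing bad values to NaT.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
```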
This data analytics take-home from Skilledup asks participants to analyze a dataset of product details that is inconveniently formatted. This challenge provides an opportunity to show your data cleaning skills, as well as your ability to perform EDA and surface insights from an unfamiliar dataset. Specifically, the assignment asks you to consider one product group, named Books.
Each product in the group is associated with categories. Of course, there are tradeoffs to categorization, and you’re asked to consider these questions:
How you can do it: You can access this EDA take-home on Interview Query. Open the dataset and perform some EDA to familiarize yourself with the categories. Then, you can begin to consider the questions that are posed.
This marketing analytics dataset on Kaggle includes customer profiles, campaign successes and failures, channel performance, and product preferences. It’s a great tool for diving into marketing analytics, and there are a number of questions you can answer from the data like:
How you can do it: This Kaggle Notebook from user Jennifer Crockett is a great place to start; it includes a lot of helpful visualizations and analyses (like the one above).
If you want to take it a step further, there’s a lot of statistical analysis you can perform as well.
The UFO Sightings dataset is a fun one to dive into, and it contains data from more than 80,000 sightings over the last 100 years. This is a great source for a beginner EDA project, and you can draw out a lot of insights, like where sightings are reported most frequently, how sightings in the US compare with the rest of the world, and more.
How you can do it: Jump into the dataset on Kaggle. There are a number of notebooks you can check out with helpful code snippets. If you’re looking for a challenge, one user created an interactive map with sighting data.
If you are still looking for inspiration, see our compiled list of free datasets which features sites to search for free data, datasets for EDA projects and visualizations, as well as datasets for machine learning projects.
You should also read our guide on the data analyst career path, how to build a data science project from scratch, and our list of 30 data science project ideas.