11 Supply Chain Projects and Datasets

Global supply chain management is undergoing a significant transformation enabled by Big Data and powered by data science teams using advanced technologies like artificial intelligence, blockchain, and robotics.

These advancements promise to make the supply chain more agile, predictable, and cost-efficient, leading to higher profitability and competitiveness. As more organizations realize the benefits of applying data science to supply chain management, the industry’s demand for skilled data scientists is rising. The Robert Half Technology (RHT) 2020 Salary Guide recognizes the manufacturing and distribution industries as “hotbeds for hiring” information technology professionals, including data scientists. Want to be part of the supply chain digitization revolution? Interested in learning and applying the skills that will land you a data science job in the new Supply Chain 4.0?

Here are 11 supply chain projects and datasets to help you get started:

CleanSpark: Electricity Pricing Model Take-Home

This Python data science challenge is given to junior data scientists and data analysts. The task focuses on building a pricing model to estimate the energy and demand charges.

This is an in-depth Python challenge that includes packages you may not be familiar with. Specifically, you’re asked to:

  • Process monthly electricity consumption data for a commercial facility in San Diego.
  • Add logic to the two stubbed functions (included with the test) that use the electricity consumption data and the pricing scheme to calculate the charges.
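The two charges in this challenge can be illustrated with a common commercial tariff structure: an energy charge on the kWh consumed, priced by time of use, plus a demand charge on the month's peak kW. This is a minimal sketch under assumed inputs: the 15-minute interval readings, the on-peak window, the rates, and the function names are all hypothetical and are not the challenge's actual stubs or any real San Diego tariff.

```python
from datetime import datetime, timedelta

# Hypothetical tariff values, for illustration only
ENERGY_RATES = {"on_peak": 0.30, "off_peak": 0.15}   # $ per kWh
DEMAND_RATE = 20.0                                    # $ per kW of monthly peak demand
ON_PEAK_HOURS = range(16, 21)                         # 4 p.m. to 9 p.m. (assumed window)
INTERVAL_HOURS = 0.25                                 # 15-minute interval readings


def energy_charge(readings):
    """Sum kWh per interval (kW x hours) times the time-of-use rate."""
    total = 0.0
    for timestamp, kw in readings:
        period = "on_peak" if timestamp.hour in ON_PEAK_HOURS else "off_peak"
        total += kw * INTERVAL_HOURS * ENERGY_RATES[period]
    return total


def demand_charge(readings):
    """Bill the highest interval demand (kW) recorded in the month."""
    return max(kw for _, kw in readings) * DEMAND_RATE


# Hypothetical readings: four off-peak intervals at 100 kW, two on-peak at 200 kW
start = datetime(2020, 1, 6, 15, 0)
loads = [100, 100, 100, 100, 200, 200]
readings = [(start + timedelta(minutes=15 * i), kw) for i, kw in enumerate(loads)]
print(energy_charge(readings), demand_charge(readings))
```

Keeping the two charges in separate functions mirrors the challenge's split into two stubs, so each can be checked against the pricing scheme independently.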

Production Fixed Horizon Planning with Python

According to the latest report by the Hackett Group, 81% of supply chain managers say that data analytics and optimization will be crucial for cost reduction and supply chain management.

This project is a great way to apply optimization algorithms to a real-life supply chain scenario. Assume you are a production planning manager at a small factory that produces radio equipment for local and international markets.

Your role is to schedule production so that every order is delivered on time at the minimum total cost, balancing storage (holding) costs against the cost of each production setup. You will learn to implement the Wagner-Whitin algorithm in Python to solve this problem.

How you can do it: Follow this tutorial by Samir Saci, which walks through the solution steps and explains why optimization was chosen and how it is applied. You can find the source code for the project in his GitHub repository.
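The lot-sizing logic behind this project can be sketched as a small dynamic program in the spirit of the Wagner-Whitin recursion. This is a minimal sketch, not the tutorial's code: it assumes a constant setup cost per production run, a constant holding cost per unit per period, and no capacity limit; the function name and example numbers are hypothetical.

```python
import math


def wagner_whitin(demand, setup_cost, holding_cost):
    """Uncapacitated lot sizing by dynamic programming.

    best[t] is the minimum cost of meeting demand for periods 0..t-1.
    Producing in period j to cover periods j..t-1 costs one setup plus
    holding each unit of demand[i] for (i - j) periods.
    Returns (minimum total cost, production quantity per period).
    """
    periods = len(demand)
    best = [0.0] + [math.inf] * periods
    last_setup = [0] * (periods + 1)
    for t in range(1, periods + 1):
        for j in range(t):
            holding = holding_cost * sum((i - j) * demand[i] for i in range(j, t))
            cost = best[j] + setup_cost + holding
            if cost < best[t]:
                best[t], last_setup[t] = cost, j
    # Walk back through the chosen setups to recover the production plan
    plan = [0] * periods
    t = periods
    while t > 0:
        j = last_setup[t]
        plan[j] = sum(demand[j:t])
        t = j
    return best[periods], plan


cost, plan = wagner_whitin([20, 50, 10], setup_cost=100, holding_cost=1)
print(cost, plan)
```

Raising `holding_cost` makes it cheaper to pay for extra setups than to store inventory, so the plan splits into more, smaller production runs.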

Amazon: Inventory Time Series Take-Home

This Amazon supply chain take-home asks you to analyze sales data to determine what percentage of customers do not wait for items to become available again after they go out of stock. Ideally, your model will estimate the sales lost to shortages (one indicator might be higher demand on replenishment days).

To help you get started, these assumptions are included:

  • Demand has no trend or seasonality.
  • Customers who experience a shortage and decide to come back would only come back on the day of replenishment.
  • Replenishment occurs only at the beginning of the day.
  • Each customer buys no more than one item.

A take-home challenge like this might be given to a supply chain analyst or sales analyst at Amazon or another e-commerce company.
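Under these assumptions, one possible approach compares replenishment-day sales with a baseline. The sketch below is a hypothetical approach, not Amazon's expected solution: it assumes each day is recorded as a dict with `sales`, a `stockout` flag (no stock for the whole day), and a `replenished` flag, estimates baseline demand from ordinary in-stock days, and treats any excess on the replenishment day after a stockout as returning customers. Because each customer buys at most one item, units and customers are interchangeable.

```python
from statistics import mean


def estimate_lost_sales(days):
    """days: list of dicts with hypothetical keys 'sales', 'stockout', 'replenished'.

    Returns (lost units, share of customers who did not wait).
    """
    # Baseline daily demand from ordinary in-stock days (no trend or seasonality assumed)
    baseline = mean(d["sales"] for d in days if not d["stockout"] and not d["replenished"])
    missed = 0.0      # demand that arrived while the item was out of stock
    returned = 0.0    # extra demand seen on the replenishment day after a stockout
    run = 0           # consecutive stockout days so far
    for d in days:
        if d["stockout"]:
            run += 1
            continue
        if d["replenished"] and run > 0:
            missed += baseline * run
            # Returning customers cannot exceed the demand missed during the run
            returned += min(max(d["sales"] - baseline, 0.0), baseline * run)
        run = 0
    if missed == 0:
        return 0.0, 0.0
    lost = missed - returned
    return lost, lost / missed


# Hypothetical data: (sales, stockout, replenished)
records = [(10, False, False), (10, False, False), (10, False, False),
           (0, True, False), (0, True, False), (16, False, True), (10, False, True)]
days = [{"sales": s, "stockout": so, "replenished": r} for s, so, r in records]
print(estimate_lost_sales(days))
```

A replenishment day that does not follow a stockout (the last record) is ignored, since under the stated assumptions no customers are waiting to return.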

Supply Chain Sustainability Reporting with Python

Investors and customers increasingly demand transparency on sustainable development, and investors now place greater emphasis on a business’s sustainability when assessing an organization’s value and resilience. In this project, we will use a straightforward methodology to report the CO2 emissions of your distribution network using Python and Power BI.

How you can do it: Follow this detailed tutorial by Samir Saci, which breaks the project down into three steps:

  • Step 1: Define the CO2 emissions formula for transportation using the GHG Protocol Corporate Standard
  • Step 2: Process the data with Python
  • Step 3: Visualize the results in Power BI

You can find the project source code here.
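Step 2 can be illustrated with a distance-based calculation, in which emissions equal shipment weight times distance times a mode-specific emission factor. This is a minimal sketch: the emission factors below are placeholder values for illustration only and should be replaced with factors from the source your methodology relies on; the (mode, tonnes, km) record layout and function names are assumed.

```python
# Placeholder emission factors in grams of CO2 per tonne-kilometre.
# Illustrative values only: replace with factors from your chosen source.
EMISSION_FACTORS_G_PER_TKM = {"road": 62.0, "rail": 22.0, "sea": 8.0, "air": 602.0}


def shipment_emissions_kg(mode, weight_tonnes, distance_km):
    """Distance-based estimate: weight x distance x emission factor, in kg CO2."""
    grams = weight_tonnes * distance_km * EMISSION_FACTORS_G_PER_TKM[mode]
    return grams / 1000.0


def network_emissions_kg(shipments):
    """Total emissions per transport mode for a list of (mode, tonnes, km) records."""
    totals = {}
    for mode, tonnes, km in shipments:
        totals[mode] = totals.get(mode, 0.0) + shipment_emissions_kg(mode, tonnes, km)
    return totals


shipments = [("road", 10, 100), ("road", 5, 40), ("sea", 20, 1000)]
print(network_emissions_kg(shipments))
```

The per-mode totals are the kind of aggregated table that can then be loaded into Power BI for Step 3.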

Robust Supply Chain Networks with Monte Carlo Simulation

Businesses with optimized supply chains achieve 5-15% lower supply chain costs, 20-50% lower inventory holdings, and up to three times faster cash-to-cash cycles. Supply chain optimization algorithms help businesses reach that goal by using data analytics to find the combination of factories and distribution centers that best meets customer demand.

Many of the network design software solutions on the market are built around a linear programming model.

However, these models mostly assume constant demand. If your market fluctuates or is highly seasonal, a network designed for constant demand may be far less robust than expected. In this project, we will build a straightforward methodology to design a robust supply chain network using Monte Carlo simulation with Python. We break the process down into three steps:

  1. Starting from assumptions about the supply and demand network, analyze the robustness of the initial solution under constant demand.
  2. Run 50 simulations with demand drawn from normal distributions.
  3. Analyze the results to select the most robust solution, or a combination of solutions, to increase robustness.

How you can do it: You can find a detailed paper on ScienceDirect laying out the approach and its implementation, as well as example source code on GitHub to try it out yourself.
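The simulation loop in steps 2 and 3 can be sketched on a toy network. To stay self-contained, this minimal sketch replaces the linear programming model with brute-force enumeration of which plants to open for a single aggregated market; the plant data and the function names `network_cost`, `optimal_network`, and `simulate` are hypothetical. For each demand scenario drawn from a normal distribution, it records which plant combination is cheapest, so the combination that is optimal most often is a candidate robust design.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical plants: name -> (fixed cost, variable cost per unit, capacity)
PLANTS = {"A": (500, 2, 100), "B": (300, 3, 80), "C": (250, 5, 60)}


def network_cost(plants, open_plants, demand):
    """Fixed costs plus cheapest-first allocation; None if capacity is short."""
    cost = sum(plants[p][0] for p in open_plants)
    remaining = demand
    for p in sorted(open_plants, key=lambda name: plants[name][1]):
        qty = min(remaining, plants[p][2])
        cost += qty * plants[p][1]
        remaining -= qty
    return cost if remaining <= 1e-9 else None


def optimal_network(plants, demand):
    """Enumerate every non-empty set of open plants and keep the cheapest feasible one."""
    best = None
    names = sorted(plants)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            cost = network_cost(plants, combo, demand)
            if cost is not None and (best is None or cost < best[1]):
                best = (combo, cost)
    return best


def simulate(plants, mean, std, runs=50, seed=0):
    """Count how often each plant combination is optimal across demand scenarios."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        demand = max(0.0, rng.gauss(mean, std))
        result = optimal_network(plants, demand)
        if result is not None:          # skip scenarios no network can serve
            counts[result[0]] += 1
    return counts


print(optimal_network(PLANTS, 70))
print(simulate(PLANTS, mean=100, std=30).most_common(3))
```

Enumeration is only practical for a handful of plants; a real network would solve each scenario with an LP or MIP solver instead, but the counting logic stays the same.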

Improve Warehouse Productivity using Order Batching with Python

In a distribution center (DC), walking from one location to the next along the picking route can account for 50% of an operator’s working time. Reducing this walking time is a very effective way to increase your DC’s overall productivity.

We will learn how to use the Single Picker Routing Problem (SPRP) to design a model that finds the optimal picking route in a two-dimensional warehouse.

We will test numerous optimization algorithms and then run different simulations to choose a winning picking strategy.

How you can do it: Follow this tutorial by Samir Saci, which explains the steps of the project. To improve the results further, you can experiment with other approaches to this problem, such as spatial clustering or a pathfinding algorithm.
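The effect of batching on walking distance can be illustrated with a simplified model. This minimal sketch assumes rectilinear (Manhattan) distances that ignore the actual aisle layout and uses a greedy nearest-neighbor route rather than an exact SPRP solution; the function names and coordinates are hypothetical. It compares picking orders one at a time with grouping consecutive orders into batches of a given size.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def route_length(depot, locations):
    """Greedy nearest-neighbor route from the depot through all locations and back."""
    position, remaining, total = depot, list(locations), 0
    while remaining:
        nearest = min(remaining, key=lambda loc: manhattan(position, loc))
        total += manhattan(position, nearest)
        remaining.remove(nearest)
        position = nearest
    return total + manhattan(position, depot)


def total_walking_distance(depot, orders, batch_size):
    """Group consecutive orders into batches and sum the route length of each batch."""
    total = 0
    for start in range(0, len(orders), batch_size):
        batch = orders[start:start + batch_size]
        pick_locations = [loc for order in batch for loc in order]
        total += route_length(depot, pick_locations)
    return total


depot = (0, 0)
orders = [[(2, 0), (2, 5)], [(3, 1)], [(8, 4), (9, 0)], [(7, 2)]]
for size in (1, 2, 4):
    print(size, total_walking_distance(depot, orders, size))
```

Running several batch sizes side by side is a small-scale version of the simulations used to choose a picking strategy.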

Production Planning and Resource Management of Manufacturing Systems in Python

Supply chain management is one of the most significant areas of focus for improvement in today’s global economy, with supply chain management positions and hiring growing faster than the economy as a whole. Getting goods from point A to point B is challenging, but manufacturing enough of them in the first place is even more arduous.

Modeling these manufacturing processes requires a well-developed understanding of the constraints and dependencies inherent to the production line. Mixed-integer programming solvers such as CPLEX or Google’s OR-Tools can find solutions that optimize an objective such as minimizing cost, but they struggle to model continuous systems. In this project, we will develop a rudimentary production plan with resource balancing using a forward-facing heuristic in Python, and illustrate it with an interactive Gantt chart built with Plotly. The goal of our program will be to schedule the batch with the shortest runtime.

How you can do it: Follow the steps of this tutorial by Will Keefe, which explains a case study of biologics manufacturing within pharmaceuticals with clear steps and example code.
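A forward scheduling heuristic of this kind can be sketched in a few lines. In this minimal sketch, each batch is a hypothetical sequence of (step, resource, duration) tuples, batches are prioritized by total runtime (a shortest-processing-time rule that is our assumption, not necessarily the tutorial's), and each step starts as soon as both its resource and the batch's previous step are free.

```python
def forward_schedule(batches):
    """Greedy forward scheduling.

    batches: dict of batch name -> list of (step, resource, duration_hours).
    Returns rows of (batch, step, resource, start, end) in hours from time zero.
    """
    resource_free = {}   # resource -> time it next becomes available
    rows = []
    # Shortest total runtime first (assumed priority rule)
    order = sorted(batches, key=lambda b: sum(d for _, _, d in batches[b]))
    for batch in order:
        batch_ready = 0.0   # time the batch's previous step finishes
        for step, resource, duration in batches[batch]:
            start = max(batch_ready, resource_free.get(resource, 0.0))
            end = start + duration
            rows.append((batch, step, resource, start, end))
            resource_free[resource] = end
            batch_ready = end
    return rows


batches = {
    "B2": [("mix", "tank", 3), ("fill", "filler", 1)],
    "B1": [("mix", "tank", 2), ("fill", "filler", 1)],
}
for row in forward_schedule(batches):
    print(row)
```

Each row maps naturally onto a Gantt bar: after converting the start and end hours to timestamps, the rows could be passed to Plotly's `plotly.express.timeline` (using its `x_start`, `x_end`, and `y` arguments) to draw the chart. This sketch never backfills idle gaps on a resource, which a more refined heuristic might do.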

Multi-modal Transportation Optimization

Delivery services can choose among many means of transportation, such as trucks, airplanes, and ships, and each choice of route and transportation mode leads to different costs.

To minimize cost, we should consider goods consolidation (when different goods share the same journey), transportation costs, delivery time constraints, and so on. Shippers can reduce freight invoice payments by 90-95% if they optimize transportation management (GlobalTranz, 2018).

This project uses mathematical programming to model such a situation and find the minimum-cost solution. The model can be built with either of two mathematical programming frameworks: DOcplex or CVXPY.

How you can do it: Follow the instructions in the project source code, which explain how to set up the optimization: building the matrices and choosing their dimensions, defining the parameters and decision variables, and finally running the optimization and analyzing the results.
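The core trade-off, cost versus delivery time across transportation modes, can be illustrated on a single fixed route. This minimal sketch brute-forces every combination of modes over the legs of one route and keeps the cheapest plan that meets a delivery deadline. It is a stand-in for the project's DOcplex or CVXPY model rather than a reproduction of it: it ignores consolidation, and all leg data are hypothetical.

```python
from itertools import product


def cheapest_plan(legs, max_days):
    """legs: list of dicts mode -> (cost, days), one dict per leg of the route.

    Returns (modes, total_cost, total_days) for the cheapest on-time plan, or None.
    """
    best = None
    for choice in product(*(leg.items() for leg in legs)):
        cost = sum(c for _, (c, _) in choice)
        days = sum(d for _, (_, d) in choice)
        if days <= max_days and (best is None or cost < best[1]):
            best = (tuple(mode for mode, _ in choice), cost, days)
    return best


legs = [
    {"sea": (100, 10), "air": (400, 2)},      # origin -> hub
    {"truck": (50, 3), "air": (200, 1)},      # hub -> destination
]
print(cheapest_plan(legs, max_days=12))
print(cheapest_plan(legs, max_days=14))
```

Tightening the deadline forces a switch to a faster, more expensive mode on one leg; a mathematical programming model captures the same trade-off across many routes and shipments at once, where enumeration would not scale.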

A Python OR-Tools Model for Seasonal Inventory Planning

Manufacturing facilities may not have enough capacity to produce seasonal products during the peak season itself. Building up inventory before the season is often less expensive than buying additional tooling and space. When taking this approach, there are a few questions to answer:

  • What is the quantity of the product to produce before the season starts?
  • Assuming the inventory will build up over time with a smoothed production plan, when does manufacturing need to start producing excess inventory to store for the high season?

In this project, we build a model that provides a scalable framework that can be extended to multiple products with competing capacity constraints. We will use OR-Tools for Python to implement a multi-period inventory model.

How you can do it: Follow this tutorial by Sabi Horvat, which explains the details of implementing this optimization modeling problem.
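For a single product, the two questions above can be answered with a simple backward pass, which makes the logic of a build-ahead plan easy to see. This minimal sketch is not the tutorial's OR-Tools model: it assumes one product, a fixed capacity per period, no starting inventory, hypothetical demand figures, and production as late as possible (which minimizes the time inventory is held) rather than the smoothed plan described above.

```python
def build_ahead_plan(demand, capacity):
    """Latest-possible build-ahead plan for one product with fixed capacity.

    Returns (start_inventory, production): inventory on hand at the start of
    each period and units produced in each period.
    """
    periods = len(demand)
    # Inventory required at the start of each period; the extra slot is the
    # (zero) requirement after the planning horizon.
    required = [0] * (periods + 1)
    for t in range(periods - 1, -1, -1):
        required[t] = max(0, demand[t] + required[t + 1] - capacity)
    if required[0] > 0:
        raise ValueError("demand cannot be met without starting inventory")
    production = [demand[t] + required[t + 1] - required[t] for t in range(periods)]
    return required[:periods], production


demand = [30, 30, 30, 60, 180, 170, 50]   # hypothetical monthly demand
inventory, production = build_ahead_plan(demand, capacity=100)
first_build = next(t for t, (p, d) in enumerate(zip(production, demand)) if p > d)
print(inventory, production, first_build)
```

The peak of `inventory` answers how much to build before the season, and `first_build` is the first period in which production exceeds demand, i.e. when build-ahead must start. A multi-product version with shared capacity is where a solver such as OR-Tools becomes necessary.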

Forecasting Retail Sales Using Google Trends and Machine Learning

The significant growth of online e-commerce, especially during the COVID-19 pandemic, has led to a structural change in the retail industry, presenting novel challenges and opportunities in demand forecasting: providing the right product, at the right place, at the right time, for the right price.

The primary objective of this experiment is to propose a methodological framework to incorporate external data, in particular from Google Trends, in retail sales forecasting by leveraging modern machine learning techniques.

In this project, we will use two public datasets, the Brazilian E-Commerce Public Dataset by Olist and dunnhumby’s Breakfast at the Frat, to conduct a quantitative experiment comparing the predictive performance of sales forecasts from the following models:

  • The Seasonal Autoregressive Integrated Moving Average (SARIMA) model
  • The Facebook Prophet tool (FBProphet)
  • The Extreme Gradient Boosting algorithm (XGBoost)
  • A recurrent neural network with long short-term memory (LSTM)

Various performance metrics are used to measure forecasting accuracy, and the performance of all forecasting models is benchmarked against a naïve model.

How you can do it: You can find the source code, with a detailed explanation of the experiment’s structure, in Feras Al-Basha’s GitHub repository.

The experiment compares forecasts made with and without external inputs such as Google Trends data to check whether the additional data leads to better predictions. It’s a great project for experimenting with sales forecasting models and evaluating model performance.
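Benchmarking against a naïve model takes only a few lines. This minimal sketch assumes the naïve forecast repeats the last observed value and uses MAE and RMSE as example metrics (the project may use different ones); the function names and numbers are hypothetical. A relative MAE below 1 means the model beats the naïve benchmark.

```python
import math


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def relative_mae(actual, forecast, benchmark):
    """MAE of a model divided by MAE of the benchmark (below 1 = better than naive)."""
    return mae(actual, forecast) / mae(actual, benchmark)


train, test = [10, 12, 14], [16, 18]
model_forecast = [15, 17]            # e.g. output of SARIMA, Prophet, XGBoost, or an LSTM
benchmark = naive_forecast(train, len(test))
print(mae(test, benchmark), rmse(test, benchmark), relative_mae(test, model_forecast, benchmark))
```

Scoring every model against the same naïve benchmark makes results comparable across the two datasets, whose sales volumes differ.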

Lean Six Sigma with Python: Kruskal-Wallis Test

Lean Six Sigma (LSS) is a widely used method for improving supply chain management based on a stepwise approach to process improvements.

In this project, we will explore how Python can be used in the analysis step to test hypotheses and understand what could improve the performance metrics of a specific process. We will use the Kruskal-Wallis test to check whether a particular training has a positive impact on operators’ productivity.

The process is broken down into:

  1. Scenario: Setting up the constraints and the assumptions of the problem
  2. EDA: exploring the data and coming up with the hypothesis to test
  3. Analysis of Variance (ANOVA): Calculating the p-value and validating the assumptions of ANOVA
  4. Kruskal-Wallis test and conclusion: Using the Kruskal-Wallis test to check whether the differences between the groups are statistically significant rather than due to random fluctuation, and then drawing final insights and conclusions from the experiment.

How you can do it: Follow this tutorial by Samir Saci, which details the process steps and provides the project’s source code. You can also check this video to learn more about the Kruskal-Wallis test. If you want to experiment with applying other parts of LSS, try the chi-squared test or simple logistic regression.
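The statistic behind step 4 can be computed directly, which shows what the test does: it ranks all observations together and checks whether the average ranks differ between groups more than chance would allow. This minimal sketch is self-contained rather than the tutorial's code: it applies the usual tie correction and approximates the p-value with a chi-squared distribution with k - 1 degrees of freedom (reasonable when groups are not too small). The productivity figures are hypothetical. In practice, `scipy.stats.kruskal` performs the same calculation.

```python
import math


def chi2_sf(x, df):
    """Chi-squared survival function (upper-tail probability) for integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of positions i..j, 1-based
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, p_value) for two or more independent samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


before = [45, 48, 50, 47, 52]      # hypothetical units picked per hour
after = [55, 53, 58, 51, 57]
print(kruskal_wallis(before, after))
```

Because the test works on ranks, it does not require the normality assumption that ANOVA relies on, which is why it is a natural fallback when step 3 finds those assumptions violated.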

Learn More with Interview Query

You can also check out these helpful resources from Interview Query: