Six Steps to Pass the Data Science Take Home Challenge

Overview

How many times have you seen these exact words? Hopefully not often in your data science interview process. But if you’re one of the many data scientists looking for a job, you might find yourself working on a data science take-home assignment that arrives as a zipped file with a ten-page requirements PDF.

The recruiter promises that your assignment will go through an intricate grading process and that it shouldn’t take more than a couple of hours to complete. But suddenly it’s 2:30 AM three days later, you’ve put in 15 exhausting hours of coding, and you haven’t even tried that GAM to see whether it improves your model’s F1 score by three percent. Why does this keep happening? Why are companies still wasting candidates’ time without offering any feedback on the take-home?

The truth is that the process works as a filter for many companies that don’t have a standardized interview process. Developing good technical interview questions takes accumulated experience, and data science teams simply don’t have as much of it yet as software engineering teams do.

So if you must do a take-home assignment and there’s no other choice, here are a few steps to take to ensure a smoother process.

1. Understand Expectations

It’s difficult to push back against a company that is interviewing you or is about to interview you. But understanding the full expectations of the data science take-home challenge is the key to passing it.

Here’s an email template to use with the recruiter.

Hi Recruiter’s Name,

Thanks for sending over the take-home assignment. I’m excited to start it and will be sure to send it back in X days with my completed solution.

Additionally, could you share a set of general guidelines on how the assignment will be graded? I want to be sure I’m focusing on and demonstrating the right skill set for the take-home and not accidentally going down a rabbit hole.

Lastly, I would really appreciate some feedback after I send in my take-home assignment, regardless of whether I move on in the interview process. It would mean a lot for my own technical growth to understand what I did wrong and where I excelled.

Thanks!

2. State Assumptions Everywhere

As soon as you receive the take-home challenge, tally up a list of questions that you can send to the recruiter or hiring manager. Whether you get answers or hear nothing back, state your assumptions in the take-home challenge itself. What do I mean by that?

What if you decide to use only a naive imputation model to fill in missing values instead of an advanced technique? State it. Write it in a comment. Do something that helps them understand the limits imposed by the amount of time you’re spending on the assignment.
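As a minimal sketch of what stating an assumption in a comment could look like, the snippet below fills missing values with a column median using scikit-learn’s SimpleImputer. The toy DataFrame and its column names are hypothetical stand-ins for whatever data the assignment provides, and the median is just one naive choice.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the assignment's dataset.
df = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0],
    "income": [72000.0, 58000.0, np.nan, 61000.0],
})

# ASSUMPTION: missing numeric values are filled with the column median.
# This is a deliberately naive choice made to stay within the time budget.
# With more time I would compare it against a model-based imputer and
# check whether the missingness itself carries signal.
imputer = SimpleImputer(strategy="median")
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(df_imputed)
```

The comment does the real work here: a grader who reads it knows the shortcut was a conscious decision rather than an oversight.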

Write up everything you think your grader needs to know. Hiring managers forget how long it takes to write code and build models. They’re managers. They don’t write code.

3. Do the Modeling Basics

Here’s a general checklist that will probably take you at least three hours.

  • Data cleaning
  • Minimal feature selection
  • Impute missing values
  • Create a classification pipeline
  • Try training with a couple of scikit-learn classifiers
  • Tune hyperparameters with grid search

Boom. Now your implementation will reach the minimal baseline of what they’re expecting. Depending on how long you work on feature selection, the total could shift by two to three hours in either direction.
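The checklist above can be sketched roughly as follows. This is one possible baseline rather than a prescribed solution: the synthetic dataset from make_classification, the two classifiers, the parameter grids, and the choice of F1 as the scoring metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assignment data, with about 5% of values missing.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Imputation, scaling, minimal feature selection, then a classifier.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each dict swaps in a different classifier along with its own small grid.
param_grid = [
    {
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "clf": [RandomForestClassifier(random_state=0)],
        "clf__n_estimators": [50, 100],
        "clf__max_depth": [3, None],
    },
]

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

test_f1 = f1_score(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print("Held-out F1:", round(test_f1, 3))
```

Keeping every preprocessing step inside the pipeline means the imputer, scaler, and feature selector are refit on each cross-validation fold, which avoids leaking information from the validation folds into training.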

4. Make the Take-Home Challenge Readable

Here’s a great guide to code organization and readability for data scientists. It’s about structuring your project in an easy-to-digest manner. I stumbled upon it randomly, but it makes complete sense. The Cookiecutter Data Science framework provides a standardized structure for data science projects. According to their website, a well-organized project lets other people:

  • Collaborate more easily with you on this analysis
  • Learn from your analysis about the process and the domain
  • Feel confident in the conclusions at which the analysis arrives

I will note that it will definitely take you more than a few hours to organize your project with the complete format. But then again you already understood the cost when you decided to do a data science take-home assignment.
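If the complete format is too much for the time available, a trimmed-down skeleton can still convey the same organization. The sketch below creates a handful of folders loosely modeled on the Cookiecutter layout. The folder list, the create_skeleton helper, and the README text are all hypothetical choices for illustration, not the framework’s official template.

```python
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down layout loosely modeled on Cookiecutter Data Science.
LAYOUT = [
    "data/raw",         # original, untouched input files
    "data/processed",   # cleaned data ready for modeling
    "notebooks",        # exploratory analysis
    "src",              # reusable code: cleaning, features, training
    "models",           # serialized models and predictions
    "reports/figures",  # charts used in the final write-up
]


def create_skeleton(root: Path) -> list[Path]:
    """Create the folders in LAYOUT under root and add a starter README."""
    created = []
    for relative in LAYOUT:
        directory = root / relative
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)

    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Take-home challenge\n\nSummary, assumptions, and how to run.\n"
        )
    return created


# A temporary directory is used here so the example leaves no clutter behind.
project_root = Path(tempfile.mkdtemp()) / "take_home"
created_dirs = create_skeleton(project_root)
for directory in created_dirs:
    print(directory.relative_to(project_root))
```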

5. Write Tests and Comments

Did I mention documenting everything in your head onto paper? That includes writing comments and testing your code where applicable. Readability is as important as the efficiency of your code, and if you write a clear comment block for each function, it will help communicate how your code should work and why you refactored it the way you did. Follow general Python conventions such as PEP 8 to make sure you’re solid.
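As a small illustration of a commented and tested function, here is a sketch with a docstring and two pytest-style tests. The clip_outliers helper is a hypothetical example invented for this purpose, not something a particular assignment will ask for.

```python
def clip_outliers(values, lower, upper):
    """Clamp each value into the closed interval [lower, upper].

    Values below lower become lower, and values above upper become upper.
    Clipping is used instead of dropping rows so the dataset keeps its size.

    Raises:
        ValueError: if lower is greater than upper.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [min(max(value, lower), upper) for value in values]


def test_clip_outliers_clamps_both_ends():
    assert clip_outliers([-5, 3, 99], lower=0, upper=10) == [0, 3, 10]


def test_clip_outliers_rejects_inverted_bounds():
    try:
        clip_outliers([1], lower=5, upper=1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for inverted bounds")
```

Functions named test_* are collected automatically by pytest, so a grader could run the whole suite with a single command.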

6. Summarize Your Thought Process

Remember in high school English when every paper consisted of an introduction, the content, and then a conclusion that repeated the introduction? Do that, but in under 500 words. At the end of the day, the most likely scenario is that the person looking at your take-home assignment will spend a grand total of five minutes understanding it before going back to browsing Reddit. You want to make it as easy as possible for them to see that your data science take-home challenge is the best possible take-home challenge ever.