Last week I wrote a blog post on how to pass the dreaded data science take-home challenge. You can read the full post on Medium, but I wanted to condense my tips into something succinct for anyone who's looking for help with their own data science take-home challenges.
1. Understand expectations
It’s difficult to push back on a company that is interviewing you, or is about to. But understanding the full expectations of the data science take-home challenge is the key to passing it.
Here's an email template you can send to the recruiter:
Hi [Recruiter’s Name],
Thanks for sending over the take-home assignment. I’m excited to start it and will be sure to send back my completed solution in [X] days.
Additionally, I was wondering if you could share a set of general guidelines on how the assignment will be graded. I want to be sure I’m focusing on and demonstrating the right skill set for the take-home and not accidentally going down a rabbit hole.
Lastly, I would really appreciate it if, after I send in my take-home assignment, I could get some feedback on it, regardless of whether I move on in the interview process. It would mean a lot to understand what I did wrong or where I excelled, for my own technical growth.
2. State assumptions everywhere
As soon as you receive the take-home challenge, draw up a list of questions to send to the recruiter or hiring manager. Whether you get answers or hear nothing back, state your assumptions in your take-home submission. What do I mean by that?
Say you decide to use only a naive imputation method to fill in missing values instead of an advanced technique. State it. Write it in a comment. Make sure they understand that your choices were limited by the amount of time you spent on the assignment.
Write up everything you think your grader needs to know. Hiring managers often forget how long it takes to write code and build models. They’re managers; most of them don’t write code day to day.
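Here’s a minimal sketch of what stating an assumption in a comment can look like. It assumes a small pandas DataFrame of numeric features; the `impute_missing` helper and the `age`/`income` columns are hypothetical, and median imputation with scikit-learn’s `SimpleImputer` stands in for whatever naive method you actually choose.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# ASSUMPTIONS (stated for the grader):
# 1. Missing values are treated as missing at random, so a per-column
#    median is an acceptable first pass.
# 2. Given the time budget, model-based imputation (e.g. scikit-learn's
#    IterativeImputer) was not tried; that would be a next step.
# 3. In the full pipeline the imputer is fit on training data only;
#    here it is fit on the whole toy frame purely for illustration.


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing numeric values with each column's median."""
    imputer = SimpleImputer(strategy="median")
    filled = imputer.fit_transform(df)  # returns a NumPy array
    return pd.DataFrame(filled, columns=df.columns, index=df.index)


# Hypothetical example data with gaps.
raw = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "income": [50_000, 62_000, np.nan, 58_000],
})
clean = impute_missing(raw)
print(clean)
```

A few lines of comments like these cost you a minute and tell the grader the simple choice was deliberate, not an oversight.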
3. Do the modeling basics
Here’s a general checklist that will probably take you at least three hours:
- Data cleaning
- Minimal feature selection
- Impute missing values
- Create a classification pipeline
- Train a couple of scikit-learn classifiers
- Tune hyperparameters with grid search
Boom. Now your implementation meets the minimum baseline of what they’re expecting. Depending on how much time you spend on feature selection, the total could swing by two to three hours either way.
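The modeling steps in the checklist can be sketched as one scikit-learn pipeline. This is a minimal example under stated assumptions: a synthetic dataset from `make_classification` stands in for the real data (so data cleaning, which depends entirely on your dataset, is omitted), median imputation and `SelectKBest` stand in for whatever imputation and feature selection you settle on, and logistic regression and a random forest are just two example classifiers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the take-home dataset (an assumption, not real data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out ~5% of values to simulate gaps

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Imputation, scaling, and feature selection live inside the pipeline so they
# are refit on each training fold during cross-validation.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each dict swaps in a different classifier along with its own hyperparameters.
param_grid = [
    {
        "select__k": [5, 10],
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [5, 10],
        "clf": [RandomForestClassifier(random_state=0)],
        "clf__n_estimators": [50, 100],
    },
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Best CV F1:", search.best_score_)
print("Held-out F1:", search.score(X_test, y_test))
```

Keeping every preprocessing step inside the `Pipeline` means the cross-validation scores aren’t inflated by statistics (medians, means, feature scores) computed on the validation folds.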
4. Make the take-home challenge readable
The Cookiecutter Data Science project is a great guide to code organization and readability for data scientists. It’s about structuring your project in an easy-to-digest way. I stumbled upon it by chance, but it makes complete sense: it gives data science projects a standardized structure. Their website lists what that structure lets other people do:
- Collaborate more easily with you on this analysis
- Learn from your analysis about the process and the domain
- Feel confident in the conclusions at which the analysis arrives
Fair warning: organizing your project in the complete format will take more than a few hours. But then again, you already accepted that cost when you decided to do a data science take-home assignment.
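If you’d rather not pull in the template, a rough stand-in is to scaffold the core folders yourself. The sketch below is a hand-rolled approximation of a Cookiecutter-style layout, not the official template, which generates more files and whose exact layout differs between versions. The `scaffold` function and the folder list are illustrative assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Hand-rolled approximation of a Cookiecutter-style layout (NOT the official template).
DIRS = [
    "data/raw",         # original data exactly as received; never edited
    "data/interim",     # intermediate, partially transformed data
    "data/processed",   # final datasets used for modeling
    "notebooks",        # exploratory notebooks
    "models",           # trained / serialized models
    "reports/figures",  # plots referenced in the write-up
    "src/data",         # code to load and clean data
    "src/features",     # feature engineering code
    "src/models",       # training and prediction code
]

FILES = ["README.md", "requirements.txt", "src/__init__.py"]


def scaffold(root: str | Path) -> Path:
    """Create the directory skeleton and empty placeholder files under `root`."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root / f).touch(exist_ok=True)
    return root
```

Calling `scaffold("take-home")` creates the skeleton in a `take-home` folder. Because the directories are created with `exist_ok=True`, rerunning it is harmless.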
5. Write tests and comments
Did I mention getting everything in your head down on paper? That includes writing comments and, where applicable, testing your code. Readability is as important as efficiency: a clear comment block (docstring) on each function communicates what the code is supposed to do and why you refactored it the way you did. Follow standard Python conventions, such as PEP 8 for style and PEP 257 for docstrings, to make sure you’re on solid ground.
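As an illustration, here is a small, hypothetical cleaning helper whose docstring explains both what it does and why, plus a matching test. The function name and behavior are assumptions made for this example. The test follows pytest’s convention of a `test_`-prefixed function that uses plain `assert`.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` without columns that hold a single unique value.

    Why: a constant column carries no signal for a classifier and only adds
    noise to feature selection, so it is removed up front.

    Args:
        df: Input feature table.

    Returns:
        A new DataFrame containing only columns with two or more distinct
        non-missing values.
    """
    keep = [col for col in df.columns if df[col].nunique(dropna=True) > 1]
    return df[keep].copy()


def test_drop_constant_columns() -> None:
    """A constant column is dropped; a varying column is kept unchanged."""
    df = pd.DataFrame({"const": [1, 1, 1], "varies": [1, 2, 3]})
    result = drop_constant_columns(df)
    assert list(result.columns) == ["varies"]
    assert result["varies"].tolist() == [1, 2, 3]
```

If you save this in a file such as `test_cleaning.py`, running `pytest` will discover and run the test automatically.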
6. Summarize your thought process in under 500 words!
Remember high school English, when every paper had an introduction, a body, and a conclusion that repeated the introduction? Do that, but in under 500 words. At the end of the day, the most likely scenario is that the person reviewing your take-home assignment will spend a grand total of five minutes on it before going back to browsing Reddit. Make it as easy as possible for them to see that yours is the best take-home challenge they’ve reviewed.