How to Ace Home Assignments for Machine Learning Job Interviews

In this article, I aim to provide a clear picture of what is expected from a job applicant when asked to do a machine learning (ML)…

Stan Kriventsov

Oct 12, 2020

7 min read

While working as a senior ML engineer, I had to review dozens of such assignments, so I believe I have a good idea of what types of mistakes applicants usually make. I would like to help you avoid these mistakes when you are applying for your next job.

Essentially, you need to strive to accomplish the following 4 objectives:

Create an ML model that works and is reasonable for the task
Clearly present and analyze your results
Show your intelligence and attention to detail
Write clean, easy-to-understand, reproducible, mostly error-free code

For whatever reason, a lot of entry-level applicants seem to only really concentrate on the first one of these goals. Let us go over each of them in more detail and see what they involve.

Image by author

1. Create an ML model that works and is reasonable for the task

In general, this part is really task-dependent, so you need to know your stuff.

Some general insights:

Make sure you understand the assignment. If you are not certain what you are required to do, it is usually okay to ask for clarification.
Do some data cleaning and feature engineering as necessary. Do not just throw your data into a model and hope it gives you a good result.
Always follow the best machine learning practices. Among the most basic ones, make sure to divide your data into training/validation/test sets as appropriate. Don’t use your test set for anything except the final evaluation of your model’s performance, and ensure that there is no data leakage from it into the training set. Do not overfit your models; use the validation set to know when to stop training.
When in doubt about which ML technique to use, always choose an approach that you are reasonably familiar with in order to avoid embarrassing mistakes. Do not try to impress the interviewer with the newest technology or framework unless you are sure you understand it. A well-implemented linear regression model (assuming it is at all suited to the task) is better than a completely mishandled super-duper convolutional recurrent neural network or whatever.
You will usually not have a lot of time for these assignments, so tend to choose simpler solutions unless you are confident that you have enough time to implement the more complex ones. In any case, you can start with a very basic model and then use it as a baseline for further improvements. Sometimes you will discover that the simple approach is just as good and is more computationally efficient. Or, if you don’t manage to finish your complex model, at least you will have something to present.
If the task has practical significance, think about how your model would be used and whether its computational and data availability requirements make sense. Doing this will show that you are aware of real-life constraints and are capable of dealing with them.
Unless forbidden by the rules of the assignment, try to run an Internet search to see how other people have solved similar problems. However, do not just copy and paste their code without analyzing it. This is not so much because the interviewers want to see your own coding skills (after all, a lot of boilerplate code is routinely reused). More importantly, ML code tends to depend on the fine details of the task and data, and what works for one problem will likely require some adjustments for another.
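
To make the splitting advice above concrete, here is a minimal sketch in plain Python. The function name split_indices, the 70/15/15 proportions, and the toy feature are my own assumptions for illustration; in a real assignment you would more likely use the splitting utilities of your ML library. The point is that the three sets are disjoint and that any statistic used for preprocessing is computed on the training part only.

```python
import random


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle indices once and cut them into disjoint train/val/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)

# Sanity check: no sample may appear in more than one set.
assert not set(train_idx) & set(val_idx)
assert not set(train_idx) & set(test_idx)
assert not set(val_idx) & set(test_idx)

# Hypothetical single feature; the mean used for scaling comes from the
# training samples only, so nothing about the test set leaks into training.
feature = [float(i) for i in range(1000)]
train_mean = sum(feature[i] for i in train_idx) / len(train_idx)
```

The test indices are then touched exactly once, for the final evaluation.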

2. Clearly present and analyze your results

Believe it or not, I’ve had multiple instances where a job applicant would just send me their code without any explanation. What did they expect me to do, run it myself and analyze the output for them?

Presentation and analysis of results are core skills for any ML engineer or data scientist, so make sure you invest sufficient time to show these off.

In particular:

State what you were trying to accomplish.
Do at least some exploratory analysis of your data and report any relevant findings. This will also help you create better models.
Explain why you chose the model that you used.
Clearly state which metrics you used and why, and what the performance of your model was based on these metrics.
Evaluate the quality of your model based on its performance. At the very least, pick a simple baseline solution (say, always predicting the most popular class in a classification problem, or the most recent value when doing time-series predictions) and make sure that your model can beat this approach.
Make sure to look at some actual predictions of your model and verify that they are reasonable. Sometimes people submit models that don’t work and claim good performance because their metric calculation function has an error as well 😄 . Don’t be that person.
If you can, go ahead and create some cool visualizations, but make sure this doesn’t come at the cost of not having enough time to actually build a good model.
Be prepared to discuss your work in detail.
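
The baseline comparison and the "look at actual predictions" check from the list above could be sketched as follows. The labels and the model predictions here are made-up toy values, and the helper names accuracy and majority_class_baseline are hypothetical; this only illustrates the idea, not any particular assignment.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_baseline(y_train, n_predictions):
    """Always predict the most frequent label seen in the training data."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


# Toy data, invented for this example.
y_train = ["healthy"] * 8 + ["sick"] * 2
y_test = ["healthy", "healthy", "sick", "healthy", "sick"]
model_preds = ["healthy", "healthy", "sick", "healthy", "healthy"]

baseline_preds = majority_class_baseline(y_train, len(y_test))
baseline_acc = accuracy(y_test, baseline_preds)
model_acc = accuracy(y_test, model_preds)
print(f"baseline accuracy: {baseline_acc:.2f}, model accuracy: {model_acc:.2f}")

# Eyeball a few individual predictions next to the true labels.
for true_label, predicted in list(zip(y_test, model_preds))[:3]:
    print(f"true={true_label:8s} predicted={predicted}")
```

If the model cannot clearly beat the baseline, that is worth saying openly in the report rather than hiding.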

3. Show your intelligence and attention to detail

Intelligence and attentiveness are among the most important qualities for a data scientist. Every non-trivial problem requires thinking. No one wants to hire a person who has simply learned a bunch of black-box models and throws the data into them hoping that something good comes out.

Some of the ways to show these qualities are (most of these are also just good machine learning):

Make sure you pay attention to every detail in the description of your assignment. If you end up doing something different from what you were asked to do, it will not be a good look. Again, if in doubt, ask.
When performing exploratory analysis, REALLY examine your data. If it’s images, look at a bunch of them. If it’s text, read some of it. If it’s tabular data, create plots to see what it looks like and whether there are any anomalies. Clearly understanding the structure of your data will result in better feature engineering and will make you look smart.
When dealing with missing or corrupted data, make sure you use your brain. Do not replace missing data with infeasible values (more than a few applicants in our assignments filled in heart rate data with zeros 😄 ).
Spend enough time discussing your results. How does changing your model or its hyperparameters affect the performance? What are the compute requirements to run it? What pitfalls are possible in production? And so on.
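
As one illustration of the missing-data point above, here is a small sketch that fills gaps in a heart rate series with the median of the observed values instead of zeros. The median is just one plausible choice that I am assuming for the example; depending on the data, interpolation, a model-based fill, or dropping the rows may be more appropriate. The function name and the sample values are hypothetical.

```python
from statistics import median


def impute_with_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill_value = median(observed)
    return [fill_value if v is None else v for v in values]


# Invented readings in beats per minute; None marks a missing measurement.
heart_rate = [72, None, 65, 80, None, 90]
filled = impute_with_median(heart_rate)
print(filled)

# A zero heart rate is not a feasible value for a living patient,
# so the filled series should not contain any.
assert all(v > 0 for v in filled)
```

When the data is split into training and test sets, the fill value should come from the training part only, for the same leakage reasons discussed earlier.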

4. Write clean, easy-to-understand, reproducible, mostly error-free code

Since you won’t have much time, you are usually not expected to deliver production-grade code (unless specified in the assignment). However, make sure that your code is easy to read and can be run on the interviewer’s machine if at all possible.

Run the exact code that you are submitting on your own computer and make sure that it generates the results that you claim. Quite often people submit a version of the code that doesn’t work.
List the details of the environment that you used (e.g., Python 3.6.8, TensorFlow 2.3.0, etc.) so that the interviewer can run your code.
If any of your algorithms (for example, neural networks or decision trees) involve random initialization, make sure to set a specific random seed for reproducibility.
Putting your code into a Jupyter notebook is typically a good choice (unless advised otherwise in the assignment) as it allows you to show the output of each cell alongside the code.
Make sure that your code is well organized and each part is easy to understand. Write some comments when necessary, but do not go overboard: you don’t have to comment on every line. If your line is "x += 1", do not write the comment "# Add 1 to x"; it just looks silly.
Remove any functions or code blocks that you end up not using at all in your final solution unless you want to discuss them in your report.
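
The reproducibility items above (fixed random seed, stated environment) might look like this at the top of a notebook. This is a sketch using only the standard library; the seed value, the helper names, and the package list are assumptions for illustration. Libraries such as NumPy and TensorFlow keep their own random state, so if you use them you would also call their seeding functions, as noted in the comments.

```python
import platform
import random
from importlib import metadata

SEED = 42  # any fixed value works; what matters is that it is stated


def set_seed(seed=SEED):
    """Seed Python's built-in random number generator."""
    random.seed(seed)
    # If you use them, also seed the libraries' own generators, for example
    # numpy.random.seed(seed) and tensorflow.random.set_seed(seed).


def describe_environment(packages=("numpy", "tensorflow")):
    """Return lines with the Python version and installed package versions."""
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    return lines


set_seed()
first_run = [random.random() for _ in range(3)]
set_seed()
second_run = [random.random() for _ in range(3)]
assert first_run == second_run  # same seed, same "random" numbers

print("\n".join(describe_environment()))
```

Pasting the printed environment lines into your report saves the interviewer from guessing which versions you used.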

I hope that the discussion above will help you do well on your next ML home assignment! Good luck!

URL: https://towardsdatascience.com/how-to-ace-home-assignments-for-machine-learning-job-interviews-ac510830baa7/