mccutchen/data_scientist_interview.md

## data_scientist_interview.md

      
Some engineering-related interview questions to ask a data scientist candidate.

How comfortable are you at a Linux or OS X command line?

Can you navigate around?
What are pipes?


Do you use source control?

How comfortable are you with git in particular?
Are you familiar with pull requests and code review?
Do you know how to avoid common mistakes, such as committing large input or result data sets to the source code repo?


How do you ensure that your results are repeatable?  How do you make sure other people can run your code reliably and easily?
If you had to make a separate HTTP request to some upstream API for every ID in a list of 100,000 IDs, how would you approach that?

What if a request to a second upstream API were also required?  How does this change if the second request depends on the results of the first?
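
For interviewers who want a reference point for the upstream API questions above, here is a minimal sketch of one reasonable answer, not the only one. It assumes the work is I/O-bound, so a bounded thread pool is enough, and it chains the dependent second request inside the same task. `fetch_first`, `fetch_second`, `with_retries`, `process`, and `run` are hypothetical names, and the two fetch functions are stand-ins that sleep instead of making real HTTP calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_first(item_id: int) -> dict:
    """Hypothetical stand-in for a request to the first upstream API."""
    time.sleep(0.001)  # simulate network latency
    return {"id": item_id, "owner": f"user-{item_id % 10}"}


def fetch_second(first_result: dict) -> dict:
    """Hypothetical stand-in for a second request that needs the first result."""
    time.sleep(0.001)  # simulate network latency
    return {**first_result, "owner_len": len(first_result["owner"])}


def with_retries(func, arg, attempts: int = 3, backoff: float = 0.1):
    """Call func(arg), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func(arg)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(backoff * 2 ** attempt)


def process(item_id: int) -> dict:
    """Run both requests for one ID; the second depends on the first."""
    first = with_retries(fetch_first, item_id)
    return with_retries(fetch_second, first)


def run(ids, max_workers: int = 32) -> list:
    """Process all IDs concurrently with a bounded number of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input IDs
        return list(pool.map(process, ids))


results = run(range(200))
```

A strong answer would likely also mention rate limiting to respect the upstream API, and writing results to disk as they arrive so that a failed run can resume rather than start over.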


How might you approach the problem of sharing a report you've made with other people on a regular basis?