@drewfustin
Created May 24, 2018 20:16
Panel of Hiring Partners: How to Get Hired as a Data Scientist

My answers for Panel of Hiring Partners: How to Get Hired as a Data Scientist, an event put on by Metis tonight.

What are the top 3 qualities you look for when hiring a data scientist?

  • Can they figure things out? With the rare exception (e.g. when I'm looking to fill a role that needs one exact skillset, such as someone with marketing experience), I don't expect you to know a standard series of steps to go from point A to point B in an analysis. This isn't what the job typically looks like in the environments I've been in. What I expect is that you can understand the need, come up with a strategy that improves our ability to fill that need, understand how to check your solution for gotchas (by writing quality tests and by knowing which quality statistics to check against to make sure your solution isn't diverging from reality), and implement a solution that allows you to iterate on its quality but also to move on to the next thing.
  • Are they teachable? I thrive on honest and humble communication where I can be frank about the things I'm good at and don't need help on and be transparent about the things I don't know so that I can learn. This cuts down on having to parse human communication and separate reality from the ego clouding the situation, but it also leads to an environment of quickly failing, efficiently deploying resources, and learning so that next time is even better. Saying "I don't know" is not only something I think is okay, it's something I actively encourage and revel in getting the opportunity to do myself. Acting like you know everything also creates a toxic environment around you, as it encourages a competitive culture and shuts down open communication.
  • Do they diversify the team? When expanding a team, it doesn't do much good to collect clones of those already on the team. If there are 3 skills in data science -- A, B, and C -- and you already have someone who specializes in A and B and another person who specializes in B, it would make little sense not to go after someone who specializes more in C. The same can be said not just of skills, but of background, previous experiences, culture, gender, and so forth. The hard skills are often easier to test for than the softer skills, but putting yourself in the best position to have a team capable of handling any challenge thrown at it definitely involves having people on the team who are not just like you and the others on the team. I don't believe that any effort saved in the team jibing easily by checking off some nebulous "culture fit" box outweighs the benefits of having diversity on the team.

What can be expected as part of the interview process?

First of all, I want to say that in the process, I'm a fan of giving the candidate the questions ahead of time so that they can prepare. You're hopefully not often in a situation at work where you go into a meeting or phone call with no context, so I don't think it's representative to see if a candidate is good or bad at thinking on their feet like that, since this isn't a skill they'll often use, in a healthy environment.

As for the process:

  • You apply and attach a resume. I try to have someone on the data science team review the resume, which has been possible everywhere I've been so far, since the team and company are small. This is just because I've never hired somewhere with more specialized HR resources like Maria, and I want to make sure we don't have to start putting rigid rejection criteria in place for resumes like "highest degree earned." The best data scientist I know, until very recently, didn't have his bachelor's degree and would often get rejected outright in this stage, and I want to make sure we don't miss someone like him.
  • So many resumes look the same. So, I'd say fewer than half pass this filter process. More on that later.
  • The candidate is then scheduled to receive the dreaded take-home exam, which is always said to take 2-4 hours, but always takes longer for me, personally. My point with this is to get a standardized example of how someone thinks about a problem and what their code looks like. I'm often not interested in super good results, especially if those results come from something boring like just sending the data through an endless process of scikit-learn models and picking the best one. I want the candidate to show intuition about the data at hand, and I want to see how they tackle the problem not as a robot, but as a person with insights beyond optimizing an ML program. I'm also on the lookout for code quality -- is it documented well, are there useful comments, can I set up their environment easily to reproduce, is it tested, etc.? Keep in mind, since I'm doing this stage earlier in the process in order to keep the net wider and not miss talent based on a bad phone screen, I'm definitely not spending that much time looking through this solution as a grader, so the results have to be well-summarized so I get the point easily and clearly.
  • Probably ~10% of take-home exams are quality enough to get a 30-minute phone screen scheduled, again usually with a person on the data science team. This screen is a way to get an idea about who the candidate is and how they communicate. I'll usually ask about some previous project, and see if I can get the idea of what they did and why it was valuable. In addition, I'll use this call to tailor the on-site interview a bit better for the candidate.
  • Most people who get a phone screen will also get an on-site interview. This will be a series of one-on-one or two-on-one interviews that are some combination of the following: a key stakeholder this person might interact with (e.g. a marketing director or analyst), a product manager or someone else in between the business and the engineering, an engineer, someone on the data science team, and an executive in charge of the team. I'm trying to get an idea of how the person will fit into the org as a whole and how they can fill gaps in what we have. The longest part will be with the data science team, obviously. In this part, I'm definitely not a fan of classical whiteboarding or brain teasers, so I'll often come with a technical project the company needs to solve and see if the candidate and I can have an architecture meeting about how to tackle the problem and what technology to use in solving it. I want this interview to be a shadow of what it's like to work at the company, so I'm not interested in whether you know how to implement a bubble sort (Stack Overflow knows that); I'm interested in your thinking on why we'd need a sort in this particular part of the process.
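
As a rough illustration of the take-home points above (documented code, useful comments, a test, and a sanity check against something simpler than "pick the best model"), here is a minimal Python sketch. The function names and toy numbers are invented for this example and are not taken from any real take-home exam; comparing against an "always predict the mean" baseline is just one assumed way to check that a solution isn't diverging from reality.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Return the average absolute difference between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def beats_baseline(actual, predicted):
    """Check whether predictions beat a naive 'always predict the mean' baseline.

    Simplification: the baseline mean is computed from the same values it is
    scored on, which is fine for a toy check but not for a real evaluation.
    """
    baseline = [mean(actual)] * len(actual)
    return mean_absolute_error(actual, predicted) < mean_absolute_error(actual, baseline)


def test_beats_baseline():
    """Hypothetical toy data: one sensible set of predictions, one nonsensical."""
    actual = [10, 12, 14, 16]
    sensible = [11, 12, 13, 16]
    nonsense = [20, 0, 30, 5]
    assert beats_baseline(actual, sensible)
    assert not beats_baseline(actual, nonsense)
```

A reviewer skimming a submission like this can see the intent from the docstrings, run the test, and tell quickly whether the headline result is better than doing nothing clever at all.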

What makes a data scientist/analyst candidate stand out from the rest?

As I said, resumes are often so similar that only two things ever stand out to me, and sadly, neither is about the specific degrees or programs you've been in.

  • What is your previous experience, specifically, that will help you do this job? Never had a data science job before? That's often fine. Tell me about any previous experience you have that checks off the other boxes of showing you're a quick learner, you're teachable, and things that define your viewpoint on the world. Definitely highlight anything technical when you can, though.

  • What is an example of your work that I can look at? I can't stress this enough (even though I don't take my own advice): HAVE A PORTFOLIO! Write at least one blogpost where you show me your process of tackling a technical problem using coding and statistics. A typical outline would be:

      • I think this problem is interesting. Is there some insight I can gain?
      • Where can I get the data to answer this question?
      • What does the collection, cleaning, and pipelining process look like for getting this data into a good place?
      • What is the code used to develop an answer to the interesting problem?
      • Plot some things that clearly tell the story without the code, especially in a way that can be interpreted by a technical person with specialization in the field, but not necessarily someone who is a coder or data scientist.
      • Give me an actionable step that can be taken with this knowledge.
      • Are there follow-up ideas for how to improve this process?
      • Are there ways for others to get involved and take what you've done to the next level?

What is one piece of advice you can give to help someone prepare for a data science interview?

Since everyone always gets similar advice, and I can think of many pieces of advice to give, I want to go with the least common piece of advice I hear that has been useful to me, especially before I actually worked in the tech industry: learn how the business works. When I was coming out of grad school, I was luckily married to a woman who is quite successful in the consulting world, so some of her knowledge rubbed off on me, but otherwise, I was clueless about how the business world worked. Also, luckily, I didn't know what I actually wanted to do. So, I was spending quite a lot of time studying for strategy consulting interviews because maybe I wanted to do that for a living (I didn't want to do that, but what did I know?). The "dreaded" part of a strat consulting interview is the case interview, where they give you a high-level example of a business problem, and you talk with the interviewer to come up with a solution that could be implemented. An example: "Harley-Davidson is thinking about trying to enter the Chinese market. How should we help them prepare for this and how can we help them roll out a solution?" It's completely open-ended, but the interviewer comes with some data in hand, and you as the interviewee ask questions, gradually getting clues and pieces of advice, and eventually settle on an implementation to start rolling out. Super stressful, not something I recommend, but studying for this process definitely helped me understand things like, "What does marketing care about measuring? How can sales be optimized to roll out a solution? Should I consider legal hurdles? What do I need to measure in the process to make sure we're moving towards an equitable solution?" This may or may not be useful to you, but I found the exercises fun and helpful. The book I used was called Case in Point by Marc Cosentino.

What questions would you recommend job seekers ask during the interview process?

More so than you realize, your happiness in the job will be defined much less by the kinds of work that you do and much more by who your manager is. Having a good manager is far and away the thing that will lead to success in your job and happiness in the workplace, so do not underestimate this. Your manager won't just be the person who assigns you work and grades you on it; they will be your advocate in the organization and your coach, who encourages you to produce great work, stay on target, and receive needed correction and help. That being the case, really try to spend as much time as you can with this specific person, get to know them and

  • how they think and
  • how they work and
  • what their expectations are for their employees and
  • what they value in their employees and their work and
  • how they praise and
  • how they correct and
  • what their expectations are for this role and for you and
  • what you will learn and how you will grow.

And, let me stress: If the manager has a vision for this role and the kinds of skills a person will develop working in this role, so that over some timeline, that person is equipped for these particular new challenges, that is the kind of manager you want. If your manager is thinking, even before hiring a person, how this role will enrich the employee and not just the employer, do whatever you can to work for that person.

An example: a friend works for Twilio as a developer evangelist. He goes around giving talks about Twilio, doing some coding and showing the power of the platform, and eventually selling stuff into companies. He's an ambassador for the code. His boss thinks of this program as a training ground for CEOs and for helping to start your own company. His vision for these roles is not only to increase the sales channels for Twilio, but also to prepare his employees to develop skills that will help them be leaders in their own places in the future (presumably, places that are future customers of Twilio, of course).
