29 Feb 2016 - This is a post on my blog
October 2015 marked my first year of working as a data scientist on a remote team at Zoomer (we help restaurants deliver food, fast). I have always thought that remote work was an obvious choice given the penetration of the internet into all aspects of our lives. In this post I'm going to talk about some of the experiences I've had and lessons learned working on a remote data science team.
Zoomer's main offices are in Philadelphia and San Francisco, but, by the nature of our business, we are a fundamentally distributed company. While I live in San Francisco and work out of the SF office a few days a week, I spend most of the time working from my apartment, collaborating with my team remotely.
Remote data science is not fundamentally different from working on other remote teams, but it has its own set of challenges. Done right, remote data science can work very well.
For a remote team to work, I feel that the following are important conditions:
If a company has only a small fraction of remote employees, those employees can often be neglected and miss out on important developments. For this reason, it is important that the default communication media in the company are centered on remote workers.
Use tools that are fundamentally designed for a distributed team, but feel like "obvious" choices, even for office-bound workers.
We use many of the usual suspects you find at tech startups: Slack, GitHub, Trello, Google Drive, etc. These all feel like best-in-class tools and not things that were chosen just because they have some remote capability. (Thank you cloud!)
Because of the highly technical nature of data science, video chats or phone calls are very important tools to quickly remove confusion or misunderstandings. That said, strong documentation is very important for future reference on why and how things are being done (e.g. models, data schema, assumptions about data). One instance of non-traditional "documentation" that has proven useful is archived chats. In fact, I have revisited one very long Slack discussion between myself and another Zoomer data scientist multiple times over a few months, because the discussion laid out all of the main issues and technical arguments for a project we were working on. Most importantly, a culture of explicitly communicating goals and questions is important in making sure that everyone is on the same page. Part of this is using the "no question is stupid" principle, as important things can get lost in the fog of data scienceing.
In a tech startup things tend to change very quickly. Projects, goals, responsibilities, and roles can change on a weekly basis, if not faster. This can be more of an issue in a remote company if these changes are not broadcast in a way that the entire company can easily digest. Often changes are communicated within the team that they most directly affect, but not beyond. An emphasis should be placed on communicating changes that might affect other teams. (Don't get me started on DB changes. 😎) An up-to-date company directory is invaluable when trying to figure out who you should bug about X.
It's easy to lose sight of the overall progress and direction of the company as a whole and its constituent parts when you are not all in the same place. Because data science can become more and more valuable to a company as the data, analysis, and modeling infrastructure matures, it's important that the data science team maintains a strong understanding of the goals and needs of teams across the company.
This is an issue faced by all data science teams and can be a particular difficulty if your team's product is not highly visible. We have made an effort to evangelize the work and capabilities of our team to other parts of the company. This helps other teams determine if they have a need that data science is best equipped to address.
Jupyter notebooks are awesome. They hit the sweet spot for exploratory analysis, prototyping, and communication of analyses. Unfortunately, collaboration via notebooks seems to be an unsolved issue. They are poorly suited for version control with git, largely because each notebook file stores cell outputs and execution counts alongside the code, so even re-running a notebook produces a noisy diff. In fact, we decided to stop keeping them in GitHub except as gists. There are a number of features on the Jupyter roadmap that look like they will address some of these issues.
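To make the version control problem concrete, here is a minimal sketch of one common workaround: clearing outputs before a notebook is committed. This is an illustration only, not something our team settled on. It assumes the notebook is in the version 4 `.ipynb` JSON format (a top-level `cells` list where code cells carry `outputs` and `execution_count`), and the function names `strip_outputs` and `strip_file` are made up for this example.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a v4 notebook dict with code-cell outputs cleared.

    Assumes the nbformat v4 layout: a top-level "cells" list in which
    code cells have "outputs" and "execution_count" keys.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(src_path: str, dst_path: str) -> None:
    """Read a notebook from src_path and write a stripped copy to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        # Stable key order and indentation keep later diffs small.
        json.dump(strip_outputs(notebook), f, indent=1, sort_keys=True)
        f.write("\n")
```

Stripping outputs keeps diffs focused on code and markdown, but it also throws away the rendered results that make notebooks good for communication, so it only addresses part of the collaboration problem.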
Remote data science teams enjoy similar benefits and face similar challenges to other remote teams. Overall, the right environment and tools can lead to a very successful team. Communication is key, and fostering a strong culture of communication is one of the most important aspects of remote data science success.
The benefits of remote data science are such that any "special" efforts needed to make it work are, in my opinion, clearly worth it.
Here's to another year!