Frequently asked question:

Q: I would like to ask your advice about preparing for a role in data science.


My advice would be to put together a portfolio of projects on GitHub showing that you know how to

  • get data (e.g., via wget/curl)

  • scrub data (wisely choose and reproducibly remove "outliers")

  • model using a variety of approaches (supervised, unsupervised, exploratory) in Python or possibly R (usually an employer will prefer one or the other, with more and more employers, in my experience, preferring Python; in the Data Science Group at NYT it's helpful to know your way around SQL and scikit-learn. We don't do much in R, and nothing in SAS, SPSS, MATLAB, Mathematica, or...)

  • write a coherent description of what you learned and what this implies for the stakeholder/collaborator/world,

    as well as

    how you chose the approach you took, what assumptions you made along the way, what the weaknesses in your approach are, and what the next steps are.
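    As one illustration of the "scrub data" step, here is a minimal sketch of removing outliers reproducibly. The rule used (Tukey's IQR fences with k = 1.5), the function name split_outliers, and the sample readings are all assumptions made for this example, not a recommended default; the point is only that the rule lives in code and the removed values are kept for inspection rather than silently dropped.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, removed) using Tukey's IQR fences.

    k is an illustrative choice; pick and justify your own threshold.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if not (low <= v <= high)]
    return kept, removed


readings = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up sample data
kept, removed = split_outliers(readings)
print("kept:", kept)
print("removed:", removed)
```

    With the made-up readings above, only the value 95 falls outside the fences. Returning the removed values alongside the kept ones makes it easy to report what was dropped and why in the write-up.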

    Update 1: Also consider getting your hands on some fun data to play with. Definition of "fun" is highly personal, so I list several sets which might be of interest:

    Update 2: Also consider taking a class ( cf., )

    Update 3: Also consider enrolling in a "data science boot camp", e.g.,

For more info:

My thoughts:




tommiechen commented Oct 7, 2014

great advice. thanks



zgmartin commented Dec 5, 2014

This was great advice. I would like to add to it.

-scrape data from web (scrapy)
-store data in database (MongoDB or SQL)
-extract data (pandas: split, merge, transform)
-model data (machine learning)
-document results (info, plots, error rates)
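
The store and extract steps in the list above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name page_views, its columns, the helper store_and_summarize, and the rows are all hypothetical, and an in-memory SQLite database stands in for whatever SQL database a real project would use.

```python
import sqlite3


def store_and_summarize(rows):
    """Store (section, views) rows in SQLite, then extract a per-section summary."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE page_views (section TEXT, views INTEGER)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT section, COUNT(*), SUM(views) "
            "FROM page_views GROUP BY section ORDER BY section"
        )
        return cursor.fetchall()
    finally:
        conn.close()


rows = [("sports", 120), ("science", 80), ("sports", 30)]  # made-up data
summary = store_and_summarize(rows)
for section, n_rows, total_views in summary:
    print(section, n_rows, total_views)
```

The same query result could be loaded into pandas for the split/merge/transform step; that part is left out here to keep the sketch dependency-free.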
