
Frequently asked question:

Q: I would like to ask your advice about preparing for a role in data science.


My advice would be to put together a portfolio of projects on GitHub demonstrating that you know how to:

  • get data (e.g., via wget/curl)

  • scrub data (wisely choose and reproducibly remove "outliers")

  • model using a variety of approaches (supervised, unsupervised, exploratory) in Python or possibly R (an employer will usually prefer one or the other, and in my experience more and more employers prefer Python; in the Data Science Group at NYT it's helpful to know your way around SQL and scikit-learn. We don't do much in R, and nothing in SAS, SPSS, MATLAB, Mathematica, or ...)

  • write a coherent description of what you learned, and what it implies for the stakeholder/collaborator/world;

    as well as

    how you chose your approach, what assumptions you made along the way, what the weaknesses of your approach are, and what the next steps are.

    Update 1: Also consider getting your hands on some fun data to play with. The definition of "fun" is highly personal, so here are several data sets that might be of interest:

    Update 2: Also consider taking a class ( cf., )

    Update 3: Also consider enrolling in a "data science boot camp", e.g.,
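As a minimal sketch of the "get data" step done in Python rather than with wget/curl, the function below downloads a file and records its SHA-256 digest, so a later run can confirm it is working from exactly the same raw data. The function name `fetch` and the idea of hashing the download are illustrative assumptions, not something the advice above prescribes.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url: str, dest: str) -> str:
    """Download `url` to `dest` (roughly `curl -o dest url`) and return its SHA-256.

    `fetch` is a hypothetical helper name. Committing the returned digest next
    to the URL lets anyone re-running the project verify the raw input file.
    """
    # urlretrieve returns (local_filename, headers); we only need the filename.
    path, _headers = urllib.request.urlretrieve(url, dest)
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

For example, `fetch("https://example.com/data.csv", "data.csv")` (a placeholder URL) saves the file and returns a hex digest you can record alongside your analysis.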
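One way to make the "scrub" step reproducible is to express the outlier rule as code with explicit parameters. The sketch below uses Tukey's fences (points more than `k` interquartile ranges outside the quartiles); that rule and the default `k=1.5` are an assumed example, and choosing the rule "wisely" for your own data remains a judgment call.

```python
import statistics
from typing import List, NamedTuple


class ScrubResult(NamedTuple):
    kept: List[float]
    removed: List[float]
    lower: float
    upper: float


def remove_outliers_iqr(values: List[float], k: float = 1.5) -> ScrubResult:
    """Drop points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    The rule is deterministic, so re-running it on the same raw data removes
    the same points. Returning the fences and the removed values makes it easy
    to report exactly what was dropped and why.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    removed = [v for v in values if not lower <= v <= upper]
    return ScrubResult(kept, removed, lower, upper)
```

With `remove_outliers_iqr([1, 2, 3, 4, 5, 100])`, the value 100 lies above the upper fence and is reported in `removed`, so the write-up can state precisely which points were excluded.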

For more info:

My thoughts:




commented Oct 7, 2014

Great advice, thanks.



commented Dec 5, 2014

This was great advice. I would like to add to it.

- scrape data from the web (Scrapy)
- store data in a database (MongoDB or SQL)
- extract data (pandas: split, merge, transform)
- model data (machine learning)
- document results (info, plots, error rates)
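The "store" and "extract" steps can be sketched with Python's built-in `sqlite3` module, which also exercises the SQL mentioned in the original post. The table and column names (`articles`, `views`, `section`, `n`) are hypothetical, and the SQL `JOIN` here plays the role a pandas merge would.

```python
import sqlite3


def build_db(articles, views):
    """Store two hypothetical tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, section TEXT)")
    conn.execute("CREATE TABLE views (article_id INTEGER, n INTEGER)")
    # Parameterized inserts: sqlite3 fills each ? from the tuples.
    conn.executemany("INSERT INTO articles VALUES (?, ?)", articles)
    conn.executemany("INSERT INTO views VALUES (?, ?)", views)
    conn.commit()
    return conn


def views_by_section(conn):
    """Join views to articles and total the views per section."""
    query = """
        SELECT a.section, SUM(v.n) AS total
        FROM articles AS a
        JOIN views AS v ON v.article_id = a.id
        GROUP BY a.section
        ORDER BY a.section
    """
    return conn.execute(query).fetchall()
```

Swapping `":memory:"` for a file path keeps the data on disk between runs.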
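For the "model" and "document results" steps, the sketch below uses a seeded (and therefore reproducible) train/test split and reports the held-out error rate. The nearest-centroid classifier is a deliberately tiny, dependency-free stand-in chosen for illustration; in practice you would more likely use scikit-learn (for example `sklearn.model_selection.train_test_split` and `sklearn.neighbors.NearestCentroid`).

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed so the same split comes out every run."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_nearest_centroid(train):
    """Supervised fit: average the feature vectors of each label."""
    by_label = defaultdict(list)
    for x, y in train:
        by_label[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def error_rate(centroids, test):
    """Fraction of held-out rows whose predicted label is wrong."""
    wrong = sum(predict(centroids, x) != y for x, y in test)
    return wrong / len(test)
```

Reporting the seed, the split fraction, and the resulting error rate together is what lets someone else reproduce the number quoted in your write-up.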
