frequently asked question:

Q: I would like to ask your advice about preparing for a role in data science

A:

my advice would be to put together a portfolio of projects on GitHub showing that you know how to

  • get data (e.g., via wget/curl)

  • scrub data (wisely choose and reproducibly remove "outliers")

  • model using a variety of approaches (supervised, unsupervised, exploratory) in python or possibly R. Usually an employer will prefer one or the other, with more and more employers, in my experience, preferring python. In the Data Science Group at NYT it's helpful to know your way around SQL and scikit-learn; we don't do much in R, and nothing in SAS, SPSS, MATLAB, Mathematica, or...

  • write a coherent description of what you learned, and what this implies for the stakeholder/collaborator/world;

    as well as

    how you chose the approach you took, what assumptions you made along the way, what the weaknesses in your approach are, and what the next steps are.

    Update 1: Also consider getting your hands on some fun data to play with. The definition of "fun" is highly personal, so I list several data sets which might be of interest: https://gist.github.com/chrishwiggins/84a6319246a7b8f547c4

    Update 2: Also consider taking a class (cf. http://datascience.columbia.edu/data-science-academics)

    Update 3: Also consider enrolling in a "data science boot camp", e.g., http://insightdatascience.com/
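To make the "scrub data" step above concrete, here is a minimal sketch of removing outliers reproducibly. It assumes a Tukey-style interquartile-range rule with the conventional multiplier of 1.5; the rule, the function names, and the sample readings are all illustrative assumptions, not something the advice above prescribes. The point is only that the cutoff is written down in code and the dropped values are kept for inspection.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the Tukey IQR rule (an assumed choice)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def scrub(values, k=1.5):
    """Split values into (kept, dropped) so the removal can be audited."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    dropped = [v for v in values if not (low <= v <= high)]
    return kept, dropped


# Made-up readings with one obviously suspicious value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 55.0]
kept, dropped = scrub(readings)
print("kept:", kept)
print("dropped:", dropped)
```

Returning the dropped values alongside the kept ones makes it easier to describe, in the write-up, exactly what was removed and why.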

For more info:

My thoughts: http://www.columbia.edu/itc/applied/wiggins/DSatW-wiggins.pdf

Hammerbacher: https://goo.gl/cVB4hn

tommiechen commented Oct 7, 2014

great advice. thanks

zgmartin commented Dec 5, 2014

This was great advice. I would like to add to it.

-scrape data from web (scrapy)
-store data in database (MongoDB or SQL)
-extract data (pandas: split, merge, transform)
-model data (machine learning)
-document results (info, plots, error rates)
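
As a rough illustration of the "store" and "extract" steps in the list above, here is a small sketch using Python's built-in sqlite3 module in place of a full database server. The table name, column names, and records are made up for the example; a real project would load scraped data instead.

```python
import sqlite3

# Made-up records standing in for scraped data.
rows = [("nyc", 3), ("nyc", 5), ("sf", 2)]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (city TEXT, n INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
conn.commit()

# Extract: aggregate per city with plain SQL.
totals = dict(conn.execute("SELECT city, SUM(n) FROM visits GROUP BY city"))
conn.close()

print(totals)
```

The same query would run largely unchanged against a larger SQL database, which is one reason to practice the SQL itself rather than only a library wrapper.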
