frequently asked question:
Q: I would like to ask your advice about preparing for a role in data science
A:
my advice would be to put together a portfolio of projects, on GitHub, evidencing that you know how to
- get data (e.g., via wget/curl)
- scrub data (wisely choose and reproducibly remove "outliers")
- model using a variety of approaches (supervised, unsupervised, exploratory) in Python or possibly R. Usually an employer will prefer one or the other, and in my experience more and more employers prefer Python. In the Data Science Group at NYT it's helpful to know your way around SQL and scikit-learn; we don't do much in R, and nothing in SAS, SPSS, MATLAB, Mathematica, or...
- write a coherent description of what you learned and what this implies for the stakeholder/collaborator/world, as well as how you chose the approach you took, what assumptions you made along the way, what the weaknesses in your approach are, and what the next steps are.
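To make "reproducibly remove outliers" concrete, here is a minimal sketch in Python. It assumes the interquartile-range rule as the removal criterion, which is only one possible choice and not one prescribed above. The function name remove_outliers_iqr, the threshold k=1.5, and the sample values are all hypothetical, chosen for illustration.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Split values into kept and dropped using the IQR rule.

    Assumption: a point is an "outlier" if it lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. The rule and k=1.5 are illustrative choices.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lo <= v <= hi]
    dropped = [v for v in values if not (lo <= v <= hi)]
    # Return the bounds and the dropped points too, so the decision
    # can be reported and re-run rather than made by hand.
    return kept, dropped, (lo, hi)


# Hypothetical measurements with one suspicious value.
raw = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
kept, dropped, bounds = remove_outliers_iqr(raw)
print("bounds:", bounds)
print("dropped:", dropped)
print("kept:", kept)
```

The point of returning the bounds and the dropped values is that the write-up can state exactly which rule was applied and which points it removed, so a reader can rerun the same step and question the choice.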
Update 1: Also consider getting your hands on some fun data to play with. Definition of "fun" is highly personal, so I list several sets which might be of interest: https://gist.github.com/chrishwiggins/84a6319246a7b8f547c4
Update 2: Also consider taking a class (cf. http://datascience.columbia.edu/data-science-academics)
Update 3: Also consider enrolling in a "data science boot camp", e.g., http://insightdatascience.com/
For more info:
My thoughts: http://www.columbia.edu/itc/applied/wiggins/DSatW-wiggins.pdf
Hammerbacher: https://goo.gl/cVB4hn
great advice. thanks