chrishwiggins/FAQ-preparing-for-a-role-in-DS.md

## FAQ-preparing-for-a-role-in-DS.md

      
frequently asked question:

Q: I would like to ask your advice about preparing for a role in
data science

A:
my advice would be to put together a portfolio of projects, on GitHub,
evidencing that you know how to


- get data (e.g., via wget/curl)
- scrub data (wisely choose and reproducibly remove "outliers")
- model using a variety of approaches
  (supervised, unsupervised, exploratory)
  in python or possibly R
  (usually an employer will prefer one or the other,
  with more and more employers in my experience preferring python;
  in the Data Science Group at NYT it's helpful to know your way
  around SQL and scikit-learn. We don't do much in R, and nothing
  in SAS, SPSS, MATLAB, Mathematica, or... )
- write a coherent description of
  what you learned, and
  what this implies for the stakeholder/collaborator/world;
  as well as
  how you chose the approach you took,
  what assumptions you made on the way,
  what are the weaknesses in your approach,
  and
  what are the next steps.
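
To make the "get" and "scrub" steps concrete, here is a minimal sketch in python using only the standard library. Everything in it is an assumption for illustration: the helper names (`fetch`, `iqr_bounds`, `scrub`), the choice of the 1.5 x IQR rule as the outlier criterion, and the made-up sample numbers. It is one reasonable way to do it, not the way; the point is that the download is repeatable and the removed points are kept on record rather than silently dropped.

```python
import statistics
import urllib.request
from pathlib import Path


def fetch(url: str, dest: Path) -> Path:
    """Download url to dest once; later runs reuse the local copy."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def scrub(values, k=1.5):
    """Split values into (kept, removed) so the removed points stay visible."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if not (low <= v <= high)]
    return kept, removed


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sample = [10, 12, 11, 13, 12, 11, 10, 95]
    kept, removed = scrub(sample)
    print("kept:", kept)
    print("removed:", removed)
```

In a portfolio write-up, the rule (`k`) and the list of removed points are exactly the sort of assumption worth stating explicitly, since a different threshold could change the conclusions.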

Update 1: Also consider getting your hands on some fun data to play with.
Definition of "fun" is highly personal, so I list several sets which
might be of interest: https://gist.github.com/chrishwiggins/84a6319246a7b8f547c4

Update 2: Also consider taking a class (cf. http://datascience.columbia.edu/data-science-academics)

Update 3: Also consider enrolling in a "data science boot camp", e.g., http://insightdatascience.com/


For more info:

- My thoughts: http://www.columbia.edu/itc/applied/wiggins/DSatW-wiggins.pdf
- Hammerbacher: https://goo.gl/cVB4hn