@numberwhun
Last active August 29, 2015 14:11
What should I do to be a Data Scientist?
Written By Armen K.
Borrowed from: http://www.datasciencebowl.com/lp/national-data-science-bowl/blog
I’m often asked “what are good things to do to get involved with data science?” In this post I’ll share with you some key activities you can get started with today.
Whether it is photos on phones, links in a social network, or information around health, our society has a growing appreciation of and enthusiasm for data. It’s an exciting time as open data initiatives are met with open software tools that enable large-scale analysis. Actually getting insight from data requires intellectual curiosity and a mix of skills and knowledge.
Learn more math:
Being able to formalize different relationships in data enables you to identify potentially useful features. Statistics plays an important role in testing whether candidate relationships are salient or spurious. Datasets are often high-dimensional and are increasingly large, so the ability to think in terms of subspaces and projections is an important aspect of data exploration. Finally, building a deep understanding of the advanced algorithms used by data scientists (e.g., machine learning, regression) requires a solid foundation in the underlying mathematics.
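To make the idea of testing whether a relationship is salient or spurious concrete, here is a minimal sketch of a permutation test on a correlation. It is one common approach, not necessarily the one the author has in mind. The function names (pearson, permutation_p_value), the synthetic data, and the number of shuffles are all assumptions made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate how often shuffled data shows a correlation at least as
    strong as the observed one. A small value suggests the relationship
    is unlikely to be a chance artifact."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Synthetic data, invented for this example only.
rng = random.Random(42)
xs = list(range(30))
related = [2 * x + rng.gauss(0, 3) for x in xs]   # depends on x, plus noise
unrelated = [rng.gauss(0, 3) for _ in xs]         # pure noise

p_related = permutation_p_value(xs, related)
p_unrelated = permutation_p_value(xs, unrelated)
print("p-value, related data:  ", p_related)
print("p-value, unrelated data:", p_unrelated)
```

A small p-value for the first pair says that shuffling almost never reproduces a correlation that strong. It does not by itself say the relationship is causal or useful, which is where domain knowledge comes back in.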
There are a variety of ways to expand your knowledge of mathematics. Here are just a few:
University courses – These are a great way to build foundational knowledge in topics such as statistics and linear algebra. Check out your local university and community college for the courses that are available.
Online courses – Topics covered range from basic mathematics to advanced machine learning. Sites like Coursera [1], Udacity [2], and Stanford Online [3] offer a variety of courses in math, statistics, and machine learning. Sites like Project Euler [4] help you build problem-solving skills in addition to mathematical insights.
Self-study – Never discount the classic method of purchasing a textbook and working through problems. This can be a great way to bolster skills around a particular topic or concept. Plus, textbooks are cheap now!
Blog posts and tutorials – There are a variety of great blogs available with illustrative tutorials around complex mathematical topics. One of our favorites is LEARNING Lover [5].
Build things:
Data scientists build all sorts of things, from analytic pipelines that feed predictive models to data products such as recommendation engines. Computer programming is therefore critical to processing and systematically analyzing data. We are often asked which programming language an aspiring data scientist should learn. There is always a lot of debate around this point, but you can’t go wrong with either R or Python. Scala has also become popular as data scientists increasingly use Spark.
There are a huge number of online resources to help you get started. Codecademy [6] and Code School [7] have free courses on Python and R, respectively. Individual open source projects also have documentation and tutorials you can reference (e.g., [8], [9]). Explore Data Science [10] has a great sequence of modules interweaving key math concepts and Python programming.
Coding during hack-a-thons and meetup events is a great way to learn from others. It also fosters collaboration, a vital activity for data scientists. By getting out in the community, you’ll begin to grapple with challenges in uncharted problem spaces where innovative solutions are needed. NASA’s Space Apps Challenge [11] is a great example. There are also organizations like DataKind [12] that host hack-a-thons connecting data scientists to social good problems. Hack-a-thons can also lead to projects requiring a sustained coding effort, which can be invaluable in enhancing your technical acumen.
Be a data ninja:
The ability to slice and dice data enables data scientists to run down hypotheses. Data scientists generate hypotheses both through data-driven approaches and by leveraging domain knowledge. The key characteristic of a data ninja is agility, and the effectiveness of data scientists hinges on it. With that agility, you can search for insight in datasets interactively, where the testing of one hypothesis may lead to another. This is where the real art of data science comes into play.
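As a rough illustration of slicing and dicing, the sketch below groups a tiny set of records first by one field and then by two, so that the answer to one question suggests the next. The records, the field names, and the helper mean_by are hypothetical and invented for this example. In practice a library such as pandas would usually do this work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical commute records, invented purely for illustration.
trips = [
    {"city": "Austin", "weekday": True, "minutes": 22},
    {"city": "Austin", "weekday": True, "minutes": 26},
    {"city": "Austin", "weekday": False, "minutes": 14},
    {"city": "Austin", "weekday": False, "minutes": 16},
    {"city": "Boston", "weekday": True, "minutes": 35},
    {"city": "Boston", "weekday": True, "minutes": 37},
    {"city": "Boston", "weekday": False, "minutes": 30},
    {"city": "Boston", "weekday": False, "minutes": 34},
]


def mean_by(records, key_fields, value_field):
    """Average value_field within each group defined by key_fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record[value_field])
    return {key: mean(values) for key, values in groups.items()}


# First hypothesis: commute time differs by city.
by_city = mean_by(trips, ["city"], "minutes")

# Follow-up hypothesis prompted by the first cut: is the gap the same
# on weekdays and weekends?
by_city_and_day = mean_by(trips, ["city", "weekday"], "minutes")

for key, value in sorted(by_city.items()):
    print(key, value)
for key, value in sorted(by_city_and_day.items()):
    print(key, value)
```

Keeping the grouping fields as a parameter is what makes the second cut cheap: trying a new hypothesis means changing one argument and leaving the loop alone.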
The best way to learn these skills is by doing. Actively seek challenging problems to solve. Data science competitions from sites like Kaggle [13] provide “prepackaged” problems and data sets. Simply exploring open source data sets from places like Data.gov can help build valuable skills. Don’t overlook the hack-a-thons and meetup groups previously mentioned in this post. There are data science meetup groups in nearly every major city now, providing great resources to help connect you with challenging problems. They have the added benefit of connecting you with the data science community.
This post doesn’t touch on everything you need to know to be a data scientist nor every method of learning. It should, however, help get you started on your journey to becoming a data scientist. As you explore these three key areas you’ll quickly identify knowledge gaps that you can explore further. Never stop learning! If you’re building out experience in math and coding across different domains, you’re well on your way.
Links:
[1] www.coursera.org
[2] www.udacity.com
[3] online.stanford.edu
[4] projecteuler.net
[5] learningglover.com
[6] www.codecademy.com/tracks/python
[7] www.codeschool.com/courses/try-r
[8] www.scala-lang.org/documentation/
[9] www.python.org/doc/
[10] exploredatascience.com
[11] 2014.spaceappschallenge.org/challenge/
[12] www.datakind.org
[13] www.kaggle.com