@andrewmusselman
Last active March 7, 2016 19:23
Data Scientist and Engineer Job Descriptions

Job title – Data Scientist

Basic Qualifications

  1. Bachelor’s degree in Mathematics, Computer Science, Engineering, a technical science, or equivalent
  2. Minimum 1 year of programming experience with at least one item from each group:
  • R, SAS, Mathematica, MATLAB, SageMath
  • Python, Ruby, Perl, Java, Scala
  • Linux
  • Bash scripting including sed, awk, cut, uniq, sort, tr
  • SQL
  3. 6 months’ experience plotting graphics (scatterplots/matrix plots, line graphs/bar charts, etc.)
  4. 1 year of experience in data analysis

Preferred Qualifications

  1. BS or advanced degree in Mathematics, Statistics, Econometrics, or another data-heavy research science; experience in the following is highly desired:
  • Linear Algebra
  • Statistics
  • Graph theory
  • Network analysis
  • Algorithms
  • Probability
  • Markov chains/hidden Markov models
  • Classification and regression techniques
  • Matrix factorization/singular value decomposition
  2. Expertise with the following software
  • Hadoop
  • Spark/MLlib/GraphX
  • Mahout
  • GraphLab
  • Other machine learning libraries
  • Pig and UDFs
  • Hive and UDFs
  • Source control systems: Git, Mercurial, or Subversion
  • Build tools: Ant, Maven
  • Cloud storage and computation such as AWS EC2 and EMR
  3. Multiple years’ experience in plotting and graphics
  • Advanced/specialized plotting techniques
  • Cross-platform skills: R, JavaScript, Mathematica, Python, etc.
  4. Previous experience with
  • Building recommenders or computing metrics at large scale (similarity, cohort containment, etc.)
  • Designing and measuring the performance of predictive models
  • Automation of work
  • Web services

Job title – Data Engineer

Basic Qualifications

  1. Bachelor’s degree in a technical field, or an equivalent four years of work experience
  2. Minimum 2 years of programming experience with at least one item from each group:
  • Python, Ruby, Perl, Java, Scala
  • Linux
  • Bash scripting including sed, awk, cut, uniq, sort, tr
  • SQL, HiveQL, Pig, Cascading
  3. Minimum 2 years working with at least one item from each group:
  • Git, Mercurial, or Subversion
  • Ant, Maven, sbt

Preferred Qualifications

  1. Expertise with the following software
  • Sqoop, Flume, Storm, Kafka
  • HBase, Cassandra, MongoDB, Riak
  • Cloud storage and computation such as AWS EC2 and EMR or Heroku
  • Hadoop
  • Spark/MLlib/GraphX
  • Mahout
  • GraphLab
  • Other machine learning libraries
  • Pig and UDFs
  • Hive and UDFs
  2. Previous experience with
  • Continuous integration
  • Automation of work
  • Automated system configuration such as Puppet or Chef
  • Web services