Job title – Data Scientist
Basic Qualifications
- Bachelor’s degree in Mathematics, Engineering, Computer Science, or another technical science (or equivalent)
- Minimum 1 year of programming experience in at least one item from each of the following:
  - R, SAS, Mathematica, MATLAB, SageMath
  - Python, Ruby, Perl, Java, Scala
  - Linux
  - Bash scripting including sed, awk, cut, uniq, sort, tr
  - SQL
- 6 months' experience plotting graphics (scatterplots/matrix plots, line graphs/bar charts, etc.)
- 1 year of experience in data analysis
Preferred Qualifications
- BS or advanced degree in Mathematics, Statistics, Econometrics, or another data-heavy research science; experience in the following is highly desired:
  - Linear algebra
  - Statistics
  - Graph theory
  - Network analysis
  - Algorithms
  - Probability
  - Markov chains/hidden Markov models
  - Classification and regression techniques
  - Matrix factorization/singular value decomposition
- Expertise with the following software:
  - Hadoop
  - Spark/MLlib/GraphX
  - Mahout
  - GraphLab
  - Other machine learning libraries
  - Pig and UDFs
  - Hive and UDFs
  - Source control systems: Git, Mercurial, or Subversion
  - Build tools: Ant, Maven
  - Cloud storage and computation such as AWS EC2 and EMR
- Multiple years’ experience in plotting/graphics:
  - Advanced/specialized plotting techniques
  - Cross-platform skills: R, JavaScript, Mathematica, Python, etc.
- Previous experience with:
  - Building recommenders or computing metrics at large scale (similarity, cohort containment, etc.)
  - Designing predictive models and measuring their performance
  - Automation of work
  - Web services
Job title – Data Engineer
Basic Qualifications
- Bachelor’s degree in a technical field, or four years of equivalent work experience
- Minimum 2 years of programming experience in at least one item from each of the following:
  - Python, Ruby, Perl, Java, Scala
  - Linux
  - Bash scripting including sed, awk, cut, uniq, sort, tr
  - SQL, HiveQL, Pig, Cascading
- Minimum 2 years working with at least one item from each of the following:
  - Git, Mercurial, or Subversion
  - Ant, Maven, sbt
Preferred Qualifications
- Expertise with the following software:
  - Sqoop, Flume, Storm, Kafka
  - HBase, Cassandra, MongoDB, Riak
  - Cloud storage and computation such as AWS EC2 and EMR, or Heroku
  - Hadoop
  - Spark/MLlib/GraphX
  - Mahout
  - GraphLab
  - Other machine learning libraries
  - Pig and UDFs
  - Hive and UDFs
- Previous experience with:
  - Continuous integration
  - Automation of work
  - Automated system configuration with tools such as Puppet or Chef
  - Web services