@Zhenye-Na
Created June 2, 2018 08:01
Data_Scientist_Job_Description_Sample (From Kaggle)

Data Scientist Job Description Sample

for 1-year experience

  • Experience in machine learning techniques and frameworks
  • Experience with standard open source data science tools and languages (e.g. R, Python, etc)
  • Experience with database languages (e.g. SQL)
  • Experience with big data technologies (e.g. MapReduce, Hadoop, H2O, and Spark)
  • Experience with text mining of unstructured data is preferred
  • Experience with advanced data visualization techniques is preferred
  • Demonstrated track record in building predictive models on complex datasets
  • Demonstrated self-direction and ability to learn new techniques
  • Ability to obtain work authorization in the United States

for 2-year experience

NO.1

  • Minimum of a Bachelor's degree or equivalent in Computer Science or a related field.
  • Collaborate with business users and the reporting team to ensure that requirements are correctly translated, documented, and developed to appropriate reporting design specifications.
  • Demonstrated ability to quickly learn new technologies and processes.
  • Must be able to work on multiple, concurrent projects
  • Healthcare industry experience, a plus.
  • General programming experience.
  • Business Intelligence, data analytics and dashboarding development experience, a must.
  • Good knowledge of MS SQL Server and its environment, a plus.
  • Knowledge of R/RStudio/RShiny statistical programming.
  • Work on all stages of data science projects from understanding data to implementing the solutions into products and service offerings
  • Have an opportunity to work on diverse data sets for healthcare from electronic health records, demographic data sets to medical billing.
  • An interest in UI development and data visualization, a plus

NO.2

  • Proficient in R, SQL and Python
  • At least 2 years professional work experience as a Data Scientist
  • Solid foundation in statistics (e.g., multivariate analysis, A/B testing)
  • Advanced degree (Masters or PhD) in applied science, mathematics or engineering
  • Very strong written and verbal communication skills
  • Positive attitude, self-starter and excited by challenges
  • Experience with machine learning and working with healthcare data
  • Experience working with underserved patient populations in a healthcare or community-based setting or personal experience within a community negatively impacted by health disparities
  • Ability to work from our San Francisco office, or relocate to our San Francisco office for the first three months

NO.3

  • Background in AI, ML or statistics. Solid knowledge of theoretical concepts of common machine learning methods and algorithms
  • Proven hands-on experience building and deploying machine learning systems
  • High level of proficiency in the Python programming language
  • Ability to analyze and get insights from structured and unstructured datasets
  • Solid understanding of model tuning and evaluation techniques
  • Familiarity with data reduction and imputation techniques
  • Knowledge of common machine learning tools (e.g. scikit-learn, pandas, numpy)
  • Knowledge of deep learning approaches and experience with frameworks such as TensorFlow, Keras, and PyTorch
  • Ideally, proficient in several data science areas (time series, computer vision, NLP, recommender systems, etc.)
  • At least an intermediate level in English, both written and spoken, live in (or willing to relocate to) Madrid and enjoy working in an international team
  • Experience in creating high-quality datasets and benchmarks
  • Familiarity with version control (preferably Git)
  • Experience in Django
  • Experience with SQL databases

for 3-year experience

NO.1

  • 5+ years of experience in database mining/marketing with very large, complex and multi-dimensional datasets.
  • 2+ years of direct experience working in ad-tech with a proven track record of media mix optimization
  • Thorough understanding and ability to apply supervised and unsupervised learning models.
  • Must be able to synthesize disparate data to mine and communicate insights
  • Excellent collaboration and communication skills with a proven track record of working across all levels of the organization.
  • Experience with large individual customer behavioral data sets (gaming, retail, financial services) and expertise in communication channel best practices, including multi-channel marketing.
  • Technical mastery in advanced database mining tools such as SQL and SAS/R/Python (any of the three) with the ability to coach and train others on best practice.
  • Preferred experience working with Data Management Platforms, such as Adobe Audience Manager
  • Gaming experience, particularly with the Call of Duty franchise, a strong plus
  • Exceptional data analysis skills in Excel, including the ability to write macros.

NO.2

  • We are looking for someone who has 2-5 years of experience in manipulating data sets and building social score models, and a degree in Statistics, Mathematics, Computer Science or another quantitative field.
  • Strong problem solving skills with an emphasis on product development.
  • Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets.
  • Experience querying databases and using statistical computer languages: R, Python, SQL, etc.
  • Experience working with and creating data architectures.
  • Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
  • Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
  • Excellent written and verbal communication skills for coordinating across teams.
  • A drive to learn and master new technologies and techniques.
  • Knowledge and experience in statistical and data mining techniques: GLM/Regression, Random Forest, Boosting, Trees, text mining, social network analysis, etc.
  • Experience creating and using advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.