innovatism/The Machine Learning Team.md

The Machine Learning (ML) Team

A machine learning project starts with a lot of questions: What is the goal of the project? What application are we building? What results do we expect? Which tasks do we need to perform? What approach can we use? To answer these questions, we need to build a robust team. Each member of the team plays an important role and cooperates closely with the others.

Role definitions

A mature machine learning team consists of the following core roles:

Data Analysts
Data Engineers
Data Scientists
Research Scientists
ML Engineers
Developers

There are also other supporting roles:

QA
TA
Annotators

Data Analysts


Get insights from user data with:

Descriptive statistics: summarize and describe the data, for example customer demographics, landing page conversion rates, and loyalty and retention rates
Inferential statistics: infer the characteristics of users as a population, for example user trends and statistical hypothesis testing (e.g. A/B testing)
Cooperate with the product-business team and create the product roadmap from these insights
Define the model evaluation procedure and acceptance criteria


Analyse feedback data from deployed models
Tools: Excel, SQL, Tableau, Power BI …
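
To make the A/B testing item above more concrete, here is a minimal sketch of a two-proportion z-test on conversion counts. The function name and the sample numbers are illustrative assumptions, not part of any particular analytics tool, and a real analysis would also check sample size and test assumptions.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 100/1000 conversions on page A, 150/1000 on page B
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; the threshold (often 0.05) is a choice the analyst makes before running the test.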

Data Engineers


Build and maintain the infrastructure used to collect, transform, and store data, for example an ETL (Extract, Transform, Load) process
Develop annotation tools that help collect labeled data
Manage and orchestrate the pipelines through which data is ingested and moved between storage systems
As the volume of data grows, they need skills in distributed computing and storage (also known as big data)
Tools: Data storage, Message brokers, Pipeline management tools, Data warehouse …
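
As an illustration of the ETL process mentioned above, the following is a toy sketch using only the Python standard library. The CSV content, the `purchases` table, and the cleaning rule (drop rows with a missing amount) are all assumptions made for the example; a production pipeline would typically rely on dedicated pipeline and storage tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second row has a missing amount
RAW_CSV = """user_id,country,amount
1,fr,10.5
2,US,
3,de,7
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows, normalize country codes, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append((int(row["user_id"]), row["country"].upper(), float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    load(transform(extract(text)), conn)
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` would leave two cleaned rows in the `purchases` table.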

Data Scientist 


Analyse, process, and interpret data
Find features and insights in data with statistical methods (feature engineering)
Communicate findings to business and product teams and other stakeholders
Build machine learning models that serve as prototypes or are deployed in production
Tools: statistics, databases, machine learning, machine learning frameworks …
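
Feature engineering can take many forms. As one small, hedged example, the sketch below standardizes a numeric column and one-hot encodes a categorical one using only the standard library. The helper names are hypothetical, and in practice a machine learning framework would usually provide equivalents.

```python
import statistics

def standardize(values):
    """Rescale numeric values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]

def one_hot(values):
    """Encode categorical values as 0/1 vectors; returns (vectors, sorted categories)."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return vectors, categories

# Hypothetical raw columns
ages = [20, 30, 40]
countries = ["fr", "us", "fr"]
print(standardize(ages))
print(one_hot(countries))
```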

Research Scientist

This role involves developing new algorithms for product-related fields. This can lead to breakthroughs and a competitive edge over competitors.

Machine Learning Engineer


Build and maintain tools and infrastructure to deploy, serve, monitor, and update models
Develop prediction interfaces (client side or cloud service endpoints)
Handle scalability with containers and orchestration platforms like Kubernetes
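
A prediction interface exposed as a service endpoint could look roughly like the sketch below, built on Python's standard `http.server` module. The `predict` function is a placeholder with made-up weights standing in for a real trained model, and the JSON request shape (`{"features": [...]}`) is an assumption for illustration. A production service would normally use a dedicated serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder weights (assumption): stands in for a real trained model
WEIGHTS = [0.4, 0.6]

def predict(features):
    """Toy linear score over the input features."""
    return sum(w * x for w, x in zip(WEIGHTS, features))

def handle_payload(raw_body):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(raw_body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": 'expected JSON like {"features": [1.0, 2.0]}'}
    if len(features) != len(WEIGHTS):
        return 400, {"error": f"expected {len(WEIGHTS)} features"}
    return 200, {"prediction": predict(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Start the endpoint locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the validation logic in `handle_payload`, separate from the HTTP handler, makes it possible to test the endpoint's behaviour without starting a server.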

Developers


Integrate the machine learning product with the main application
Abstract the machine learning predictions behind user-friendly features

Full Stack Data Scientist

Finding staff for each of these specific roles is sometimes challenging and not cost-effective in a small company. This requires one or a few data scientists to handle several roles at the same time; they are Full Stack Data Scientists. They therefore need a wider range of knowledge and skills.

QA


Evaluate the deployed model against the acceptance criteria
Perform regression tests to ensure the model matches the real use cases
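
One possible way to express an acceptance check is sketched below: the model is run on a fixed set of labeled cases and its accuracy is compared with a minimum threshold. Accuracy as the metric, the function names, and the toy model are assumptions for illustration; the actual metric and threshold come from the acceptance criteria defined for the project.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the expected labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def evaluate(model, cases):
    """Run the model on (input, expected_label) pairs and return its accuracy."""
    predictions = [model(x) for x, _ in cases]
    labels = [y for _, y in cases]
    return accuracy(predictions, labels)

def meets_acceptance(model, cases, min_accuracy):
    """True if the model reaches the agreed minimum accuracy on the fixed cases."""
    return evaluate(model, cases) >= min_accuracy

# Hypothetical regression set and toy model
cases = [(1, "odd"), (2, "even"), (3, "odd"), (4, "odd")]
toy_model = lambda x: "odd" if x % 2 else "even"
print(evaluate(toy_model, cases))
```

Re-running the same fixed cases after every model update turns this into a simple regression test.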

TA


Work closely with the ML Engineers to define automated testing scenarios
Schedule and run performance tests with the model deployed on servers
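
A performance test often starts with measuring prediction latency. The sketch below times repeated calls to a prediction function and reports the mean, 95th percentile (nearest-rank method), and maximum. The `predict` argument is any callable; against a deployed server it would wrap the network request, which is left out here as an assumption.

```python
import statistics
import time

def measure_latencies(predict, inputs):
    """Call predict once per input and return the elapsed time of each call in seconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies):
    """Return mean, nearest-rank 95th percentile, and max of the measured latencies."""
    ordered = sorted(latencies)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```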

Annotators

They collect labeled data according to the data requirements, using third-party tools or tools built by the in-house data engineers. They can be:

Qualified in-house annotators
Contract annotators
Outsourced annotation services

Data quality and quantity vary depending on the group of annotators. Data from in-house annotators is of the best quality, but there may not be enough of it for the application in question. Outsourced labeled data can come in large amounts but will need quality control.
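
One simple form of quality control, assuming each item is labeled by several annotators, is to take the majority label and flag items with low agreement for review. The sketch below shows this idea; the function names and the 0.7 agreement threshold are illustrative assumptions.

```python
from collections import Counter

def majority_label(labels):
    """Return (label, agreement), where agreement is the share of annotators who chose it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

def flag_for_review(annotations, min_agreement=0.7):
    """Return the ids of items whose majority agreement is below the threshold.

    annotations maps an item id to the list of labels given by different annotators.
    """
    return [
        item_id
        for item_id, labels in annotations.items()
        if majority_label(labels)[1] < min_agreement
    ]

# Hypothetical annotations from three annotators per item
annotations = {
    "img_1": ["cat", "cat", "cat"],
    "img_2": ["cat", "dog", "dog"],
}
print(flag_for_review(annotations))
```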