izam-mohammed/problems_in_ml.md

## problems_in_ml.md

      
    types of problems in machine learning


general

how to collect data
What type of ML problem is this
what is the type of data
what will be the output
missing values and outliers
labeled or unlabeled data
what kind of preprocessing and transformation is needed
potential biases in the model
what is the size of training and testing data
possibility of data leakage
chances of overfitting and underfitting
How to evaluate the model
what is the simplest way to make a model
how it is to be deployed in the real world
how to monitor and update model in production
What ensemble methods can I consider
How to explain the model predictions
model serving and scalability
how to update the model with new data
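
The questions above about train/test size and data leakage can be made concrete with a small sketch: split first, then learn any preprocessing statistics from the training part only. This uses only the Python standard library; the helper names `train_test_split` and `fit_standardizer` and the toy data are assumptions made for illustration, not part of the original notes.

```python
import random
import statistics

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def fit_standardizer(values):
    """Learn mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant data
    return lambda x: (x - mean) / std

# Hypothetical toy feature column.
data = [float(v) for v in range(1, 11)]
train, test = train_test_split(data, test_ratio=0.2, seed=42)

scale = fit_standardizer(train)          # statistics come from train only
train_scaled = [scale(x) for x in train]
test_scaled = [scale(x) for x in test]   # test reuses the train statistics
```

Fitting the scaler on the full dataset before splitting would let test-set statistics influence training, which is one common form of leakage.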


regression

if linear relation
if polynomial relation
if some multicollinearity
if feature selection required
for count data
if a lot of non-linear relations
outliers
missing values
if feature importance is crucial
if homoscedasticity is violated
if the features are high dimensional
if real time prediction required
if regularization needed
If the data distribution is skewed
Categorical data included
interpretability is important
if training time is a constraint
Data has temporal dependencies
Interaction effects
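
For the multicollinearity item in the list above, a rough first screen is to look for feature pairs with a high absolute Pearson correlation. The sketch below assumes features stored as a dict of equal-length numeric lists; the function names, the 0.9 threshold, and the toy housing features are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical features: the two area columns carry the same information.
features = {
    "area_m2": [50.0, 60.0, 80.0, 100.0, 120.0],
    "area_ft2": [538.0, 646.0, 861.0, 1076.0, 1292.0],
    "age_years": [30.0, 5.0, 12.0, 40.0, 8.0],
}
flagged = correlated_pairs(features)
```

Pairwise correlation only catches redundancy between two features at a time; a feature that is a combination of several others would need a different check.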


classification

Missing values
imbalance
if feature importance is crucial
categorical features
if interpretability is important
binary
if we want a simple and interpretable model
non-linear decision boundaries
If it is a credit risk analysis
data is high dimensional
local patterns matter more than global
large number of features and independence assumption
large dataset and complex relation
multiclass
multilabel
anomaly detection
outliers
if data is textual
if overfitting is a concern
if training time is a constraint
if data has hierarchical structure
if data distribution is unknown and changing
if real time prediction required
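
For the imbalance item in the list above, one common option is to weight each class inversely to its frequency so that rare classes count more in the loss. A minimal sketch, assuming the heuristic weight = n_samples / (n_classes * class_count); the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels with a 90/10 imbalance.
labels = ["ok"] * 90 + ["fraud"] * 10
weights = balanced_class_weights(labels)
```

With these labels the minority class gets a weight of 5.0 and the majority class about 0.56. Resampling or a different evaluation metric are alternatives this sketch does not cover.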


NLP

EDA

token and text length distribution
vocabulary size
most frequent words
rare words
stop words and punctuations
N-grams analysis
Spelling and typos
Text annotation quality
entity co-occurrence
error analysis
data augmentation opportunities
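
Several of the EDA items above (text length distribution, vocabulary size, most frequent words, rare words, n-grams) can be computed with a few counters. A minimal sketch using only the standard library; the regex tokenizer is a deliberately crude assumption and the three-sentence corpus is made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """All contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical toy corpus.
corpus = [
    "The model overfits the training data.",
    "The model needs more training data!",
    "Rare words hurt the vocabulary.",
]
docs = [tokenize(t) for t in corpus]

lengths = [len(d) for d in docs]                      # text length distribution
word_counts = Counter(tok for d in docs for tok in d)
bigram_counts = Counter(bg for d in docs for bg in ngrams(d, 2))
vocab_size = len(word_counts)
most_frequent = word_counts.most_common(3)
rare_words = sorted(w for w, c in word_counts.items() if c == 1)
```

On real data a proper tokenizer and a stop-word list would change these numbers, so the counts are best read as a first look rather than final statistics.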


If dimensionality reduction needed
If dealing with multilingual text
If handling noisy or informal text
text segmentation
chat bot
Named entity recognition (NER)
spam detection
POS tagging
translation
Text summarisation
Speech recognition
Question answering
Sentiment analysis

In the case of training
using pretrained models


document similarity
Text generation
Coreference resolution
Dependency parsing
Semantic role labeling
if real time processing required


computer vision

if dealing with small datasets
if real time processing required
If there are many small objects in image
If needing high accuracy and precision
If handling noisy or varied environments
if real time object tracking is required
Data augmentation opportunities
Image classification
Object detection
Image segmentation
Instance segmentation
Image generation or style transfer
Image captioning
Image super-resolution
Medical image analysis
extract info from docs
Object tracking
Anomaly detection in image


recommendation system

If contains a lot of missing values
Addressing cold start problem
handling large scale data
If real time recommendation is important
If dealing with multicriteria recommendations
If incorporating social network information
collaborative filtering
content-based filtering
Matrix factorisation
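
To illustrate collaborative filtering with missing ratings and the cold start problem from the list above, here is a small user-based sketch. It assumes ratings stored as nested dicts and computes cosine similarity over co-rated items only, which is a simplification; the function names and the toy ratings are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbours

# Hypothetical ratings; missing entries are simply absent.
ratings = {
    "alice": {"film_a": 5, "film_b": 4},
    "bob": {"film_a": 5, "film_b": 4, "film_c": 5},
    "carol": {"film_a": 1, "film_b": 5, "film_c": 2},
    "dave": {},  # new user with no history (cold start)
}
alice_pred = predict_rating(ratings, "alice", "film_c")
dave_pred = predict_rating(ratings, "dave", "film_c")
```

The `None` result for the new user shows why a pure collaborative approach needs a fallback, such as content-based features or popularity, when there is no rating history.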


time series

Anomalies and outliers
If data shows non-stationarity
Handling missing values
Dealing with structural breaks
If forecasting future values
time series classification
seasonal adjustments
If working with long time series
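
Two of the items above, non-stationary data and forecasting future values, have simple standard-library illustrations: differencing to remove a trend, and a split that keeps time order so the model never trains on the future. The function names and the synthetic trend series are assumptions made for this sketch.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; lag=1 removes a linear trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def time_ordered_split(series, test_size):
    """Hold out the last test_size points (test_size must be positive)."""
    return series[:-test_size], series[-test_size:]

# Hypothetical series with a pure linear trend.
trend = [10.0 + 2.0 * t for t in range(8)]

diffed = difference(trend)                       # constant after differencing
train, test = time_ordered_split(trend, test_size=2)
```

Setting `lag` to the season length gives a basic seasonal difference. A random shuffle split, by contrast, would mix future points into the training set.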