@izam-mohammed
Created May 9, 2024 17:02
These are the types of problems in machine learning

types of problems in machine learning

  • general

    • how to collect data
    • What type of ML problem is this
    • what is the type of data
    • what will be the output
    • missing values and outliers
    • labeled or unlabeled data
    • what kind of preprocessing and transformation is needed
    • potential biases in the model
    • what is the size of training and testing data
    • possibility of data leakage
    • chances of overfitting and underfitting
    • How to evaluate the model
    • what is the simplest way to make a model
    • how it is to be deployed in the real world
    • how to monitor and update model in production
    • What ensemble methods can I consider
    • How to explain the model predictions
    • model serving and scalability
    • how to update the model with new data
  • regression

    • if linear relation
    • if polynomial relation
    • if there is some multicollinearity
    • if feature selection is required
    • for count data
    • if there are a lot of non-linear relations
    • outliers
    • missing values
    • if feature importance is crucial
    • if homoscedasticity is violated
    • if the features are high-dimensional
    • if real-time prediction is required
    • if regularization is needed
    • If the data distribution is skewed
    • Categorical data included
    • interpretability is important
    • if training time is a constraint
    • Data has temporal dependencies
    • Interaction effects
  • classification

    • Missing values
    • imbalance
    • if feature importance is crucial
    • categorical features
    • if interpretability is important
    • binary
    • if we want a simple and interpretable model
    • non-linear decision boundaries
    • If it is a credit risk analysis
    • data is high-dimensional
    • local patterns matter more than global ones
    • large number of features and independence assumption
    • large dataset and complex relations
    • multiclass
    • multilabel
    • anomaly detection
    • outliers
    • if data is textual
    • if overfitting is a concern
    • if training time is a constraint
    • if data has hierarchical structure
    • if data distribution is unknown and changing
    • if real-time prediction is required
  • nlp

    • EDA
      • token and text length distribution
      • vocabulary size
      • most frequent words
      • rare words
      • stop words and punctuation
      • N-grams analysis
      • Spelling and typos
      • Text annotation quality
      • entity co-occurrence
      • error analysis
      • data augmentation opportunities
    • If dimensionality reduction is needed
    • If dealing with multilingual text
    • If handling noisy or informal text
    • text segmentation
    • chatbot
    • Named entity recognition (NER)
    • spam detection
    • POS tagging
    • translation
    • Text summarisation
    • Speech recognition
    • Question answering
    • Sentiment analysis
      • In the case of training
      • using pretrained models
    • document similarity
    • Text generation
    • Coreference resolution
    • Dependency parsing
    • Semantic role labeling
    • if real-time processing is required
  • computer vision

    • if dealing with small datasets
    • if real-time processing is required
    • If there are many small objects in the image
    • If needing high accuracy and precision
    • If handling noisy or varied environments
    • if real-time object tracking is required
    • Data augmentation opportunities
    • Image classification
    • Object detection
    • Image segmentation
    • Instance segmentation
    • Image generation or style transfer
    • Image captioning
    • Image super-resolution
    • Medical image analysis
    • extracting information from documents
    • Object tracking
    • Anomaly detection in images
  • recommendation system

    • If the data contains a lot of missing values
    • Addressing cold start problem
    • handling large scale data
    • If real-time recommendation is important
    • If dealing with multicriteria recommendations
    • If incorporating social network information
    • collaborative filtering
    • content-based filtering
    • Matrix factorisation
  • time series

    • Anomalies and outliers
    • If data shows non-stationarity
    • Handling missing values
    • Dealing with structural breaks
    • If forecasting future values
    • time series classification
    • seasonal adjustments
    • If working with long time series
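
Several of the time series items above (non-stationarity, seasonal adjustments, forecasting future values) can be illustrated with a minimal sketch. It is not part of the original notes: it assumes the series is a plain Python list, and the helper names `difference` and `chronological_split` as well as the toy series are hypothetical. Differencing is only one possible way to deal with a trend or a seasonal pattern, and the split simply keeps the test data later in time than the training data so that forecasting is evaluated without looking into the future.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def chronological_split(series, test_fraction=0.2):
    """Split without shuffling, so the test set is strictly later in time."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(series) * test_fraction))
    cut = len(series) - n_test
    return series[:cut], series[cut:]


# Hypothetical toy series: a linear trend plus a repeating pattern of period 4.
pattern = [0, 3, 1, 4]
series = [2 * t + pattern[t % 4] for t in range(20)]

# Lag-1 differencing removes the trend but keeps the seasonal pattern.
detrended = difference(series, lag=1)

# Lag-4 differencing (the assumed seasonal period) removes the pattern too,
# leaving a constant: 2 * 4 = 8 at every step.
deseasonalized = difference(series, lag=4)

# Hold out the most recent 20% of observations for evaluation.
train, test = chronological_split(series, test_fraction=0.2)
```

In this toy case the seasonally differenced series is constant, which is the idealised version of a stationary series. On real data a statistical test or a plot would be needed to judge whether differencing was enough.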