Machine Learning Personal Notes

This is my summary of fundamental machine learning concepts.

  1. Supervised Learning:
  • Goal: To train a model to predict output based on labeled training data.
  • Algorithms:
    • Linear Regression: Used for predicting continuous outcomes.
    • Logistic Regression: Used for binary classification problems.
    • Decision Trees: Splits data by feature thresholds into a tree of if-then rules used for prediction.
    • Support Vector Machines (SVMs): Finds the best decision boundary to separate data.
    • k-Nearest Neighbors (k-NN): Predicts based on similarity to nearby data points.
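
A minimal sketch of the supervised workflow, assuming scikit-learn is available; the synthetic dataset and the choice of logistic regression are purely illustrative.

```python
# Supervised learning sketch: fit a classifier on labeled data, then predict.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)         # learn the input -> label mapping
print(model.score(X_test, y_test))  # accuracy on held-out data
```
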
  2. Unsupervised Learning:
  • Goal: To find structure or patterns in unlabeled data.
  • Algorithms:
    • Clustering: Groups similar data points into clusters.
    • Principal Component Analysis (PCA): Reduces data dimensionality while preserving key features.
    • Anomaly Detection: Identifies unusual data points that deviate significantly from the norm.
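
A minimal sketch of the two most common unsupervised steps (clustering and PCA), again assuming scikit-learn; the blob dataset is synthetic.

```python
# Unsupervised learning sketch: cluster unlabeled points, then reduce dimensions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # cluster assignments
X_2d = PCA(n_components=2).fit_transform(X)  # project onto 2 principal components
print(labels[:10], X_2d.shape)
```
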
  3. Reinforcement Learning:
  • Goal: To train an agent to make decisions in an environment to maximize rewards.
  • Algorithms:
    • Q-Learning: Learns action-value estimates off-policy, updating toward the reward plus the best value available in the next state.
    • SARSA (State-Action-Reward-State-Action): Similar to Q-Learning, but on-policy: it updates using the action actually taken by the current policy rather than the greedy maximum.
    • Deep Q-Networks (DQN): Uses neural networks for value estimation in complex environments.
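
A toy tabular Q-learning sketch using only NumPy; the five-state "walk right to the goal" environment is made up here just to show the update rule.

```python
# Tabular Q-learning on a 5-state chain: the agent earns 1.0 for reaching state 4.
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.3    # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for _ in range(2000):                # episodes
    s = 0
    while s < n_states - 1:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # action 1 (right) should dominate in every state
```
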
  4. Deep Learning:
  • Goal: To create neural networks that learn from large amounts of data.
  • Architectures:
    • Convolutional Neural Networks (CNNs): Effective for image and speech recognition.
    • Recurrent Neural Networks (RNNs): Used for sequential data like text and time series.
    • Transformers: Attention-based models that now dominate language processing and machine translation.
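
A minimal sketch of one of these architectures, a tiny CNN, assuming PyTorch is installed; the layer sizes and the 28x28 input are arbitrary illustrative choices.

```python
# Tiny CNN sketch (PyTorch): conv -> pool -> linear classifier for 28x28 images.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 input channel, 8 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # 10 output classes
)

x = torch.randn(4, 1, 28, 28)  # dummy batch of 4 grayscale images
print(model(x).shape)          # torch.Size([4, 10])
```
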
  5. Evaluation Metrics:
  • Accuracy: Measures the percentage of correct predictions.
  • Precision: Measures the proportion of true positives among all predicted positives.
  • Recall: Measures the proportion of true positives among all actual positives.
  • F1-score: Combines precision and recall into a single metric.
  • Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
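
All five metrics can be computed by hand; a quick sketch with made-up confusion-matrix counts and regression values:

```python
# Classification metrics from a 2x2 confusion matrix (illustrative counts).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted positives, how many are real
recall    = tp / (tp + fn)   # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, round(f1, 3))

# MSE for regression: average squared gap between predictions and targets.
y_true, y_pred = [3.0, 1.0, 2.0], [2.5, 1.5, 2.0]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mse, 4))
```
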
  6. Bias and Variance:
  • Bias: The systematic error introduced by a model due to assumptions or simplifications.
  • Variance: The error from a model's sensitivity to fluctuations in the training data; a high-variance model's predictions change substantially across different training sets.
  • Bias-Variance Tradeoff: Balancing bias and variance to optimize model performance.
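
The tradeoff is usually stated through the standard decomposition of expected squared error for a target y = f(x) + ε with noise variance σ²:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```
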
  7. Overfitting and Underfitting:
  • Overfitting: When a model performs well on training data but poorly on unseen data.
  • Underfitting: When a model fails to capture the underlying patterns in the data.
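
A quick NumPy sketch of both failure modes: fitting polynomials of increasing degree to noisy samples of a sine wave (the degrees and noise level are arbitrary illustrative choices).

```python
# Under- vs. overfitting: low-degree polynomials underfit, high-degree ones overfit.
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # noisy training data
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(np.pi * x_test)                           # noise-free targets

for degree in (1, 4, 15):
    coefs = P.polyfit(x, y, degree)
    test_mse = np.mean((P.polyval(x_test, coefs) - y_test) ** 2)
    print(degree, round(float(test_mse), 4))  # degree 1 underfits; 15 tends to overfit
```
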
  8. Regularization:
  • Techniques to reduce overfitting by penalizing model complexity.
  • L1 Regularization (Lasso): Penalizes the sum of absolute coefficients.
  • L2 Regularization (Ridge): Penalizes the sum of squared coefficients.
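
A minimal scikit-learn comparison of the two penalties; the alpha value is illustrative.

```python
# Ridge (L2) shrinks coefficients; Lasso (L1) can push them to exactly zero.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print(ridge.coef_.round(1))  # all coefficients shrunk, none exactly zero
print(lasso.coef_.round(1))  # uninformative features typically driven to 0.0
```
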
  9. Feature Engineering:
  • The process of transforming raw data into features that are more suitable for machine learning models.
  • Techniques:
    • Feature Scaling: Normalizing features to have a consistent range.
    • Feature Selection: Selecting the most informative features.
    • Feature Extraction: Creating new features from original features.
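
A short scikit-learn sketch combining the first two techniques (scaling, then univariate feature selection); the choice of k=3 is arbitrary.

```python
# Feature engineering sketch: scale features, then keep the k most informative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_top = SelectKBest(f_classif, k=3).fit_transform(X_scaled, y)  # keep 3 best
print(X_scaled.shape, X_top.shape)  # (200, 8) -> (200, 3)
```
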
  10. Model Selection and Validation:
  • Techniques to select the best model and avoid overfitting:
    • Cross-Validation: Evaluates a model on multiple subsets of the data.
    • Train-Validation-Test Split: Divides the data into separate sets for training, validation, and testing.
    • Hyperparameter Tuning: Optimizing the model's hyperparameters to improve performance.
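
A minimal sketch of cross-validation plus a small grid search, assuming scikit-learn; the SVM and its parameter grid are placeholder choices.

```python
# Model selection sketch: 5-fold cross-validation and hyperparameter tuning.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

scores = cross_val_score(SVC(), X, y, cv=5)  # accuracy on 5 held-out folds
print(scores.mean().round(3))

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)  # best hyperparameter combination found
```
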

Source: https://www.v7labs.com/blog/supervised-vs-unsupervised-learning

Supervised Learning

  • Overview: In supervised learning, a model is trained on a dataset consisting of input-output pairs (also known as labeled data or training data). The model learns to map inputs to outputs based on the training data.
  • Advantages:
  1. Supervised learning algorithms are relatively easy to implement and understand.
  2. They can be used to solve a wide variety of problems, including classification, regression, and prediction.
  3. They are often very accurate when the training data is representative of the real world.
  • Disadvantages:
  1. Supervised learning algorithms require labeled data, which can be expensive and time-consuming to collect.
  2. They are not as flexible as unsupervised learning algorithms and may not be able to adapt to new data that is significantly different from the training data.
  3. They can be biased if the training data is not representative of the real world.
  • When to use supervised learning:
  1. Supervised learning is often a good choice for problems where you have a lot of labeled data and you need a model that is accurate and reliable.
  2. Supervised learning is also a good choice for problems where you can easily collect more labeled data if needed.
  3. Supervised learning is not a good choice for problems where you do not have any labeled data or where the labeled data is not representative of the real world.

Unsupervised Learning

  • Overview: In unsupervised learning, a model is trained on a dataset that does not have any labels. The model learns to find patterns and structure in the data without being explicitly told what to look for.
  • Advantages:
  1. Unsupervised learning algorithms do not require labeled data, which can be expensive and time-consuming to collect.
  2. They are more flexible than supervised learning algorithms and can adapt to new data that is significantly different from the training data.
  3. They can surface patterns and structure in data that are not obvious to humans.
  • Disadvantages:
  1. Unsupervised learning algorithms are more difficult to implement and understand than supervised learning algorithms.
  2. For prediction tasks, they are generally less accurate than supervised algorithms trained on representative labeled data.
  3. They can be difficult to evaluate, as there is no clear way to measure how well they are performing.
  • When to use unsupervised learning:
  1. Unsupervised learning is often a good choice for problems where you do not have any labeled data or where the labeled data is not representative of the real world.
  2. Unsupervised learning is also a good choice when you want to find patterns and structure that are not obvious to humans.
  3. Unsupervised learning is not a good choice for problems where you need accurate, reliable predictions.
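
One partial answer to the evaluation problem above: internal indices such as the silhouette score grade a clustering without any labels. A minimal sketch, assuming scikit-learn:

```python
# Silhouette score (-1 to 1, higher is better) as a label-free clustering check.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))  # tends to peak near true k
```
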
