Last active Jul 18, 2018
Scala for Machine Learning

Notes for "Scala for Machine Learning", P. Nicolas url

  • Chapter 1

It is critical to understand the different classes of ML algorithms and to select the ones that are relevant to the domain.

ML problems are categorized as classification, prediction, optimization, and regression.

  • Classification extracts knowledge from historical data. For example, a classifier can be built to identify a disease from a set of symptoms.

  • Prediction: once a model has been trained using historical observations and validated against them, it can be used to predict some outcome.

  • Some global optimization problems are difficult to solve using traditional linear/non-linear optimization methods. ML techniques can improve the chances that the optimization method converges. Example: Fighting the spread of a new virus requires optimizing a process that may evolve over time as more symptoms or cases are uncovered.

  • Regression is a classification technique that is suitable for a continuous model. It is a special case of classification in which the output variables are continuous instead of categorical.
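
To make the categorical vs. continuous distinction concrete, here is a minimal sketch. Everything in it is an assumption made for illustration, not code from the book: the `ProblemKinds` object, the `Diagnosis` labels, the temperature threshold, and the linear model are all made up.

```scala
object ProblemKinds {
  // An observation is represented as a vector of feature values.
  type Features = Vector[Double]

  // Classification: the output is one of a finite set of categories.
  sealed trait Diagnosis
  case object Healthy extends Diagnosis
  case object Flu extends Diagnosis

  // Hypothetical rule: the first feature is a body temperature in Celsius,
  // and anything above 38.0 is labeled Flu.
  def classify(x: Features): Diagnosis =
    if (x(0) > 38.0) Flu else Healthy

  // Regression: the output is a continuous value.
  // Hypothetical linear model: dot product of weights and features, plus a bias.
  def regress(weights: Features, bias: Double)(x: Features): Double =
    weights.zip(x).map { case (w, v) => w * v }.sum + bias
}
```

`classify` can only ever return `Healthy` or `Flu`, while `regress` can return any `Double`; that difference in output type is the whole distinction the bullet describes.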

Observations are defined as sets of vectors or features.

A model can be predictive, descriptive, or adaptive.

  • Predictive models discover historical patterns in data and extract fundamental trends and relationships between features. These models are created through supervised learning using a preselected training set.

  • Descriptive models attempt to find unusual patterns by grouping observations into clusters with similar properties. These models are created through unsupervised learning.

  • Adaptive models are created through reinforcement learning. Reinforcement learning consists of one or several decision-making agents that recommend and possibly execute actions in an attempt to solve a problem, optimize an objective function, or resolve constraints.

The purpose of ML is to teach computers to execute tasks without human intervention.

Unsupervised learning does not require labeled data (or expected values).
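
A rough sketch of what "labeled" means in terms of data shapes. The names (`LearningData`, `Labeled`, `predict`, `assign`) are hypothetical, and 1-nearest-neighbor and nearest-centroid assignment are simple stand-ins chosen for brevity, not techniques taken from the book.

```scala
object LearningData {
  type Features = Vector[Double]

  // Supervised learning: each observation carries its expected value (label).
  final case class Labeled(features: Features, label: String)

  // Euclidean distance between two feature vectors of equal length.
  def distance(a: Features, b: Features): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Supervised stand-in: 1-nearest-neighbor returns the label of the
  // closest training observation.
  def predict(training: Seq[Labeled], x: Features): String =
    training.minBy(t => distance(t.features, x)).label

  // Unsupervised stand-in: no labels are available, so each observation
  // is only assigned the index of its nearest centroid (one clustering step).
  def assign(centroids: Seq[Features], xs: Seq[Features]): Seq[Int] =
    xs.map(x => centroids.indices.minBy(i => distance(centroids(i), x)))
}
```

Note that `predict` needs `Labeled` inputs, while `assign` works on bare feature vectors and can only group them, not name them.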

  • Chapter 2: Hello World!

An observation is a set of indirect measurements of variables, which may be noisy or have a high degree of correlation or redundancy. Using all features in each observation incurs a high computation cost.

The purpose of feature extraction is to reduce the number of variables or dimensions of the model by eliminating redundant or irrelevant features.
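
One simple way to eliminate redundant features is a correlation filter: drop any feature that is almost perfectly correlated with one already kept. The sketch below is an assumption for illustration, not the book's method. `FeatureFilter`, `pearson`, `selectFeatures`, and the threshold value are hypothetical, and the data is assumed to be stored column-wise (one vector per feature).

```scala
object FeatureFilter {
  // Pearson correlation between two feature columns of equal length.
  // A constant column has zero variance; its correlation is reported as 0.0.
  def pearson(a: Vector[Double], b: Vector[Double]): Double = {
    val n = a.size
    val meanA = a.sum / n
    val meanB = b.sum / n
    val cov = a.zip(b).map { case (x, y) => (x - meanA) * (y - meanB) }.sum
    val varA = a.map(x => (x - meanA) * (x - meanA)).sum
    val varB = b.map(y => (y - meanB) * (y - meanB)).sum
    if (varA == 0.0 || varB == 0.0) 0.0 else cov / math.sqrt(varA * varB)
  }

  // Greedy filter: walk the columns in order and keep a column index only if
  // its absolute correlation with every already-kept column is below the threshold.
  def selectFeatures(columns: Vector[Vector[Double]], threshold: Double): Vector[Int] =
    columns.indices.foldLeft(Vector.empty[Int]) { (kept, j) =>
      val redundant =
        kept.exists(k => math.abs(pearson(columns(k), columns(j))) >= threshold)
      if (redundant) kept else kept :+ j
    }
}
```

With columns `(1, 2, 3, 4)`, `(2, 4, 6, 8)`, and `(1, -1, 1, -1)` and a threshold of 0.95, the second column is an exact multiple of the first and is dropped, so the kept indices are 0 and 2. This filter only handles redundancy; spotting irrelevant features needs the target variable as well.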

  • Chapter 4: Unsupervised Learning