
Massimo Albani (Mashimo)

  • Officina Mutante
@Mashimo
Mashimo / Random Forest
Last active April 29, 2018 22:38
Random forest
A single decision tree, tasked with learning a dataset, might not perform well due to outliers and the breadth and depth complexity of the data. So instead of relying on a single tree, random forests rely on a forest of cleverly grown decision trees. Each tree within the forest is allowed to become highly specialized in a specific area, but still retains some general knowledge about most areas. When a random forest classifies a sample, it is actually each tree in the forest working together to cast votes on what label that sample should be assigned.
Instead of sharing the entire dataset with each decision tree, the forest performs an operation which is essentially a train / test split of the training data. Each decision tree in the forest randomly samples from the overall training data set. By doing so, each tree exists in an independent subspace and the variation between trees is controlled. This technique is known as tree bagging, or bootstrap aggregating.
In addition to the tree bagging …
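The bagging-and-voting scheme above can be sketched with scikit-learn's RandomForestClassifier; this is a minimal sketch on synthetic data, and the dataset and parameter choices are illustrative assumptions, not taken from the gist:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, assumed purely for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# bootstrap=True is the tree bagging described above: each of the 100 trees
# is grown on a random sample, drawn with replacement, of the training data.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=7)
forest.fit(X_train, y_train)

# predict()/score() run the forest-wide vote: each tree casts a label,
# and the majority label wins.
score = forest.score(X_test, y_test)
```

Because each tree sees a different bootstrap sample, the individual trees disagree on hard cases, and the vote averages their errors out.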
Mashimo / Decision Tree
Last active April 29, 2018 22:39
Decision Tree
Decision trees are supervised, probabilistic machine learning classifiers that are often used as decision support tools. Like any other classifier, they are capable of predicting the label of a sample, and they do this by examining the probabilistic outcomes of the sample's features.
Decision trees are one of the oldest and most used machine learning algorithms, perhaps even pre-dating machine learning itself. They are very popular and have been around for decades. Following through with sequential cause-and-effect decisions comes very naturally.
Decision trees are a good tool to use when you want backing evidence to support a decision.
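A minimal sketch of such a classifier with scikit-learn's DecisionTreeClassifier; the iris dataset and the depth limit are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth limits how many sequential cause-and-effect decisions the tree
# may chain together before it commits to a label.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Predict the label of the first sample by walking it down the tree.
pred = tree.predict(X[:1])
```

The fitted tree can be exported (e.g. with `sklearn.tree.export_text`) to show exactly which feature thresholds led to a decision, which is what makes it useful as backing evidence.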
Support vector machines are a set of supervised learning algorithms that you can use for classification, regression and outlier detection. scikit-learn has many classes for SVM usage, depending on your purpose. The one we'll be focusing on is the Support Vector Classifier, SVC.
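A minimal SVC sketch; the iris data, the linear kernel, and C=1.0 are assumed choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear-kernel Support Vector Classifier; the kernel and the
# regularisation strength C are the two knobs you would tune first.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
svc_score = clf.score(X_test, y_test)
```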
Mashimo / Regression
Last active April 29, 2018 22:37
Regression
Examples of regression models for prediction
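As one minimal example of such a prediction model, a linear regression on assumed toy data (y = 3x + 2, not taken from the gist):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noiseless toy data following y = 3x + 2.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

reg = LinearRegression()
reg.fit(X, y)

# Predict the response for an unseen input, x = 10.
prediction = reg.predict([[10.0]])[0]
```

On noiseless data the fitted coefficients recover the generating line exactly, so the prediction at x = 10 is 32.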
Mashimo / Classification K-nearest neighbours
Last active April 29, 2018 22:37
Clustering supervised
Clustering groups samples that are similar into the same cluster.
Supervised: data samples have labels associated with them.
Uses the k-nearest-neighbours algorithm.
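A minimal k-nearest-neighbours sketch; the iris data and k=5 are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each test sample is assigned the majority label of its 5 nearest
# labelled neighbours in the training set.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_score = knn.score(X_test, y_test)
```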
Mashimo / Clustering unsupervised
Last active August 5, 2022 02:45
Clustering data
Clustering groups samples that are similar into the same cluster.
Unsupervised: no labels provided in the data samples.
Uses the K-means algorithm.
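A minimal K-means sketch on assumed synthetic data: two well-separated blobs, clustered without ever seeing a label:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated 2-D blobs; note that no labels are provided.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])

# K-means iteratively moves 2 centroids and reassigns points to the
# nearest centroid until the assignments stabilise.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

With blobs this far apart, each blob ends up entirely in one cluster; on real data the choice of `n_clusters` is the hard part.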
Mashimo / Isomap
Last active November 16, 2020 10:54
Data dimensionality reduction via isomap
Isomap is a nonlinear dimensionality reduction method.
The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold, based on a rough estimate of each data point's neighbours.
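A minimal Isomap sketch; the swiss-roll data and the neighbourhood size are assumed for illustration:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# A 3-D "swiss roll": a nonlinear manifold with only 2 intrinsic dimensions.
X, _ = make_swiss_roll(n_samples=300, random_state=0)

# n_neighbors is the rough neighbourhood estimate mentioned above; Isomap
# uses it to approximate geodesic distances along the manifold, then embeds
# the points in 2-D while preserving those distances.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
```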
Mashimo / PCA
Last active July 21, 2020 16:57
PCA - Principal Component Analysis
Principal Component Analysis
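A minimal PCA sketch; the iris data and the choice of 2 components are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 measured features onto the 2 orthogonal directions of
# largest variance in the data.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

# Fraction of the total variance the 2 retained components account for.
explained = pca.explained_variance_ratio_.sum()
```

Inspecting `explained_variance_ratio_` is the usual way to decide how many components to keep.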
Mashimo / data visualisation
Last active April 29, 2018 22:36
wheat seeds data visualisation
Check the data-visualisation-README file below.
Mashimo / readNHL.py
Last active April 29, 2018 22:39
Read NHL Historic Player Points Statistics
import pandas as pd

# Load up the table for the 2014-2015 season, and extract the dataset out of it.
# (The ESPN page layout, and hence the scraped columns, may have changed since
# this was written.)
url = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"
table_df = pd.read_html(url, header=1)[0]

# Columns get automatic names. Rename the columns so that they are similar to
# the column definitions on the website, e.g. (hypothetical mapping; the real
# automatic names depend on the scraped page):
# table_df = table_df.rename(columns={"Unnamed: 0": "RK"})
print(table_df.head())