@lorenzodisidoro
Last active November 21, 2020 19:23
The script computes the covariance matrix of the Wine dataset and its eigenvalues and eigenvectors, then projects the data onto the two leading eigenvectors (PCA).
#!/usr/bin/env python3
import numpy as np
import pandas as pd
## Import Wine dataset
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
# iloc[<row selection>, <column selection>]: keep the feature columns and drop the class labels (column 0)
X = df_wine.iloc[:, 1:].values
print("Number of dataset features: ", len(X.T)) # should be 13
## Covariance matrix, eigenvectors and eigenvalues
C = np.cov(X.T)
# C is symmetric, so eigh applies; it returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(C)
## Dataset Transformation
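# Pair each eigenvalue with its eigenvector (the eigenvectors are the columns of the matrix)
eigenvalues2eigenvectors = [(eigenvalues[i], eigenvectors[:,i]) for i in range(len(eigenvalues))]
# Sort on the eigenvalue only; sorting whole tuples would compare the arrays whenever two eigenvalues tie
eigenvalues2eigenvectors.sort(key=lambda pair: pair[0], reverse=True)
# Projection matrix W (13 x 2) built from the two leading eigenvectors
W = np.hstack((eigenvalues2eigenvectors[0][1][:, np.newaxis], eigenvalues2eigenvectors[1][1][:, np.newaxis]))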
X_PCA = X.dot(W)
print("Number of features after PCA: ", len(X_PCA.T)) # should be 2
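
The same steps can be wrapped in a small reusable function. What follows is only a sketch under a few assumptions, not part of the original script: the helper name `pca_project` is hypothetical, the data is centered before projection (the script above projects the raw `X`), and a synthetic array stands in for the Wine dataset so the example runs without network access. It also reports the fraction of total variance captured by each retained component.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X (samples x features) onto its leading principal components.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Center each feature so the projection is expressed relative to the mean.
    X_centered = X - X.mean(axis=0)
    # rowvar=False treats columns as variables, equivalent to np.cov(X.T).
    C = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # Reorder so the largest eigenvalue (and its eigenvector column) comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Projection matrix: the first n_components eigenvector columns.
    W = eigenvectors[:, :n_components]
    # Share of the total variance carried by each retained component.
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return X_centered @ W, explained_ratio

# Synthetic stand-in data (an assumption, not the Wine dataset): 178 samples, 13 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(178, 13)) * np.arange(1, 14)

X_demo_pca, ratio = pca_project(X_demo, n_components=2)
print(X_demo_pca.shape)  # (178, 2)
print(ratio)             # variance fraction per component, largest first
```

With the real data, `pca_project(X)` could be called on the `X` array extracted above. Because the Wine features have very different scales, standardizing them first (for example dividing each centered column by its standard deviation) is a common choice, though the original script does not do so.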