@lundquist-ecology-lab
Last active January 25, 2023 02:46
Principal component analysis (PCA) in Python
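PCA looks for orthogonal directions of maximum variance. On standardized data, those directions are the right singular vectors of the data matrix, and the scores are the data projected onto them. The following is a minimal sketch of that idea, not the author's code. It assumes scikit-learn's bundled iris dataset (`load_iris`) in place of the CSV URL used in the script, so it runs offline. It checks a plain NumPy SVD against scikit-learn's `PCA`.

```python
# Minimal sketch: PCA as an SVD of the standardized data matrix.
# Assumption: scikit-learn's bundled iris data (load_iris) stands in for the
# CSV downloaded in the script, so this runs without network access.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                      # 150 x 4 numeric measurements
Z = StandardScaler().fit_transform(X)     # zero mean, unit variance per column

# Singular value decomposition Z = U S Vt; the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores_svd = Z @ Vt[:2].T                 # project onto the first two axes

# Variance along each axis is S**2 / (n - 1); dividing by the total
# gives the fraction of variance explained by each component.
explained = S**2 / np.sum(S**2)

# Compare with scikit-learn. Component signs are arbitrary, so compare
# absolute values of the scores.
pca = PCA(n_components=2).fit(Z)
scores_sk = pca.transform(Z)
print(np.allclose(np.abs(scores_svd), np.abs(scores_sk)))        # True
print(np.allclose(explained[:2], pca.explained_variance_ratio_))  # True
```

The sign ambiguity is why the comparison uses absolute values. A component and its negation describe the same axis, so different solvers may return either one.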
# Running a principal components analysis (PCA) in Python
#%%
import pandas as pd
# pip install scikit-learn
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
# Import data
url = "https://raw.githubusercontent.com/lundquist-ecology-lab/biostatistics/main/example_data/iris.csv"
data = pd.read_csv(url)
# Scale the four numeric measurements (columns 0-3) to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data.iloc[:, 0:4])
# Perform PCA
pca = PCA(n_components=2)
pca.fit(data_scaled)
# Project data onto first two principal components
data_pca = pca.transform(data_scaled)
# Plot PCA
sns.scatterplot(x=data_pca[:, 0], y=data_pca[:, 1], hue=data['Species'], palette='Set1')
plt.xlabel("First Principal Component")
plt.ylabel("Second Principal Component")
plt.show()
# %%
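The script plots the scores but does not report how much variance each component captures or which measurements drive it. Both are needed to interpret the plot. The following minimal sketch prints the explained-variance ratios and the loadings. As before, it assumes scikit-learn's bundled iris data (`load_iris(as_frame=True)`) and its feature names in place of the CSV from the URL, so the column labels differ from the CSV's.

```python
# Minimal sketch: explained variance and loadings for a two-component PCA.
# Assumption: scikit-learn's bundled iris data (load_iris) replaces the CSV
# used in the script; its column names are scikit-learn's, not the CSV's.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
features = iris.data                          # DataFrame with 4 numeric columns
data_scaled = StandardScaler().fit_transform(features)

pca = PCA(n_components=2).fit(data_scaled)

# Fraction of the total (standardized) variance captured by PC1 and PC2.
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.1%} of variance")

# Loadings: weight of each original variable on each component
# (rows = variables, columns = components).
loadings = pd.DataFrame(
    pca.components_.T,
    index=features.columns,
    columns=["PC1", "PC2"],
)
print(loadings.round(3))
```

In the original script the same information is available from the already fitted `pca` object through `pca.explained_variance_ratio_` and `pca.components_`. The ratios are a natural addition to the axis labels, for example "First Principal Component (x% of variance)". Because loading signs are arbitrary, interpret a loading by its magnitude and by its sign relative to the other loadings on the same component.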