@Ben-Epstein
Created November 5, 2020 15:26
Load Iris Data into Spark
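from pyspark.sql import SparkSession
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np

# Reuse the active session (e.g. in a notebook or pyspark shell) or start one
spark = SparkSession.builder.getOrCreate()

data = load_iris()
# Column name cleanup: 'sepal length (cm)' -> 'sepal_length'
cols = [i.replace('(cm)','').strip().replace(' ','_') for i in data.feature_names] + ['label']
# Append the target as the last column (np.c_ stores it as float: 0.0, 1.0, 2.0)
pdf = pd.DataFrame(np.c_[data.data, data.target], columns=cols)
df = spark.createDataFrame(pdf)
df.show()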
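
The column-name cleanup above can be checked on its own, without Spark or scikit-learn. The sketch below hardcodes the four feature names that `load_iris()` reports (an assumption, so the snippet runs standalone) and wraps the same replace/strip steps in a hypothetical helper, `clean_column_name`, which is not part of the original gist.

```python
# Feature names as reported by sklearn's load_iris().feature_names,
# hardcoded here (assumption) so the snippet runs without sklearn.
FEATURE_NAMES = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]


def clean_column_name(name: str) -> str:
    """Hypothetical helper: drop the '(cm)' unit, trim, replace spaces with '_'."""
    return name.replace("(cm)", "").strip().replace(" ", "_")


# Same shape as the gist: cleaned feature names plus a trailing 'label' column
cols = [clean_column_name(n) for n in FEATURE_NAMES] + ["label"]
print(cols)
```

Names without spaces or parentheses can be referenced in Spark SQL expressions without backtick quoting, which is presumably why the gist cleans them before calling `createDataFrame`.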