@curran
Last active December 9, 2024 18:53
The Iris Dataset

This is the "Iris" dataset. Originally published at UCI Machine Learning Repository: Iris Data Set, this small dataset from 1936 is often used for testing out machine learning algorithms and visualizations (for example, Scatter Plot). Each row of the table represents an iris flower, including its species and dimensions of its botanical parts, sepal and petal, in centimeters.

The HTML page provides the basic code required to load the data and display it on the page (as JSON) using D3.js.

For a more up-to-date code example with React & D3, see [VizHub: Stylized Scatter Plot](https://vizhub.com/curran/3d631093c2334030a6b27fa979bb4a0d?edit=files&file=index.js).

<!DOCTYPE html>
<head>
<meta charset="utf-8">
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.9/d3.min.js"></script>
</head>
<body>
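<script>
  // Display the parsed rows on the page as pretty-printed JSON.
  function render(data){
    d3.select("body")
      .append("pre")
      .text(JSON.stringify(data, null, 2));
  }
  // Convert the four measurement columns from strings to numbers.
  function type(d){
    d.sepal_length = +d.sepal_length;
    d.sepal_width = +d.sepal_width;
    d.petal_length = +d.petal_length;
    d.petal_width = +d.petal_width;
    return d;
  }
  // Load the CSV file, applying type() to each row before rendering.
  d3.csv("iris.csv", type, render);
</script>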
</body>
sepal_length sepal_width petal_length petal_width species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.1 1.5 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
{
"columns": [
{ "name": "sepal_length", "type": "number", "label": "Sepal Length" },
{ "name": "sepal_width", "type": "number", "label": "Sepal Width" },
{ "name": "petal_length", "type": "number", "label": "Petal Length" },
{ "name": "petal_width", "type": "number", "label": "Petal Width" },
{ "name": "species", "type": "string", "label": "Species" }
]
}
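
The column metadata above can also drive type conversion outside the browser, mirroring what the `type` function does in the D3 example. The following is a minimal sketch in Python using only the standard library. It assumes the CSV file is comma-separated (the table above is a rendered view); the `parse_iris` helper and the three-row inline sample are hypothetical and stand in for the full file.

```python
import csv
import io
import json

# Column metadata, copied from the JSON file above.
SCHEMA = json.loads("""
{
  "columns": [
    { "name": "sepal_length", "type": "number", "label": "Sepal Length" },
    { "name": "sepal_width", "type": "number", "label": "Sepal Width" },
    { "name": "petal_length", "type": "number", "label": "Petal Length" },
    { "name": "petal_width", "type": "number", "label": "Petal Width" },
    { "name": "species", "type": "string", "label": "Species" }
  ]
}
""")

# Hypothetical three-row sample; assumes the real file uses commas.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""


def parse_iris(text, schema):
    """Parse CSV text, converting columns typed "number" to float."""
    numeric = {c["name"] for c in schema["columns"] if c["type"] == "number"}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: float(value) if key in numeric else value
             for key, value in record.items()}
        )
    return rows


rows = parse_iris(SAMPLE_CSV, SCHEMA)
print(rows[0])
```

To process the whole dataset, the contents of a locally saved copy of the file could be passed to `parse_iris` in place of `SAMPLE_CSV`.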
@aplomb8716

I read the file for the iris data set from the URL at the UCI Machine Learning Website. The python code is below:

# Import package
from urllib.request import urlretrieve

# Import pandas
import pandas as pd

# Assign url of file: url
iris = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# Save file locally
urlretrieve(iris)

# Read file into a DataFrame and print its head
df = pd.read_csv(iris, sep=',')

# Add column names to the data frame
attributes = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
df.columns = attributes

# View the first five lines of data frame
print(df.head())

@dejoseph

Select the dataset and copy it. You can then paste it into Excel.

@dingaroo

Click on the Raw button, in the frame for the data, and Save as a CSV file.

Launch Excel and open the CSV file you just saved.

Hope this helps.

@GTalert

GTalert commented Jan 22, 2018

Does anybody understand how the Iris dataset helps in testing clustering?

@aturner-ca

This dataset (Fisher iris data) is included in the free trial offered by Penny Analytics, who run an online outlier detection service. You will need to download their version of the dataset to be sure to get the free pricing. The page is here:
https://pennyanalytics.com/free-trial/

@curran
Author

curran commented Sep 7, 2019

Congrats @aturner-ca for getting free advertising! Nice spamming skills.

@aturner-ca

aturner-ca commented Sep 7, 2019 via email

@curran
Author

curran commented Sep 8, 2019

The comment seems to me to be about Penny Analytics, and not really about the dataset.

I understand startups are hard.

Good luck!

@aturner-ca

aturner-ca commented Sep 8, 2019 via email

@PedroPovedaQ

I read the file for the iris data set from the URL at the UCI Machine Learning Website. The python code is below:

# Import package
from urllib.request import urlretrieve

# Import pandas
import pandas as pd

# Assign url of file: url
iris = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# Save file locally
urlretrieve(iris)

# Read file into a DataFrame and print its head
df = pd.read_csv(iris, sep=',')

# Add column names to the data frame
attributes = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
df.columns = attributes

# View the first five lines of data frame
print(df.head())

I'd amend this by passing the column names inline. The approach above lets pandas treat the first data row as the header, and assigning df.columns afterwards overwrites it, which leaves 149 rows instead of 150.

df = pd.read_csv('iris.csv', sep=',', names=["sepal_length", "sepal_width", "petal_length", "petal_width", "class"])
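
The difference can be checked on a small stand-in for the file. This is a sketch, assuming pandas is installed; the three-line `SAMPLE` string is hypothetical and only imitates the header-less layout of iris.data.

```python
import io

import pandas as pd

# Hypothetical three-line stand-in for iris.data, which has no header row.
SAMPLE = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
NAMES = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# Default behaviour: the first line is consumed as column labels,
# so renaming the columns afterwards discards that record.
lossy = pd.read_csv(io.StringIO(SAMPLE))
lossy.columns = NAMES

# Passing names= tells pandas that the file has no header row.
full = pd.read_csv(io.StringIO(SAMPLE), names=NAMES)

print(len(lossy), len(full))  # prints: 2 3
```

On the real file the same one-row gap shows up as 149 versus 150.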

@jannathossain

jannathossain commented Sep 29, 2020

Sir, how can I get iris_setosa and iris_versicolor variables from the Iris dataset? If I want to see the mean petal length of Iris setosa, what would the Python code be?

@curran
Author

curran commented Sep 29, 2020

I don't know Python.

@Charan7890

One row of data is missing from it. How can I get that row too?

@curran
Author

curran commented Jun 1, 2021

Really? Which row is missing?

@Charan7890

Charan7890 commented Jun 1, 2021 via email

@curran
Author

curran commented Jun 1, 2021

Looks like there are 151 rows including the header row, so 150 records, which is in line with the upstream dataset.

@kengz

kengz commented May 21, 2022

In case anyone is looking for an even simpler way to do it:

import pandas as pd

# download iris data and read it into a dataframe
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
df = pd.read_csv(url, names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

@curran
Author

curran commented Jan 18, 2023

Another example, loading this dataset in JavaScript: https://vizhub.com/curran/c9bedc6a76cf468997e3fcb5471db0ab?file=viz.js

@imvickykumar999

@Akshaykk12

You could have just downloaded the ZIP file from the top corner.
