@wonksknowsuchin
Created June 19, 2022 23:19
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "protected-dragon",
"metadata": {},
"outputs": [],
"source": [
"from pydataset import data"
]
},
{
"cell_type": "markdown",
"id": "focused-protest",
"metadata": {},
"source": [
"# List all the datasets available in this library"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "worthy-amazon",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dataset_id</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>AirPassengers</td>\n",
" <td>Monthly Airline Passenger Numbers 1949-1960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BJsales</td>\n",
" <td>Sales Data with Leading Indicator</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BOD</td>\n",
" <td>Biochemical Oxygen Demand</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Formaldehyde</td>\n",
" <td>Determination of Formaldehyde</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HairEyeColor</td>\n",
" <td>Hair and Eye Color of Statistics Students</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>752</th>\n",
" <td>VerbAgg</td>\n",
" <td>Verbal Aggression item responses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>753</th>\n",
" <td>cake</td>\n",
" <td>Breakage Angle of Chocolate Cakes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>754</th>\n",
" <td>cbpp</td>\n",
" <td>Contagious bovine pleuropneumonia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>755</th>\n",
" <td>grouseticks</td>\n",
" <td>Data on red grouse ticks from Elston et al. 2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>756</th>\n",
" <td>sleepstudy</td>\n",
" <td>Reaction times in a sleep deprivation study</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>757 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" dataset_id title\n",
"0 AirPassengers Monthly Airline Passenger Numbers 1949-1960\n",
"1 BJsales Sales Data with Leading Indicator\n",
"2 BOD Biochemical Oxygen Demand\n",
"3 Formaldehyde Determination of Formaldehyde\n",
"4 HairEyeColor Hair and Eye Color of Statistics Students\n",
".. ... ...\n",
"752 VerbAgg Verbal Aggression item responses\n",
"753 cake Breakage Angle of Chocolate Cakes\n",
"754 cbpp Contagious bovine pleuropneumonia\n",
"755 grouseticks Data on red grouse ticks from Elston et al. 2001\n",
"756 sleepstudy Reaction times in a sleep deprivation study\n",
"\n",
"[757 rows x 2 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data()"
]
},
{
"cell_type": "markdown",
"id": "packed-flooring",
"metadata": {},
"source": [
"# List the first 5 datasets in this library"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "moving-zimbabwe",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dataset_id</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>AirPassengers</td>\n",
" <td>Monthly Airline Passenger Numbers 1949-1960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BJsales</td>\n",
" <td>Sales Data with Leading Indicator</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BOD</td>\n",
" <td>Biochemical Oxygen Demand</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Formaldehyde</td>\n",
" <td>Determination of Formaldehyde</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HairEyeColor</td>\n",
" <td>Hair and Eye Color of Statistics Students</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dataset_id title\n",
"0 AirPassengers Monthly Airline Passenger Numbers 1949-1960\n",
"1 BJsales Sales Data with Leading Indicator\n",
"2 BOD Biochemical Oxygen Demand\n",
"3 Formaldehyde Determination of Formaldehyde\n",
"4 HairEyeColor Hair and Eye Color of Statistics Students"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = data()\n",
"d.head()"
]
},
{
"cell_type": "markdown",
"id": "fewer-theorem",
"metadata": {},
"source": [
"# List the first 15 datasets in this library"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "native-glenn",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dataset_id</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>AirPassengers</td>\n",
" <td>Monthly Airline Passenger Numbers 1949-1960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>BJsales</td>\n",
" <td>Sales Data with Leading Indicator</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>BOD</td>\n",
" <td>Biochemical Oxygen Demand</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Formaldehyde</td>\n",
" <td>Determination of Formaldehyde</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HairEyeColor</td>\n",
" <td>Hair and Eye Color of Statistics Students</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>InsectSprays</td>\n",
" <td>Effectiveness of Insect Sprays</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>JohnsonJohnson</td>\n",
" <td>Quarterly Earnings per Johnson &amp; Johnson Share</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>LakeHuron</td>\n",
" <td>Level of Lake Huron 1875-1972</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>LifeCycleSavings</td>\n",
" <td>Intercountry Life-Cycle Savings Data</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Nile</td>\n",
" <td>Flow of the River Nile</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>OrchardSprays</td>\n",
" <td>Potency of Orchard Sprays</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>PlantGrowth</td>\n",
" <td>Results from an Experiment on Plant Growth</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Puromycin</td>\n",
" <td>Reaction Velocity of an Enzymatic Reaction</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Titanic</td>\n",
" <td>Survival of passengers on the Titanic</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>ToothGrowth</td>\n",
" <td>The Effect of Vitamin C on Tooth Growth in Guinea Pigs</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dataset_id title\n",
"0 AirPassengers Monthly Airline Passenger Numbers 1949-1960\n",
"1 BJsales Sales Data with Leading Indicator\n",
"2 BOD Biochemical Oxygen Demand\n",
"3 Formaldehyde Determination of Formaldehyde\n",
"4 HairEyeColor Hair and Eye Color of Statistics Students\n",
"5 InsectSprays Effectiveness of Insect Sprays\n",
"6 JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share\n",
"7 LakeHuron Level of Lake Huron 1875-1972\n",
"8 LifeCycleSavings Intercountry Life-Cycle Savings Data\n",
"9 Nile Flow of the River Nile\n",
"10 OrchardSprays Potency of Orchard Sprays\n",
"11 PlantGrowth Results from an Experiment on Plant Growth\n",
"12 Puromycin Reaction Velocity of an Enzymatic Reaction\n",
"13 Titanic Survival of passengers on the Titanic\n",
"14 ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.head(15)"
]
},
{
"cell_type": "markdown",
"id": "authorized-toolbox",
"metadata": {},
"source": [
"# List the last 5 datasets in this library"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "oriented-chemical",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dataset_id</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>752</th>\n",
" <td>VerbAgg</td>\n",
" <td>Verbal Aggression item responses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>753</th>\n",
" <td>cake</td>\n",
" <td>Breakage Angle of Chocolate Cakes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>754</th>\n",
" <td>cbpp</td>\n",
" <td>Contagious bovine pleuropneumonia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>755</th>\n",
" <td>grouseticks</td>\n",
" <td>Data on red grouse ticks from Elston et al. 2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>756</th>\n",
" <td>sleepstudy</td>\n",
" <td>Reaction times in a sleep deprivation study</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dataset_id title\n",
"752 VerbAgg Verbal Aggression item responses\n",
"753 cake Breakage Angle of Chocolate Cakes\n",
"754 cbpp Contagious bovine pleuropneumonia\n",
"755 grouseticks Data on red grouse ticks from Elston et al. 2001\n",
"756 sleepstudy Reaction times in a sleep deprivation study"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.tail() "
]
},
{
"cell_type": "markdown",
"id": "confident-official",
"metadata": {},
"source": [
"# List the last 10 datasets in this library"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "unavailable-refund",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dataset_id</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>747</th>\n",
" <td>Dyestuff</td>\n",
" <td>Yield of dyestuff by batch</td>\n",
" </tr>\n",
" <tr>\n",
" <th>748</th>\n",
" <td>Dyestuff2</td>\n",
" <td>Yield of dyestuff by batch</td>\n",
" </tr>\n",
" <tr>\n",
" <th>749</th>\n",
" <td>InstEval</td>\n",
" <td>University Lecture/Instructor Evaluations by Students at ETH</td>\n",
" </tr>\n",
" <tr>\n",
" <th>750</th>\n",
" <td>Pastes</td>\n",
" <td>Paste strength by batch and cask</td>\n",
" </tr>\n",
" <tr>\n",
" <th>751</th>\n",
" <td>Penicillin</td>\n",
" <td>Variation in penicillin testing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>752</th>\n",
" <td>VerbAgg</td>\n",
" <td>Verbal Aggression item responses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>753</th>\n",
" <td>cake</td>\n",
" <td>Breakage Angle of Chocolate Cakes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>754</th>\n",
" <td>cbpp</td>\n",
" <td>Contagious bovine pleuropneumonia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>755</th>\n",
" <td>grouseticks</td>\n",
" <td>Data on red grouse ticks from Elston et al. 2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>756</th>\n",
" <td>sleepstudy</td>\n",
" <td>Reaction times in a sleep deprivation study</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dataset_id title\n",
"747 Dyestuff Yield of dyestuff by batch\n",
"748 Dyestuff2 Yield of dyestuff by batch\n",
"749 InstEval University Lecture/Instructor Evaluations by Students at ETH\n",
"750 Pastes Paste strength by batch and cask\n",
"751 Penicillin Variation in penicillin testing\n",
"752 VerbAgg Verbal Aggression item responses\n",
"753 cake Breakage Angle of Chocolate Cakes\n",
"754 cbpp Contagious bovine pleuropneumonia\n",
"755 grouseticks Data on red grouse ticks from Elston et al. 2001\n",
"756 sleepstudy Reaction times in a sleep deprivation study"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.tail(10)"
]
},
{
"cell_type": "markdown",
"id": "progressive-hunter",
"metadata": {},
"source": [
"# Display a specific dataset"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "american-transition",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>age</th>\n",
" <th>sex</th>\n",
" <th>survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1314</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1315</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1316</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1316 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" class age sex survived\n",
"1 1st class adults man yes\n",
"2 1st class adults man yes\n",
"3 1st class adults man yes\n",
"4 1st class adults man yes\n",
"5 1st class adults man yes\n",
"... ... ... ... ...\n",
"1312 3rd class child women no\n",
"1313 3rd class child women no\n",
"1314 3rd class child women no\n",
"1315 3rd class child women no\n",
"1316 3rd class child women no\n",
"\n",
"[1316 rows x 4 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = data('titanic')\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "permanent-aruba",
"metadata": {},
"source": [
"# Display all the columns"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "missing-inspection",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['class', 'age', 'sex', 'survived'], dtype='object')"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "united-picnic",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>age</th>\n",
" <th>sex</th>\n",
" <th>survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1st class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1314</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1315</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1316</th>\n",
" <td>3rd class</td>\n",
" <td>child</td>\n",
" <td>women</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1316 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" class age sex survived\n",
"1 1st class adults man yes\n",
"2 1st class adults man yes\n",
"3 1st class adults man yes\n",
"4 1st class adults man yes\n",
"5 1st class adults man yes\n",
"... ... ... ... ...\n",
"1312 3rd class child women no\n",
"1313 3rd class child women no\n",
"1314 3rd class child women no\n",
"1315 3rd class child women no\n",
"1316 3rd class child women no\n",
"\n",
"[1316 rows x 4 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
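],
"source": [
"# Mask with notnull(): no rows are dropped, and any null cell would show as NaN\n",
"non_null = df[df.notnull()]\n",
"non_null"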
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "natural-shepherd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 1316 entries, 1 to 1316\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 class 1316 non-null object\n",
" 1 age 1316 non-null object\n",
" 2 sex 1316 non-null object\n",
" 3 survived 1316 non-null object\n",
"dtypes: object(4)\n",
"memory usage: 51.4+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"id": "formed-siemens",
"metadata": {},
"source": [
"# Display a specific column "
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "running-bowling",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 adults\n",
"2 adults\n",
"3 adults\n",
"4 adults\n",
"5 adults\n",
" ... \n",
"1312 child\n",
"1313 child\n",
"1314 child\n",
"1315 child\n",
"1316 child\n",
"Name: age, Length: 1316, dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['age']"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "relative-greene",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 adults\n",
"2 adults\n",
"3 adults\n",
"4 adults\n",
"5 adults\n",
" ... \n",
"1312 child\n",
"1313 child\n",
"1314 child\n",
"1315 child\n",
"1316 child\n",
"Name: age, Length: 1316, dtype: object"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.age"
]
},
{
"cell_type": "markdown",
"id": "norwegian-consensus",
"metadata": {},
"source": [
"# Data types of all the columns "
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "extra-midwest",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"class object\n",
"age object\n",
"sex object\n",
"survived object\n",
"dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"id": "brown-header",
"metadata": {},
"source": [
"# Data type of a specific column in the dataset"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "honest-receptor",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('O')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['class'].dtype"
]
},
{
"cell_type": "markdown",
"id": "mathematical-institute",
"metadata": {},
"source": [
"# Display basic information about the dataset"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "contemporary-return",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 1316 entries, 1 to 1316\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 class 1316 non-null object\n",
" 1 age 1316 non-null object\n",
" 2 sex 1316 non-null object\n",
" 3 survived 1316 non-null object\n",
"dtypes: object(4)\n",
"memory usage: 51.4+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"id": "neither-arnold",
"metadata": {},
"source": [
"# Display the statistical information of the categorical columns"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "objective-analysis",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>age</th>\n",
" <th>sex</th>\n",
" <th>survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>1316</td>\n",
" <td>1316</td>\n",
" <td>1316</td>\n",
" <td>1316</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>3rd class</td>\n",
" <td>adults</td>\n",
" <td>man</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>706</td>\n",
" <td>1207</td>\n",
" <td>869</td>\n",
" <td>817</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" class age sex survived\n",
"count 1316 1316 1316 1316\n",
"unique 3 2 2 2\n",
"top 3rd class adults man no\n",
"freq 706 1207 869 817"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"id": "spatial-currency",
"metadata": {},
"source": [
"# Display the statistical information of the numerical columns"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "dominant-advertising",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Time</th>\n",
" <th>demand</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>6.000000</td>\n",
" <td>6.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>3.666667</td>\n",
" <td>14.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>2.160247</td>\n",
" <td>4.630623</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>8.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>2.250000</td>\n",
" <td>11.625000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>3.500000</td>\n",
" <td>15.800000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>4.750000</td>\n",
" <td>18.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.000000</td>\n",
" <td>19.800000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Time demand\n",
"count 6.000000 6.000000\n",
"mean 3.666667 14.833333\n",
"std 2.160247 4.630623\n",
"min 1.000000 8.300000\n",
"25% 2.250000 11.625000\n",
"50% 3.500000 15.800000\n",
"75% 4.750000 18.250000\n",
"max 7.000000 19.800000"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 = data(\"BOD\")\n",
"df1.describe()"
]
},
{
"cell_type": "markdown",
"id": "proved-criterion",
"metadata": {},
"source": [
"# Plotting the BOD dataset columns"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "congressional-spring",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df1.plot()"
]
},
{
"cell_type": "markdown",
"id": "incorporate-integral",
"metadata": {},
"source": [
"# Plotting the BOD dataset columns using the matplotlib library"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "hindu-weekly",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1e17d614190>]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.plot(df1['Time'], df1['demand'])  # using the BOD (Biochemical Oxygen Demand) dataset"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}