Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save maxsei/dc34a1fdc421db7023403ab1ed02be48 to your computer and use it in GitHub Desktop.
Save maxsei/dc34a1fdc421db7023403ab1ed02be48 to your computer and use it in GitHub Desktop.
fix for dsc-accessing-data-with-pandas-lab-houston-ds-042219
": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Accessing Data within Pandas - Lab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"In this lab, we'll look at a data set which contains information World cup matches. Let's use the pandas commands learned in the previous lecture to learn more about our data!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Objectives\n",
"You will be able to:\n",
"* Understand and explain some key Pandas methods\n",
"* Access DataFrame data by using the label\n",
"* Perform boolean indexing on both Series and DataFrames\n",
"* Use simple selectors for series\n",
"* Set new Series and DataFrame inputs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the file `WorldCupMatches.csv` as a dataframe in Pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Common methods and attributes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the correct method to look at the first 7 rows of the data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look at the last 3 rows of the data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a concise summary of your data using `.info()`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Obtain a tuple representing the number of rows and number of columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the appropriate attribute to get the column names"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting dataframe information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When looking at the dataframe's `.head()`, you might have noticed that the games are structured chronologically in the dataframe.\n",
"\n",
"Use the right selection method to print all the information from the 3rd to the 5th game."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, print all the info from game 5-9, but we're only interested to print out the \"Home Team Name\" and the \"Away Team Name\", "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we'd like the information on all the games played in Group 3 for the 1950 World Cup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can combine conditions like this:\n",
"\n",
"`df[(condition1) | (condition2)]` -> Returns rows where either condition is true\n",
"\n",
"`df[(condition1) & (condition2)]` -> Returns rows where both conditions are true"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's repeat the command above, but now we only want to print out the attendance column for the Group 3 games"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Throughout the entire history of the world cup, How many Home games were played by the Netherlands?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many games were played by the Netherlands in total?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's try and figure out how many games the USA played in the 2014 world cup. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's try to find out how many countries participated in the 1986 world cup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hint 1: as a first step, create a new data set that only contain games in that year.\n",
"\n",
"Hint 2: You can use `.unique()` to make sure you don't end up with duplicate country names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the world cup history, how matches had more than 5 goals in total?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changing values and creating new columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the information you currently have in your `df`, create a new column \"Half-time Goals\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the code below. You'll notice that for Korea, there are records for both North-Korea (Korea DPR) and South-Korea (Korea Republic). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df.loc[df[\"Home Team Name\"].str.contains('Korea'), \"Home Team Name\" ]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Imagine that for some reason, we simply want Korea listed as one entry, so we want to replace every \"Home Team Name\" and \"Away Team Name\" entry that contains \"Korea\" to simply \"Korea\". In the same way, we want to change the columns \"Home Team Initials\" and \"Away Team Initials\" to NSK (North & South Korea) instead of \"KOR\" and \"PRK\". "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to verify your answer!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"In this lab, you learned how to access data within Pandas!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment