@DuyLe22
Created December 24, 2020 18:47
Created on Skills Network Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n",
"\n",
"# Reading Files Python\n",
"\n",
"Estimated time needed: **40** minutes\n",
"\n",
"## Objectives\n",
"\n",
"After completing this lab you will be able to:\n",
"\n",
"- Read text files using Python libraries\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Table of Contents</h2>\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <ul>\n",
" <li><a href=\"#download\">Download Data</a></li>\n",
" <li><a href=\"#read\">Reading Text Files</a></li>\n",
" <li><a href=\"#better\">A Better Way to Open a File</a></li>\n",
" </ul>\n",
" \n",
"</div>\n",
"\n",
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"download\">Download Data</h2>\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Example1.txt', <http.client.HTTPMessage at 0x7f791c108c18>)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import urllib.request\n",
"url = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt'\n",
"filename = 'Example1.txt'\n",
"urllib.request.urlretrieve(url, filename)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-12-24 13:29:51-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 45 [text/plain]\n",
"Saving to: ‘/resources/data/Example1.txt’\n",
"\n",
"/resources/data/Exa 100%[===================>] 45 --.-KB/s in 0s \n",
"\n",
"2020-12-24 13:29:51 (18.0 MB/s) - ‘/resources/data/Example1.txt’ saved [45/45]\n",
"\n"
]
}
],
"source": [
"# Download Example file\n",
"\n",
"\n",
"!wget -O /resources/data/Example1.txt https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"read\">Reading Text Files</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read or write a file in Python is to use the built-in <code>open</code> function. The <code>open</code> function returns a <b>file object</b> whose methods and attributes let you read, write, and otherwise manipulate the file. In this notebook, we will only cover <b>.txt</b> files. The first argument is the path to the file, including the file name. An example is shown below:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadOpen.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode argument is optional and the default value is <b>r</b>. In this notebook we only cover two modes: \n",
"\n",
"<ul>\n",
" <li><b>r</b> Read mode for reading files </li>\n",
" <li><b>w</b> Write mode for writing files</li>\n",
"</ul>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the next example, we will use the text file <b>Example1.txt</b>. The file is shown below:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadFile.png\" width=\"100\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We open the file for reading:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Open Example1.txt in read mode\n",
"\n",
"example1 = \"Example1.txt\"\n",
"file1 = open(example1, \"r\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can view the attributes of the file.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name of the file:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Example1.txt'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the name of the file\n",
"\n",
"file1.name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode the file object is in:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'r'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the mode of file, either 'r' or 'w'\n",
"\n",
"file1.mode"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can read the whole file and assign its content to a variable:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\nThis is line 2\\nThis is line 3'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the file\n",
"\n",
"FileContent = file1.read()\n",
"FileContent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The <b>\\n</b> marks a new line.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can print the file: \n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Print the file with '\\n' as a new line\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file content is of type string:\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type of file content\n",
"\n",
"type(FileContent)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
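{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is very important to close the file when you are finished with it. Closing frees the operating-system resources held by the file and avoids relying on automatic cleanup, whose timing can differ between Python versions and implementations.\n"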
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Close the file when finished\n",
"\n",
"file1.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"better\">A Better Way to Open a File</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the <code>with</code> statement is better practice because it automatically closes the file, even if the code raises an exception. Python runs everything in the indented block and then closes the file object.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Open file using with\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileContent = file1.read()\n",
"    print(FileContent)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file object is now closed. You can verify this by running the following cell:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Verify if the file is closed\n",
"\n",
"file1.closed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The content read inside the <code>with</code> block is still available after the file is closed:\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# See the content of file\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax is a little confusing because the file object is named after the <code>as</code> keyword, and we never close the file explicitly. We therefore summarize the steps in a figure:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadWith.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don't have to read the entire file. For example, we can read the first 14 characters by passing 14 as the argument to the method **.read()**:\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1\n"
]
}
],
"source": [
"# Read the first 14 characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(14))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When <code>.read(n)</code> is called, the next <code>n</code> characters are read, starting where the previous call stopped. For example, <code>.read(4)</code> on a newly opened file returns the first 4 characters, and calling it again returns the next 4. The output of the following cell demonstrates the process for different arguments to <code>read()</code>:\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n",
" is \n",
"line 1 \n",
"\n",
"This is line 2\n"
]
}
],
"source": [
"# Read a certain number of characters on each call\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(4))\n",
"    print(file1.read(4))\n",
"    print(file1.read(7))\n",
"    print(file1.read(15))"
]
},
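{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough hand check of the output above, we can tally where each call to <code>read()</code> starts and stops. This sketch assumes the file holds exactly the 45 characters shown earlier, so the first line occupies indices 0 to 14 (including its trailing space) and the first newline character sits at index 15. Each row adds the number of characters requested to the current position:\n",
"\\begin{align}\n",
"\\text{read(4)}: &\\quad 0 + 4 = 4 \\\\\n",
"\\text{read(4)}: &\\quad 4 + 4 = 8 \\\\\n",
"\\text{read(7)}: &\\quad 8 + 7 = 15 \\\\\n",
"\\text{read(15)}: &\\quad 15 + 15 = 30\n",
"\\end{align}\n",
"Under that assumption the fourth call starts at index 15, on the newline character itself, which would explain why its printed result begins with an empty line before the second line of the file appears.\n"
]
},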
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The process is illustrated in the figure below. Each color represents the part of the file read by one call to <code>read()</code>:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadChar.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time: \n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This \n",
"is line 2\n"
]
}
],
"source": [
"# Read a certain number of characters on each call\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(16))\n",
"    print(file1.read(5))\n",
"    print(file1.read(9))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also read one line of the file at a time using the method <code>readline()</code>: \n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first line: This is line 1 \n",
"\n"
]
}
],
"source": [
"# Read one line\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(\"first line: \" + file1.readline())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also pass an argument to <code>readline()</code> to specify the maximum number of characters to read. However, unlike <code>read()</code>, <code>readline()</code> reads at most one line.\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This is line 2\n",
"This \n"
]
}
],
"source": [
"with open(example1, \"r\") as file1:\n",
"    print(file1.readline(20)) # does not read past the end of line\n",
"    print(file1.read(20)) # Returns the next 20 chars\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can use a loop to iterate through each line: \n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0 : This is line 1 \n",
"\n",
"Iteration 1 : This is line 2\n",
"\n",
"Iteration 2 : This is line 3\n"
]
}
],
"source": [
"# Iterate through the lines\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    for i, line in enumerate(file1):\n",
"        print(\"Iteration\", str(i), \": \", line)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the method <code>readlines()</code> to read all the lines of the text file into a list:\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Read all lines and save as a list\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileasList = file1.readlines()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Each element of the list corresponds to a line of text:\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 3'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Index the list of lines; a cell displays only its last expression,\n",
"# so the output shows the third line\n",
"\n",
"FileasList[0]\n",
"FileasList[2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the second line\n",
"\n",
"FileasList[1]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 3'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the third line\n",
"\n",
"FileasList[2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2> Exercise </h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h4>Weather Data</h4>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your friend, a rising star in the field of meteorology, has called on you to write a script to perform some analysis on weather station data. Given below is a file \"resources/ex4.csv\", which contains some precipitation data for the month of June.\n",
"Each line in the file has the format Date,Precipitation (up to two decimal places). Note how the data is separated using ','. The first row of the file contains headers and should be ignored.\n",
"\n",
"Your task is to complete the <code>getNAvg</code> function that computes a simple moving average over N days for the precipitation data, where N is a parameter. Your function should return a list of moving averages for the given data. \n",
"\n",
"The formula for a k day moving average over a series $n_{0}, n_{1}, n_{2}, \\ldots, n_{m}$ is:\n",
"\\begin{align}\n",
"M_{i} = M_{i-1} + \\frac{n_{i} - n_{i-k}}{k}, \\text{ for } i = k \\text{ to } m\n",
"\\\\ \\text{where $M_{i}$ is the moving average}\n",
"\\end{align}\n",
"The skeleton code has been provided below. Edit only the required function.\n"
]
},
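{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see where the update rule comes from, here is a short derivation. It assumes that $M_{i}$ denotes the plain average of the $k$ most recent values ending at $n_{i}$, which is one common reading of the formula above rather than something the exercise spells out:\n",
"\\begin{align}\n",
"M_{i} &= \\frac{1}{k} \\sum_{j=i-k+1}^{i} n_{j} \\\\\n",
"M_{i} - M_{i-1} &= \\frac{1}{k} \\left( \\sum_{j=i-k+1}^{i} n_{j} - \\sum_{j=i-k}^{i-1} n_{j} \\right) = \\frac{n_{i} - n_{i-k}}{k}\n",
"\\end{align}\n",
"As a made-up illustration (these values are not taken from the exercise file), take the series $2, 4, 6, 8$ with $k = 2$. The first average is $M_{1} = (2 + 4)/2 = 3$. The update then gives $M_{2} = 3 + (6 - 2)/2 = 5$ and $M_{3} = 5 + (8 - 4)/2 = 7$, which agree with the direct averages $(4 + 6)/2$ and $(6 + 8)/2$.\n"
]
},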
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details><summary>Click here for hints</summary>\n",
"\n",
"- Each line of the file ends with a '\\n' character, which should be removed\n",
"- The lines in the file are read as strings, so the values need to be converted to floats\n",
"- For a k day moving average, the data points for the last k days must be known\n",
"\n",
"</details>\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-12-24 14:34:23-- https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/ex4.csv\n",
"Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.63.118.104\n",
"Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.63.118.104|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 718 [text/csv]\n",
"Saving to: ‘ex4.csv’\n",
"\n",
"ex4.csv 100%[===================>] 718 --.-KB/s in 0s \n",
"\n",
"2020-12-24 14:34:24 (2.57 MB/s) - ‘ex4.csv’ saved [718/718]\n",
"\n"
]
}
],
"source": [
"# Download the file\n",
"\n",
"!wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/ex4.csv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"statData = \"ex4.csv\"\n",
"\n",
"def getNAvg(file, N):\n",
"    \"\"\"\n",
"    file - Path to the file containing the raw weather station data\n",
"    N - The number of days to compute the moving average over\n",
"\n",
"    Return a list containing the N-day moving averages of the data\n",
"    \"\"\"\n",
"    values = []\n",
"    # open the file passed in, rather than a hard-coded name\n",
"    with open(file, \"r\") as rawData:\n",
"        next(rawData)  # skip the header row\n",
"        for line in rawData:\n",
"            # the second column holds the precipitation value\n",
"            values.append(float(line.rstrip(\"\\n\").split(',')[1]))\n",
"\n",
"    # average every window of N consecutive values\n",
"    moving_avg = []\n",
"    for i in range(len(values) - N + 1):\n",
"        window = values[i:i + N]\n",
"        moving_avg.append(sum(window) / N)\n",
"    return moving_avg\n",
"\n",
"def plotData(mean, N):\n",
"    \"\"\"\n",
"    mean - series to plot\n",
"    N - parameter for legend\n",
"\n",
"    Plots running averages\n",
"    \"\"\"\n",
"    mean = [round(x, 3) for x in mean]\n",
"    plt.plot(mean, label=str(N) + ' day average')\n",
"    plt.xlabel('Day')\n",
"    plt.ylabel('Precipitation')\n",
"    plt.legend()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Once you have finished, you can use the block below to plot your data\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plotData(getNAvg(statData,1),1)\n",
"plotData([0 for x in range(1,5)]+ getNAvg(statData,5),5)\n",
"plotData([0 for x in range(1,7)]+ getNAvg(statData,7),7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the code below to verify your progress:\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"getNAvg : \n",
"Test Passed\n"
]
}
],
"source": [
"avg5 =[4.18,4.78,4.34,4.72,5.48,5.84,6.84,6.76,6.74,5.46,4.18,2.74,2.52,2.02,2.16,2.82,2.92,4.36,4.74,5.12,5.34,6.4,6.56,6.1,5.74,5.62,4.26]\n",
"avg7 =[4.043,4.757,5.071,5.629,6.343,5.886,6.157,5.871,5.243,4.386,3.514,2.714,2.586,2.443,2.571,3.643,4.143,4.443,4.814,5.6,6.314,6.414,5.429,5.443,4.986]\n",
"\n",
"def testMsg(passed):\n",
"    if passed:\n",
"        return 'Test Passed'\n",
"    else:\n",
"        return 'Test Failed'\n",
"\n",
"print(\"getNAvg : \")\n",
"try:\n",
"    sol5 = getNAvg(statData, 5)\n",
"    sol7 = getNAvg(statData, 7)\n",
"\n",
"    if len(sol5) == len(avg5) and len(sol7) == len(avg7):\n",
"        err5 = sum([abs(avg5[index] - sol5[index]) for index in range(len(avg5))])\n",
"        err7 = sum([abs(avg7[index] - sol7[index]) for index in range(len(avg7))])\n",
"        print(testMsg((err5 < 1) and (err7 < 1)))\n",
"    else:\n",
"        print(testMsg(False))\n",
"except NameError as e:\n",
"    print('Error! Code: {c}, Message: {m}'.format(c = type(e).__name__, m = str(e)))\n",
"except Exception:\n",
"    print(\"An error occurred. Recheck your function\")\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details><summary>Click here for the solution</summary>\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"statData =\"ex4.csv\"\n",
"\n",
"def getNAvg(file,N):\n",
"    \"\"\"\n",
"    file - File containing all the raw weather station data\n",
"    N - The number of days to compute the moving average over\n",
"\n",
"    Return a list containing the moving average of all data points\n",
"    \"\"\"\n",
"    row = 0 # keep track of rows\n",
"    lastN = [] # keep track of last N points\n",
"    mean = [0] # running avg\n",
"\n",
"    with open(file,\"r\") as rawData:\n",
"        for line in rawData:\n",
"            if (row == 0): # Ignore the headers\n",
"                row = row + 1\n",
"                continue\n",
"\n",
"            line = line.strip('\\n')\n",
"            lineData = float(line.split(',')[1])\n",
"\n",
"            if (row<=N):\n",
"                lastN.append(lineData)\n",
"                mean[0] = (lineData + mean[0]*(row-1))/row\n",
"            else:\n",
"                mean.append( mean[row - N -1]+ (lineData - lastN[0])/N)\n",
"                lastN = lastN[1:]\n",
"                lastN.append(lineData)\n",
"\n",
"            row = row +1\n",
"    return mean\n",
"\n",
"def plotData(mean,N):\n",
"    \"\"\" Plots running averages \"\"\"\n",
"    mean = [round(x,3) for x in mean]\n",
"    plt.plot(mean,label=str(N) + ' day average')\n",
"    plt.xlabel('Day')\n",
"    plt.ylabel('Precipitation')\n",
"    plt.legend()\n",
"\n",
"\n",
"plotData(getNAvg(statData,1),1)\n",
"plotData([0 for x in range(1,5)] + getNAvg(statData,5),5)\n",
"plotData([0 for x in range(1,7)] + getNAvg(statData,7),7)\n",
"\n",
"```\n",
"\n",
"</details>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" Note: Files that store data as values separated by ',' (or another delimiter character) are called '.csv' (comma-separated values) files.\n",
" They are a very common way to store data. When working with them, an external library is usually used to handle the fiddly parsing details for you. In fact, there are numerous libraries for statistical functions too. You will learn about such libraries later in the course.\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"<h2>The last exercise!</h2>\n",
"<p>Congratulations, you have completed your first lesson and hands-on lab in Python. However, there is one more thing you need to do. The Data Science community encourages sharing work. The best way to share and showcase your work is to share it on GitHub. By sharing your notebook on GitHub you are not only building your reputation with fellow data scientists, but you can also show it off when applying for a job. Even though this was your first piece of work, it is never too early to start building good habits. So, please read and follow <a href=\"https://cognitiveclass.ai/blog/data-scientists-stand-out-by-sharing-your-notebooks/\" target=\"_blank\">this article</a> to learn how to share your work.</p>\n",
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Author\n",
"\n",
"<a href=\"https://www.linkedin.com/in/joseph-s-50398b136/\" target=\"_blank\">Joseph Santarcangelo</a>\n",
"\n",
"## Other contributors\n",
"\n",
"<a href=\"https://www.linkedin.com/in/jiahui-mavis-zhou-a4537814a\">Mavis Zhou</a>\n",
"\n",
"## Change Log\n",
"\n",
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n",
"| ----------------- | ------- | ------------- | --------------------------------------------------------- |\n",
"| 2020-09-30 | 1.2 | Malika Singla | Weather Data dataset link added |\n",
"| 2020-09-30 | 1.1 | Arjun Swani | Added exercise \"Weather Data\" |\n",
"| 2020-09-30 | 1.0 | Arjun Swani | Added blurbs about closing files and read() vs readline() |\n",
"| 2020-08-26 | 0.2 | Lavanya | Moved lab to course repo in GitLab |\n",
"\n",
"<hr/>\n",
"\n",
"<h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. </h3>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}