Python Work With Files Intro
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n",
"\n",
"# Reading Files Python\n",
"\n",
"Estimated time needed: **40** minutes\n",
"\n",
"## Objectives\n",
"\n",
"After completing this lab you will be able to:\n",
"\n",
"* Read text files using Python libraries\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Table of Contents</h2>\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <ul>\n",
" <li><a href=\"#download\">Download Data</a></li>\n",
" <li><a href=\"#read\">Reading Text Files</a></li>\n",
" <li><a href=\"#better\">A Better Way to Open a File</a></li>\n",
" </ul>\n",
"\n",
"</div>\n",
"\n",
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"download\">Download Data</h2>\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Example1.txt', <http.client.HTTPMessage at 0x7f42083312d0>)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import urllib.request\n",
"url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/example1.txt'\n",
"filename = 'Example1.txt'\n",
"urllib.request.urlretrieve(url, filename)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/resources/data/Example1.txt: No such file or directory\n"
]
}
],
"source": [
"# Alternative download with wget; the target directory /resources/data must already exist\n",
"\n",
"\n",
"!wget -O /resources/data/Example1.txt https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/example1.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"read\">Reading Text Files</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read or write a file in Python is to use the built-in <code>open</code> function. The <code>open</code> function returns a **file object** that provides the methods and attributes you need to read, save, and manipulate the file. In this notebook, we will only cover **.txt** files. The first argument is the file path, including the file name. An example is shown as follows:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadOpen.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mode argument is optional and the default value is **r**. In this notebook we only cover two modes:\n",
"\n",
"<ul>\n",
" <li><b>r</b>: Read mode for reading files</li>\n",
" <li><b>w</b>: Write mode for writing files</li>\n",
"</ul>\n"
]
},
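{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the two modes, the cell below writes a small file in **w** mode and reads it back in **r** mode. The file name <code>ModeDemo.txt</code> is made up for this illustration and is not part of the lab data. Note that **w** mode empties a file that already exists, so the sketch deliberately avoids <code>Example1.txt</code>.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical file name, used only for this illustration\n",
"demo_path = \"ModeDemo.txt\"\n",
"\n",
"# \"w\" creates the file, or empties it first if it already exists\n",
"demo_file = open(demo_path, \"w\")\n",
"demo_file.write(\"written in w mode\\n\")\n",
"demo_file.close()\n",
"\n",
"# Omitting the mode argument opens the file in the default mode, \"r\"\n",
"demo_file = open(demo_path)\n",
"print(demo_file.mode)\n",
"print(demo_file.read())\n",
"demo_file.close()"
]
},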
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the next example, we will use the text file **Example1.txt**. The file is shown as follows:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadFile.png\" width=\"100\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We open the file in read mode:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Open Example1.txt in read mode\n",
"\n",
"example1 = \"Example1.txt\"\n",
"file1 = open(example1, \"r\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can view the attributes of the file.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name of the file:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Example1.txt'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the path of file\n",
"\n",
"file1.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mode the file object is in:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'r'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the mode of file, either 'r' or 'w'\n",
"\n",
"file1.mode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can read the whole file and assign its content to a variable:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\nThis is line 2\\nThis is line 3'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the file\n",
"\n",
"FileContent = file1.read()\n",
"FileContent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The <code>\\n</code> in the output is the newline character, which marks the end of a line.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can print the file:\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Print the file with '\\n' as a new line\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file content is a string:\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type of file content\n",
"\n",
"type(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is very important to close the file when you are done with it. This frees up resources and ensures consistent behavior across different Python versions.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Close file after finish\n",
"\n",
"file1.close()"
]
},
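{
"cell_type": "markdown",
"metadata": {},
"source": [
"If an error occurs between <code>open</code> and <code>close</code>, the call to <code>close</code> is never reached. One way to guard against this, sketched below under the assumption that <code>Example1.txt</code> was downloaded by the first code cell, is to put the call in a <code>finally</code> block, which runs whether or not an exception was raised. The variable name <code>content</code> is introduced here only for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes Example1.txt exists in the working directory\n",
"file1 = open(\"Example1.txt\", \"r\")\n",
"try:\n",
"    content = file1.read()\n",
"finally:\n",
"    # Runs whether or not the read above raised an exception\n",
"    file1.close()\n",
"\n",
"file1.closed"
]
},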
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"better\">A Better Way to Open a File</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the <code>with</code> statement is better practice because it automatically closes the file, even if the code raises an exception. The code runs everything in the indented block and then closes the file object.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Open file using with\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileContent = file1.read()\n",
"    print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file object is now closed. You can verify this by running the following cell:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Verify if the file is closed\n",
"\n",
"file1.closed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The content that was read inside the <code>with</code> block is still available in the variable:\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# See the content of file\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax can be a little confusing because the file object is named after the <code>as</code> keyword, and we never explicitly close the file. The figure below summarizes the steps:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadWith.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don’t have to read the entire file. For example, we can read the first four characters by passing 4 as an argument to the method **.read()**:\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n"
]
}
],
"source": [
"# Read first four characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the method <code>.read(4)</code> is called, the first 4 characters are returned. If we call the method again, the next 4 characters are returned, because the file object keeps track of its current position. The output of the following cell demonstrates this for different arguments to the method <code>read()</code>:\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n",
" is \n",
"line 1 \n",
"\n",
"This is line 2\n"
]
}
],
"source": [
"# Read certain amount of characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(4))\n",
"    print(file1.read(4))\n",
"    print(file1.read(7))\n",
"    print(file1.read(15))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The process is illustrated in the figure below. Each color represents the part of the file that is read by one call to the method <code>read()</code>:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/read.png\" width=\"500\" />\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time:\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This \n",
"is line 2\n"
]
}
],
"source": [
"# Read certain amount of characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(16))\n",
"    print(file1.read(5))\n",
"    print(file1.read(9))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also read one line of the file at a time using the method <code>readline()</code>:\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first line: This is line 1 \n",
"\n"
]
}
],
"source": [
"# Read one line\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(\"first line: \" + file1.readline())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also pass an argument to <code>readline()</code> to specify the maximum number of characters we want to read. However, unlike <code>read()</code>, <code>readline()</code> reads at most one line.\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This is line 2\n",
"This \n"
]
}
],
"source": [
"with open(example1, \"r\") as file1:\n",
"    print(file1.readline(20))  # does not read past the end of the line\n",
"    print(file1.read(20))      # returns the next 20 characters, across line boundaries\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use a loop to iterate through each line:\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0 : This is line 1 \n",
"\n",
"Iteration 1 : This is line 2\n",
"\n",
"Iteration 2 : This is line 3\n"
]
}
],
"source": [
"# Iterate through the lines\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    for i, line in enumerate(file1):\n",
"        print(\"Iteration\", str(i), \": \", line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the method <code>readlines()</code> to read all lines of the text file into a list:\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Read all lines and save as a list\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileasList = file1.readlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each element of the list corresponds to a line of text:\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\n'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the first line\n",
"\n",
"FileasList[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the second line\n",
"\n",
"FileasList[1]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 3'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the third line\n",
"\n",
"FileasList[2]"
]
},
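{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each element returned by <code>readlines()</code> keeps its trailing newline character. If that is not wanted, one option is to strip it while reading. This is a minimal sketch that assumes <code>Example1.txt</code> is in the working directory; the name <code>CleanLines</code> is introduced here for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Strip the trailing newline from each line while reading\n",
"with open(\"Example1.txt\", \"r\") as file1:\n",
"    CleanLines = [line.rstrip(\"\\n\") for line in file1]\n",
"\n",
"CleanLines"
]
},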
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"<h2>The last exercise!</h2>\n",
"<p>Congratulations, you have completed your first lesson and hands-on lab in Python.</p>\n",
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Author\n",
"\n",
"<a href=\"https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0101ENSkillsNetwork19487395-2021-01-01\" target=\"_blank\">Joseph Santarcangelo</a>\n",
"\n",
"## Other contributors\n",
"\n",
"<a href=\"https://www.linkedin.com/in/jiahui-mavis-zhou-a4537814a?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0101ENSkillsNetwork19487395-2021-01-01\">Mavis Zhou</a>\n",
"\n",
"## Change Log\n",
"\n",
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n",
"| ----------------- | ------- | ------------- | --------------------------------------------------------- |\n",
"| 2022-01-10 | 2.1 | Malika | Removed the readme for GitShare |\n",
"| 2020-09-30 | 1.3 | Malika | Deleted exercise \"Weather Data\" |\n",
"| 2020-09-30 | 1.2 | Malika Singla | Weather Data dataset link added |\n",
"| 2020-09-30 | 1.1 | Arjun Swani | Added exercise \"Weather Data\" |\n",
"| 2020-09-30 | 1.0 | Arjun Swani | Added blurbs about closing files and read() vs readline() |\n",
"| 2020-08-26 | 0.2 | Lavanya | Moved lab to course repo in GitLab |\n",
"\n",
"<hr/>\n",
"\n",
"<h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. </h3>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}