@Sparker0i
Created June 28, 2019 10:56
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Reading Files in Python</h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p><strong>Welcome!</strong> This notebook teaches you how to read text files in the Python programming language. By the end of this lab, you'll know how to open a text file, read all or part of it, and close it.</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Table of Contents</h2>\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <ul>\n",
" <li><a href=\"#download\">Download Data</a></li>\n",
" <li><a href=\"#read\">Reading Text Files</a></li>\n",
" <li><a href=\"#better\">A Better Way to Open a File</a></li>\n",
" </ul>\n",
" <p>\n",
" Estimated time needed: <strong>40 min</strong>\n",
" </p>\n",
"</div>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"download\">Download Data</h2>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-06-28 10:56:13-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 45 [text/plain]\n",
"Saving to: ‘/resources/data/Example1.txt’\n",
"\n",
"/resources/data/Exa 100%[===================>] 45 --.-KB/s in 0s \n",
"\n",
"2019-06-28 10:56:14 (20.7 MB/s) - ‘/resources/data/Example1.txt’ saved [45/45]\n",
"\n"
]
}
],
"source": [
"# Download Example file\n",
"\n",
"!wget -O /resources/data/Example1.txt https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"read\">Reading Text Files</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read or write a file in Python is to use the built-in <code>open</code> function. The <code>open</code> function returns a <b>File object</b> that contains the methods and attributes you need in order to read, save, and manipulate the file. In this notebook, we will only cover <b>.txt</b> files. The first parameter is the path to the file, including the file name. An example is shown below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadOpen.png\" width=\"500\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode argument is optional and the default value is <b>r</b>. In this notebook we only cover two modes: \n",
"<ul>\n",
" <li><b>r</b> Read mode for reading files </li>\n",
" <li><b>w</b> Write mode for writing files</li>\n",
"</ul>"
]
},
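{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the two modes, assuming a scratch file in a temporary directory (the name <code>demo.txt</code> and the variable names are made up for this illustration and are not part of the lab data). It writes one line with <b>w</b>, then reopens the file without a mode argument to show that <code>open</code> falls back to <b>r</b>:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical scratch file, not part of the lab data\n",
"demo_path = os.path.join(tempfile.mkdtemp(), \"demo.txt\")\n",
"\n",
"# \"w\" creates the file, or empties it if it already exists\n",
"demo_file = open(demo_path, \"w\")\n",
"demo_file.write(\"This is line 1\\n\")\n",
"demo_file.close()\n",
"\n",
"# No mode argument: open() uses the default, \"r\"\n",
"demo_file = open(demo_path)\n",
"default_mode = demo_file.mode\n",
"demo_text = demo_file.read()\n",
"demo_file.close()\n",
"\n",
"print(default_mode)\n",
"print(demo_text)"
]
},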
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the next example, we will use the text file <b>Example1.txt</b>. The file is shown below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadFile.png\" width=\"200\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We open the file in read mode: "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Read the Example1.txt\n",
"\n",
"example1 = \"/resources/data/Example1.txt\"\n",
"file1 = open(example1, \"r\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can view the attributes of the file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name of the file:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'/resources/data/Example1.txt'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the path of file\n",
"\n",
"file1.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode the file object is in:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'r'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the mode of file, either 'r' or 'w'\n",
"\n",
"file1.mode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can read the whole file and assign its content to a variable:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\nThis is line 2\\nThis is line 3'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the file\n",
"\n",
"FileContent = file1.read()\n",
"FileContent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The <b>\\n</b> in the output is the newline character; it marks the end of a line. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can print the file: "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Print the file with '\\n' as a new line\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file content is of type string (<code>str</code>):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type of file content\n",
"\n",
"type(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We must close the file object:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Close the file when finished\n",
"\n",
"file1.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"better\">A Better Way to Open a File</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the <code>with</code> statement is better practice because it automatically closes the file, even if the code raises an exception. Python runs everything in the indented block and then closes the file object. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# Open file using with\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileContent = file1.read()\n",
"    print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file object is now closed. You can verify this by running the following cell: "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Verify if the file is closed\n",
"\n",
"file1.closed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The content read inside the <code>with</code> block is still available in the variable <code>FileContent</code>:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"# See the content of file\n",
"\n",
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax can be a little confusing because the file object is named after the <code>as</code> keyword, and we never close the file explicitly. The figure below summarizes the steps:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadWith.png\" width=\"500\" />"
]
},
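{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the claim that <code>with</code> closes the file even when an exception occurs, here is a small sketch. It assumes a scratch file in a temporary directory and a deliberately raised <code>ValueError</code>; the file name and variable names are hypothetical and not part of the lab data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical scratch file, not part of the lab data\n",
"demo_path = os.path.join(tempfile.mkdtemp(), \"demo.txt\")\n",
"with open(demo_path, \"w\") as demo_file:\n",
"    demo_file.write(\"This is line 1\\n\")\n",
"\n",
"try:\n",
"    with open(demo_path, \"r\") as demo_file:\n",
"        demo_file.read(4)\n",
"        # Simulate a failure in the middle of the block\n",
"        raise ValueError(\"simulated failure inside the block\")\n",
"except ValueError:\n",
"    pass\n",
"\n",
"# The with statement closed the file although the block raised\n",
"closed_after_error = demo_file.closed\n",
"print(closed_after_error)"
]
},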
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don’t have to read the entire file. For example, we can read the first 4 characters by passing 4 as an argument to the method <code>.read()</code>:\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n"
]
}
],
"source": [
"# Read first four characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the method <code>.read(4)</code> is called, the first 4 characters are read. If we call the method again, the next 4 characters are read. The output of the following cell demonstrates the process for different arguments to the method <code>read()</code>:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n",
" is \n",
"line 1 \n",
"\n",
"This is line 2\n"
]
}
],
"source": [
"# Read certain amount of characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(4))\n",
"    print(file1.read(4))\n",
"    print(file1.read(7))\n",
"    print(file1.read(15))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The process is illustrated in the figure below. Each color represents the part of the file read by one call to the method <code>read()</code>:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadChar.png\" width=\"500\" />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time: "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This \n",
"is line 2\n"
]
}
],
"source": [
"# Read certain amount of characters\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(file1.read(16))\n",
"    print(file1.read(5))\n",
"    print(file1.read(9))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also read one line of the file at a time using the method <code>readline()</code>: "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first line: This is line 1 \n",
"\n"
]
}
],
"source": [
"# Read one line\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    print(\"first line: \" + file1.readline())"
]
},
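{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each call to <code>readline()</code> continues where the previous call stopped, and it returns an empty string once the end of the file is reached. A short sketch, assuming a scratch copy that holds the same three lines as <b>Example1.txt</b> (the file name <code>demo.txt</code> and the variable names are made up for this illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical scratch copy with the same three lines as Example1.txt\n",
"demo_path = os.path.join(tempfile.mkdtemp(), \"demo.txt\")\n",
"with open(demo_path, \"w\") as demo_file:\n",
"    demo_file.write(\"This is line 1 \\nThis is line 2\\nThis is line 3\")\n",
"\n",
"with open(demo_path, \"r\") as demo_file:\n",
"    first = demo_file.readline()\n",
"    second = demo_file.readline()\n",
"    third = demo_file.readline()\n",
"    at_end = demo_file.readline()  # nothing is left to read\n",
"\n",
"# repr() makes the trailing newline characters visible\n",
"print(repr(first))\n",
"print(repr(second))\n",
"print(repr(third))\n",
"print(repr(at_end))"
]
},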
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can use a loop to iterate through each line: \n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0 : This is line 1 \n",
"\n",
"Iteration 1 : This is line 2\n",
"\n",
"Iteration 2 : This is line 3\n"
]
}
],
"source": [
"# Iterate through the lines\n",
"\n",
"with open(example1,\"r\") as file1:\n",
"    i = 0\n",
"    for line in file1:\n",
"        print(\"Iteration\", str(i), \": \", line)\n",
"        i = i + 1"
]
},
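{
"cell_type": "markdown",
"metadata": {},
"source": [
"The blank lines in the output above appear because each <code>line</code> already ends with <b>\\n</b> and <code>print</code> adds another newline. One possible variant, sketched on a hypothetical scratch copy of the file (the name <code>demo.txt</code> and the variable names are assumptions for this illustration), uses <code>enumerate</code> for the counter and strips the trailing newline before printing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical scratch copy with the same three lines as Example1.txt\n",
"demo_path = os.path.join(tempfile.mkdtemp(), \"demo.txt\")\n",
"with open(demo_path, \"w\") as demo_file:\n",
"    demo_file.write(\"This is line 1 \\nThis is line 2\\nThis is line 3\")\n",
"\n",
"numbered = []\n",
"with open(demo_path, \"r\") as demo_file:\n",
"    # enumerate() supplies the counter, so no manual i = i + 1 is needed\n",
"    for i, line in enumerate(demo_file):\n",
"        numbered.append(\"Iteration \" + str(i) + \": \" + line.rstrip(\"\\n\"))\n",
"\n",
"for entry in numbered:\n",
"    print(entry)"
]
},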
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the method <code>readlines()</code> to read every line of the text file into a list: "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Read all lines and save as a list\n",
"\n",
"with open(example1, \"r\") as file1:\n",
"    FileasList = file1.readlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Each element of the list corresponds to a line of text:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\n'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the first line\n",
"\n",
"FileasList[0]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 2\\n'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the second line\n",
"\n",
"FileasList[1]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 3'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the third line\n",
"\n",
"FileasList[2]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}