Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a href=\"http://cocl.us/topNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/yfe6h4az47ktg2mm9h05wby2n7e8kei3.png\" width = 750, align = \"center\"></a>\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
"\n",
"\n",
"\n",
"<h1 align=center><font size = 5>Reading Files in Python</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to read **.txt** files in Python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"\n",
"<li><a href=\"#ref1\">Reading Text Files</a></li>\n",
"\n",
"<br>\n",
"<p></p>\n",
"Estimated Time Needed: <strong>15 min</strong>\n",
"</div>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a id=\"ref1\"></a>\n",
"<h2 align=center>Reading Text Files</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read or write a file in Python is to use the built-in **open** function. The **open** function returns a **file object** with the methods and attributes you need to read, write, and manipulate the file. In this notebook, we cover only **.txt** files. The first argument is the file path, including the file name. An example is shown in __Figure 1__:\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/6wl3vw4ghflafrou0noj70t2n4hbalqr.png\" width = 500, align = \"center\"></a>\n",
" <h4 align=center> \n",
" Figure 1: Labeled Syntax of a file object. \n",
"\n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode argument is optional and the default value is **r**. In this notebook we only cover two modes: \n",
"\n",
"<li>**r** Read mode for reading files </li>\n",
"<li>**w** Write mode for writing files</li>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" For the next example, we will use the text file **Example1.txt**. The file is shown in Figure 2:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/ilzy3av6x1cd3gi61bq2nq0vxb0awhju.png\" width = 200, align = \"center\"></a>\n",
" <h4 align=center> \n",
" Figure 2: The text file \"Example1.txt\".\n",
"\n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" First, we download the file and save it as **/resources/data/Example1.txt**: "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-01-13 17:27:49-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 45 [text/plain]\n",
"Saving to: ‘/resources/data/Example1.txt’\n",
"\n",
"/resources/data/Exa 100%[===================>] 45 --.-KB/s in 0s \n",
"\n",
"2019-01-13 17:27:49 (6.69 MB/s) - ‘/resources/data/Example1.txt’ saved [45/45]\n",
"\n"
]
}
],
"source": [
"!wget -O /resources/data/Example1.txt https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" <a href=\"http://cocl.us/object_storage_corsera\"><img src = \"https://ibm.box.com/shared/static/6qbj1fin8ro0q61lrnmx2ncm84tzpo3c.png\" width = 750, align = \"center\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We open the file in read mode: "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"example1=\"/resources/data/Example1.txt\"\n",
"file1 = open(example1,\"r\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can view the attributes of the file object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name of the file:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'/resources/data/Example1.txt'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"file1.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The mode the file object is in:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'r'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"file1.mode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can read the file's contents and assign them to a variable:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\nThis is line 2\\nThis is line 3'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FileContent=file1.read()\n",
"FileContent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The “\\n” is the newline character; it marks the end of each line. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can print the contents; each “\\n” is now displayed as a line break: "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The contents are stored as a string (type **str**):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We must close the file object:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"file1.close()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<_io.TextIOWrapper name='/resources/data/Example1.txt' mode='r' encoding='UTF-8'>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"file1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <h3> A Better Way to Open a File </h3>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the **with** statement is better practice because it automatically closes the file, even if the code raises an exception. Python runs everything in the indented block and then closes the file object.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
" FileContent=file1.read()\n",
" print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file object is now closed. You can verify this by running the following cell: "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"file1.closed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The contents we read are still available in the variable **FileContent**:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"This is line 2\n",
"This is line 3\n"
]
}
],
"source": [
"print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax is a little confusing because the file object comes after the **as** keyword, and we never close the file explicitly. We therefore summarise the steps in a figure:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/ywul1ji1ld82xwz60ljxvbg6fs2vrunm.png\" width = 500, align = \"center\"></a>\n",
" <h4 align=center> \n",
" The syntax for opening a file using a 'with' statement.\n",
"\n",
" </h4> "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['This is line 1 \\n', 'This is line 2\\n', 'This is line 3']\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
" FileContent=file1.readlines()\n",
" print(FileContent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don’t have to read the entire file. For example, we can read the first 4 characters by passing 4 as an argument to the method **.read()**:\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
" print(file1.read(4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the method **.read(4)** is called, the first 4 characters are read. If we call the method again, the next 4 characters are read. The output of the following cell demonstrates the process for different arguments to the method **read()**:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This\n",
" is \n",
"line 1 \n",
"\n",
"This is line 2\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
" print(file1.read(4))\n",
" print(file1.read(4))\n",
" print(file1.read(7))\n",
" print(file1.read(15))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The process is illustrated in the figure below; each colour represents the part of the file read by one call to the method **read()**:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/s0xs6y4vcvabp2ll2pwspa6kd8qeoddj.png\" width = 500, align = \"center\"></a>\n",
" <h4 align=center> \n",
" Illustration of using the method **.read()** to read different numbers of characters \n",
"\n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Here is an example using different values: "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is line 1 \n",
"\n",
"This \n",
"is line 2\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
" print(file1.read(16))\n",
" print(file1.read(5))\n",
" print(file1.read(9))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also read one line of the file at a time using the method **readline()**: "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first line: This is line 1 \n",
"\n"
]
}
],
"source": [
"with open(example1,\"r\") as file1:\n",
"    print(\"first line: \" + file1.readline())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can use a loop to iterate through each line: \n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0 : This is line 1 \n",
"\n",
"Iteration 1 : This is line 2\n",
"\n",
"Iteration 2 : This is line 3\n"
]
}
],
"source": [
"# enumerate supplies the line counter, so no manual increment is needed\n",
"with open(example1,\"r\") as file1:\n",
"    for i, line in enumerate(file1):\n",
"        print(\"Iteration\", str(i), \":\", line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the method **readlines()** to save the lines of the text file to a list: "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with open(example1,\"r\") as file1:\n",
" FileasList=file1.readlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Each element of the list corresponds to a line of text:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 1 \\n'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FileasList[0]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 2\\n'"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FileasList[1]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'This is line 3'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FileasList[2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a href=\"http://cocl.us/bottemNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width = 750, align = \"center\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"### About the Author: \n",
" [Joseph Santarcangelo](https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <hr>\n",
"Copyright &copy; 2017 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).​"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}