Two major types of files that are stored on a computer
- Text Files
- Binary Files
Text files:
- Contain lines of characters.
- Each line ends with an EOL (End Of Line) character.
- In Python, a common EOL character is \n.
- A text file can be read by a program like Notepad.
Binary files:
- All files that are not text files are binary files.
- To process a binary file, the file format must be known by the application.
- Examples of binary files include: pdf files, png files, doc files, exe files.
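To make the difference concrete, here is a minimal sketch that writes a small text file and then reads it back both as text and as raw bytes. The file name sample.txt and the temporary directory are assumptions made for the demo only.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary directory for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')

    # newline='\n' keeps the EOL as a single \n on every platform.
    with open(path, 'w', newline='\n') as f_out:
        f_out.write('first line\nsecond line\n')

    # Text mode returns a str; binary mode returns the raw bytes.
    with open(path, 'r') as f_in:
        as_text = f_in.read()
    with open(path, 'rb') as f_in:
        as_bytes = f_in.read()

print(repr(as_text))
print(repr(as_bytes))
```

The same file yields a string in text mode and a bytes object in binary mode; the \n EOL characters are visible in both.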
To process a file (read from it or write to it), the file must first be opened. When opening a file, at least one parameter must be supplied: the name of the file. If the mode parameter is not supplied, the file will be opened in read-only mode.
Example:
open('assignment.txt') # Default mode is read mode.
file name: the actual name of the file on the disk, including its file extension.
mode: how you are going to process the file.
Here are the main modes:
- 'r' : to read a file (the default mode)
- 'w' : to write to a file; any existing contents are deleted when the file is opened
- 'x' : open for exclusive creation and writing. If the file already exists, the open() call will fail.
- 'a' : to append to a file; append adds data to the end of the file, keeping all of the original data
- '+' : added to one of the modes above to read and write the same file; it cannot be used on its own
- 'r+' : read and write an existing file without deleting its contents
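The behaviour of each mode can be checked with a short experiment. The sketch below is only an illustration; the file name modes_demo.txt and the temporary directory are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    with open(path, 'w') as f:      # 'w' creates the file (or empties it)
        f.write('one\n')
    with open(path, 'a') as f:      # 'a' keeps the old data and adds to the end
        f.write('two\n')
    with open(path, 'r') as f:      # 'r' reads; the file must already exist
        after_append = f.read()

    with open(path, 'r+') as f:     # 'r+' reads and writes without emptying the file
        f.write('ONE')              # overwrites the first three characters
        f.seek(0)                   # go back to the start before reading
        after_update = f.read()

    try:
        open(path, 'x')             # 'x' refuses to open a file that already exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(after_append))
print(repr(after_update))
print(exclusive_failed)
```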
CAUTION: binary files can be corrupted if they are not opened in binary mode. This is most visible in a Windows environment, where text mode translates line endings.
For binary files:
- 'b' : added to another mode to open the file in binary mode
- 'rb', 'wb', 'ab' and 'r+b' are the usual combinations for binary files.
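As an illustration, a binary file can be copied byte for byte by opening both files in binary mode. The helper copy_binary and the file names are hypothetical, chosen only for this sketch.

```python
import os
import tempfile

def copy_binary(src_path, dst_path):
    """Copy src_path to dst_path byte for byte (hypothetical helper)."""
    with open(src_path, 'rb') as f_in, open(dst_path, 'wb') as f_out:
        f_out.write(f_in.read())

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'source.bin')
    dst = os.path.join(tmp_dir, 'copy.bin')

    # Bytes that text mode could alter or fail to decode.
    payload = bytes([0, 13, 10, 255, 26])
    with open(src, 'wb') as f:
        f.write(payload)

    copy_binary(src, dst)
    with open(dst, 'rb') as f:
        copied = f.read()

print(copied == payload)
```

For large files, reading in fixed-size pieces instead of a single read() would keep memory use down; the single read() is used here only to keep the sketch short.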
For more detail, see the documentation for open() on python.org.
file_object = open(filename, mode)
file_object: the name that will be used in the program to process the file.
filename: a string that contains the name of the actual file on the disk
mode: a string such as 'r', 'w', 'a' or 'x', optionally followed by 'b' and/or '+'
Examples:
- For Reading :
f_input = open('myData.txt', 'r')
- For Writing:
f_output = open('newData.txt','w')
- For Reading and Writing:
f_output = open('newData.txt','r+')
If you write only the '+' sign, open() raises a ValueError. In Python 2 the message is:
ValueError: mode string must begin with one of 'r', 'w', 'a' or 'U', not '+'
Python 3 also raises a ValueError, with different wording.
- For Appending:
f_output = open('newData.txt','a')
When you open a file, it must also be closed. Closing a file after you are finished with it is the same for reading, writing or appending.
file_object.close()
file_object: the name that will be used in the program to process the file.
This will close the f_data file object:
f_data.close()
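If an error occurs between open() and close(), the close() call is skipped. One common way to guard against that is a try/finally block, sketched below. The file is created in a temporary directory, which is an assumption for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'myData.txt')

    f_data = open(path, 'w')
    try:
        f_data.write('some data\n')
    finally:
        # Runs whether or not the write raised an error.
        f_data.close()

    # The closed attribute reports whether the file object has been closed.
    closed_after = f_data.closed

print(closed_after)
```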
Better File Handling in Python with the with Statement
with is a multi-line statement. All statements that are indented after the with statement are part of the with statement. When opening a file using a with statement, the file is automatically closed when the block of code is complete.
Therefore, there is no need to close the file after processing is complete. No error is raised if close() is called on a file that is already closed.
- This will open the myData.txt file as f_in for reading:
with open('myData.txt', 'r') as f_in:
- This will open the newData.txt file as f_out for writing:
with open('newData.txt', 'w') as f_out:
- This will open the myData.txt file as f_in for reading and the newData.txt file as f_out for writing, all within the same block of code:
with open('myData.txt', 'r') as f_in, open('newData.txt', 'w') as f_out:
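A complete version of the two-file form might look like the sketch below. It copies each line in upper case and then confirms that both files were closed automatically. The sample data and the temporary directory are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, 'myData.txt')
    out_path = os.path.join(tmp_dir, 'newData.txt')

    # Create some input data for the demo.
    with open(in_path, 'w') as f_out:
        f_out.write('alpha\nbeta\n')

    # Both files are open only inside this indented block.
    with open(in_path, 'r') as f_in, open(out_path, 'w') as f_out:
        for line in f_in:
            f_out.write(line.upper())

    # No close() call was needed.
    both_closed = f_in.closed and f_out.closed

    with open(out_path, 'r') as f_check:
        result = f_check.read()

print(both_closed)
print(repr(result))
```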
File Object Methods of Reading a Text File:
- .read(): read the whole file at once into one long string
When the read() method is called, it returns data from the file. If read() is not given a parameter, the entire file is read and returned as one string. If the EOF (End Of File) has been reached, read() will return an empty string "" (or ''; either single or double quotes can be used).
Read an entire file into a single variable:
fileToRead = 'textFile.txt'
f_in = open(fileToRead,'r')
print ('Reading file ' + fileToRead)
data = f_in.read()
# all the data in the file is now the variable data
# you can use any valid variable name in place of data
# this is where the data would be processed
print(data)
f_in.close()
Read an entire file into a list of lines, removing the EOL characters at the same time:
fname = 'textFile.txt'
f = open(fname, 'r')
# all EOL characters are removed and each line is placed in a list
lines = f.read().splitlines()
print(lines)
f.close()
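read() also accepts an optional size, which limits how many characters are returned by one call. The sketch below reads a file a few characters at a time until read() returns an empty string. The helper read_in_chunks, the chunk size of 4 and the sample data are assumptions for illustration.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=4):
    """Return the file's text as a list of pieces of at most chunk_size characters."""
    chunks = []
    with open(path, 'r') as f_in:
        chunk = f_in.read(chunk_size)
        while chunk != '':            # '' means the EOF has been reached
            chunks.append(chunk)
            chunk = f_in.read(chunk_size)
    return chunks

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'textFile.txt')
    with open(path, 'w') as f_out:
        f_out.write('abcdefghij')
    pieces = read_in_chunks(path)

print(pieces)
```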
- .readline(): read the file one line at a time
When the .readline() method is called, it reads one line from a text file. The EOL (End Of Line) is represented by the \n character. Every line in a text file ends with an EOL character, except possibly the last line in the file. If the EOF (End Of File) has been reached, .readline() will return an empty string "".
This is a very traditional method of reading a file. The logic used here is called "priming the loop". The first line of the file is read before the while loop. The while loop checks to see if the EOF has been reached. Provided the EOF has not been reached, the data is processed and another line is read.
fname = 'textFile.txt'
f_input = open(fname, 'r')
# f_input is the file object created by the open() statement
# read the first line of the file
one_line_of_data = f_input.readline()
while '' != one_line_of_data:  # keep looping while the EOF has not been read
    print(one_line_of_data)
    one_line_of_data = f_input.readline()
print('done')
f_input.close()
This method uses an implied readline(). The for statement does the reading and quits when the EOF is reached.
fname = 'textFile.txt'
f_input = open(fname, 'r')
# implied reading line by line -- no readline required
# line is a variable and can be any valid variable name
# f_input is the file object created by the open() statement
for line in f_input:
    # process the data in line here
    # NOTICE that the data is printed double spaced.
    # One of the EOL is in the file
    # the other is created by the print statement
    # the last line is not double spaced if the last line in the file
    # does not have an EOL character
    print(line)
print('done')
f_input.close()
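One way to avoid the double spacing is to strip the EOL character from each line before using it. The sketch below assumes a helper named read_clean_lines and a small sample file; both are illustrative only.

```python
import os
import tempfile

def read_clean_lines(path):
    """Return the lines of a text file without their trailing EOL characters."""
    clean = []
    with open(path, 'r') as f_input:
        for line in f_input:
            # rstrip('\n') removes the EOL if present and leaves other text alone
            clean.append(line.rstrip('\n'))
    return clean

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'textFile.txt')
    with open(path, 'w') as f_out:
        f_out.write('first\nsecond\nthird')   # the last line has no EOL
    clean_lines = read_clean_lines(path)

for line in clean_lines:
    print(line)   # single spaced: print() supplies the only EOL
```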
- .readlines(): read all lines into a list, with each line as an element in the list.
When the .readlines() method is called, the entire file is read into a list with each line being one element of the list. If the EOF (End Of File) has been reached, .readlines() will return an empty list [].
fileToRead = 'textFile.txt'
f_in = open(fileToRead, 'r')
print('Reading file ' + fileToRead)
list_of_data = f_in.readlines()
# the data in the file can now be processed in the variable list_of_data
# the EOL \n is in the string
print(list_of_data)
print("done")
f_in.close()
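What the three reading methods return once the EOF has been reached can be checked directly. In this sketch the file contents and the temporary directory are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'textFile.txt')
    with open(path, 'w') as f_out:
        f_out.write('a\nb\n')

    with open(path, 'r') as f_in:
        all_lines = f_in.readlines()      # reads to the EOF, keeping each \n
        # The file position is now at the EOF, so nothing is left to read.
        at_eof = (f_in.read(), f_in.readline(), f_in.readlines())

print(all_lines)
print(at_eof)
```

read() and readline() give an empty string at the EOF, while readlines() gives an empty list.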
- .write(): when the .write() method is called, the contents of the string variable are written to the file. It expects a single string as its argument. If you provide a list of strings, it will raise a TypeError.
Write keyboard input to a file. Stop input when no data is entered.
def writef(fname='outfile.txt'):
    f = open(fname, 'w')
    line = input('Enter some data for a line: ')
    while line != '':
        f.write(line + '\n')
        line = input('Enter some data for a line: ')
    f.close()
write() accepts exactly one string. Joining everything into one string, as below, works but becomes clumsy for many lines:
textdoc.write(line1 + "\n" + line2 + ....)
For a collection of strings, use writelines() instead.
- writelines(): expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterable is expected to be a string. writelines() does not add EOL characters, so they must be included in the strings.
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)
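The difference between write() and writelines() can be seen side by side. The sketch below is illustrative; the file is created in a temporary directory, which is an assumption.

```python
import os
import tempfile

lines = ['line1', 'line2']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'filename.txt')

    with open(path, 'w') as f:
        try:
            f.write(lines)                 # a list is not a string
            write_error = None
        except TypeError as err:
            write_error = err

        f.writelines(lines)                # no EOL is added between items
        f.write('\n')
        f.write('\n'.join(lines) + '\n')   # one string with explicit EOLs

    with open(path, 'r') as f:
        content = f.read()

print(type(write_error).__name__)
print(repr(content))
```

The first row of the file shows both items run together, because writelines() wrote them exactly as given.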
open: prepare a file for processing
close: close a file after processing
with: a code block to help with processing a file; closing the file object is implicit.