@horstjens
Created November 24, 2012 21:39
Searching for strings in files with Python 3.2
# To test this, create a file called poem.txt in the same folder as this Python file.
# Make sure poem.txt contains the word "more" many times, for example:
poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""
# I assume you are using Python 3.2 here.
# (The poem variable above only shows sample content; the script reads poem.txt.)
# Open the file as file object f in (r)ead mode (the default) and get a list
# of all the lines. The with statement closes the file afterwards, so only
# the lines list is kept.
with open("poem.txt", "r") as f:
    lines = f.readlines()
# define what you are looking for, the search string
mysearchstring = "more"  # use double or single quotes, but do not mix them
linenumber = 0
counter = 0  # total number of matches in the whole file
# iterate over all the lines
for line in lines:
    linenumber += 1  # shortcut for: linenumber = linenumber + 1
    startpos = 0
    endpos = 0
    for foundpos in range(line.count(mysearchstring)):
        counter += 1
        # find out at what position exactly
        startpos = line[endpos:].find(mysearchstring)  # see comments below
        endpos += startpos + len(mysearchstring)
        print("found '{}' in line:{} pos:{}:\n{}".format(
            mysearchstring, linenumber, endpos - len(mysearchstring), line.rstrip("\n")))
    print("end of line {}".format(linenumber))  # this comes after the end of the inner for loop
print("end of search, {} matches found".format(counter))  # this comes after the end of the outer for loop
# --- what I did here ---
# OK, this code could be more elegant.
# The outer for loop processes each line of the list lines,
# "iterating" over all the elements in that list. The variable
# linenumber counts the lines processed so far.
# At each new line, startpos and endpos are reset to zero (0).
# Now the interesting bit: we don't know in advance how many times the search
# string occurs in the current line. But we can ask Python, using the .count()
# method: line.count(mysearchstring) returns an integer.
# To loop that many times, I use the range function. range(3), for example,
# yields the three values 0, 1, 2. In Python 3, range returns a range object,
# not a list; to see its items in the interactive interpreter, type
# list(range(3)). Either way, it can be iterated over with a for loop.
# The colons inside the square brackets are slicing syntax.
# Say the line is "abcdef"; then line[2:5] returns "cde". Python
# numbers the characters 0 1 2 3 4 5, starting with 0, so the start value 2
# is the "c". The stop value 5 (the "f") is not included.
# By keeping track of endpos and increasing it after each match, I make
# find() search not the whole line but only the not-yet-searched
# remainder line[endpos:]. find() then returns a position relative to endpos,
# which is why startpos is added to endpos rather than assigned to it.
#
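The building blocks the comments above describe (count(), range(), slicing, and find() on the unsearched remainder) can be tried on their own. A minimal sketch reusing the "abcdef" example and the search string "more"; the helper name positions_by_slicing is hypothetical and not part of the original script, it just wraps the same slicing trick in a function:

```python
# count(): how many non-overlapping times a substring occurs
line = "gimme more, more, moreofit"
print(line.count("more"))      # 3

# range(n) yields 0, 1, ..., n-1; list() makes the values visible
print(list(range(3)))          # [0, 1, 2]

# slicing: the start index is included, the stop index is excluded
word = "abcdef"
print(word[2:5])               # cde


def positions_by_slicing(text, sub):
    """Hypothetical helper: the script's slicing trick as a function.

    Returns the start index of every non-overlapping occurrence of sub.
    """
    positions = []
    endpos = 0
    for _ in range(text.count(sub)):
        # find() on the remainder returns an index relative to endpos
        startpos = text[endpos:].find(sub)
        endpos += startpos + len(sub)
        positions.append(endpos - len(sub))  # convert back to an absolute index
    return positions


print(positions_by_slicing(line, "more"))   # [6, 12, 18]
```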
there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit
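The script expects poem.txt to exist already. If you would rather create it from Python than by hand, here is a minimal sketch that writes the same poem into the current folder (note that it overwrites any existing poem.txt):

```python
poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""

# "w" opens the file for writing and truncates it if it already exists
with open("poem.txt", "w") as f:
    f.write(poem + "\n")

# read it back to check; the search is case-sensitive, so "Moore" does not count
with open("poem.txt") as f:
    print(f.read().count("more"))   # 5
```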

yipyip commented Nov 25, 2012

poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""

lines = poem.split("\n")
print(lines)

item = "more"
length = len(item)
counter = 0
for i, line in enumerate(lines):
    k = 0
    for _ in range(line.count(item)):
        counter += 1
        j = line.find(item, k)
        k = j + length
        print("counter", counter, "line", i, "pos", j)
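Both versions call count() first so the loop knows how many matches to expect. Because str.find returns -1 once there is no further match, the count() call can also be dropped. A minimal sketch of that variant; the function name find_all is hypothetical:

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub in text."""
    if not sub:
        # an empty search string would match everywhere and never advance k
        raise ValueError("sub must not be empty")
    positions = []
    k = 0
    while True:
        j = text.find(sub, k)   # search from index k onward
        if j == -1:             # -1 means there is no further match
            break
        positions.append(j)
        k = j + len(sub)        # continue after this match (no overlaps)
    return positions


poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""

# line numbers start at 1 here, like in the original script
for i, line in enumerate(poem.split("\n"), start=1):
    for j in find_all(line, "more"):
        print("line", i, "pos", j)
```

Passing the start index to find() avoids building a new sliced string for every match, and the returned index is already absolute, so no offset arithmetic is needed.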


yipyip commented Nov 25, 2012

Oops, no indentation.
See https://gist.github.com/4144425
