@acrymble
Created March 2, 2015 18:29
Gazetteer keyword searching
#This Python programme will let you find pre-defined keywords in a series of entries.
#Written by Adam Crymble 2 March 2015, for Intro to Digital History class, University of Hertfordshire
#a.crymble@herts.ac.uk
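#Import the keywords, skipping blank lines so an empty string is never treated as a keyword
with open('keywords.txt', 'r') as f:
    allKeywords = [keyword for keyword in f.read().lower().split("\n") if keyword]
#Import the texts you want to search (one entry per line)
with open('texts.txt', 'r') as f:
    allTexts = f.read().lower().split("\n")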
#Our programme:
for entry in allTexts:
    matches = 0
    storedMatches = []
    #for each entry:
    allWords = entry.split(' ')
    for word in allWords:
        #remove punctuation that will interfere with matching
        word = word.replace(',', '')
        word = word.replace('.', '')
        word = word.replace(';', '')
        #if a keyword match is found, store the result (each keyword only once per entry).
        if word in allKeywords:
            if word in storedMatches:
                continue
            else:
                storedMatches.append(word)
            matches += 1
    #if there is a stored result, print it out; otherwise print a blank line
    #so that the output stays aligned with the input entries
    if matches == 0:
        print(' ')
    else:
        matchString = ''
        for match in storedMatches:
            matchString = matchString + match + "\t"
        #print() with a single argument behaves the same in Python 2 and Python 3
        print(matchString)
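#The matching loop above could also be packaged as a reusable function. The
#sketch below is one possible variant, not part of the original programme: the
#function name find_keywords and the sample data are hypothetical, it looks
#keywords up in a set rather than a list, and it trims any ASCII punctuation
#from the ends of each word instead of removing only commas, full stops and
#semicolons.

```python
import string


def find_keywords(entry, keywords):
    """Return the keywords found in entry, in order of first appearance."""
    found = []
    for word in entry.lower().split():
        # strip() trims punctuation from both ends of the word only,
        # so internal characters such as hyphens are left alone
        word = word.strip(string.punctuation)
        if word in keywords and word not in found:
            found.append(word)
    return found


# Hypothetical sample data standing in for keywords.txt and texts.txt
keywords = {"oxford", "cambridge", "london"}
entries = [
    "He moved from Oxford to London; then back to Oxford.",
    "No places here.",
]

for entry in entries:
    # One tab-separated line per entry; an entry with no matches prints a blank line
    print("\t".join(find_keywords(entry, keywords)))
```

#Returning a list instead of printing inside the loop makes it easier to write
#the results to a file or to count matches later.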