Created March 2, 2015 18:29
Gazetteer keyword searching
#This Python programme will let you find pre-defined keywords in a series of entries.
#Written by Adam Crymble 2 March 2015, for Intro to Digital History class, University of Hertfordshire
#a.crymble@herts.ac.uk

#Import the keywords (one per line), skipping blank lines so that an empty string never counts as a keyword
f = open('keywords.txt', 'r')
allKeywords = [keyword.strip() for keyword in f.read().lower().split("\n") if keyword.strip()]
f.close()

#Import the texts you want to search (one entry per line)
f = open('texts.txt', 'r')
allTexts = f.read().lower().split("\n")
f.close()

#Our programme:
for entry in allTexts:
    matches = 0
    storedMatches = []

    #for each entry, split the text into individual words:
    allWords = entry.split(' ')
    for words in allWords:

        #remove punctuation that will interfere with matching
        words = words.replace(',', '')
        words = words.replace('.', '')
        words = words.replace(';', '')

        #if a keyword match is found, store the result (each keyword only once per entry).
        if words in allKeywords:
            if words in storedMatches:
                continue
            else:
                storedMatches.append(words)
            matches += 1

    #if there is a stored result, print it out; otherwise print a blank so every entry still gets one output line
    if matches == 0:
        print(' ')
    else:
        matchString = ''
        #use a separate loop variable so the matches counter is not overwritten
        for match in storedMatches:
            matchString = matchString + match + "\t"

        print(matchString)
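The matching step above can also be written as a small function, which makes it easier to try on a few entries without creating keywords.txt and texts.txt first. This is only a sketch under stated assumptions, not the author's method: the name `find_keywords` and the sample data are hypothetical, and it strips any leading or trailing punctuation from each word using `string.punctuation`, rather than removing only commas, full stops and semicolons as the script does.

```python
import string


def find_keywords(entry, keywords):
    """Return the keywords found in entry, in order of first appearance.

    Hypothetical helper for illustration; not part of the original script.
    """
    # A set makes each membership check fast; blank keywords are ignored.
    keyword_set = set(k.lower() for k in keywords if k)
    found = []
    for word in entry.lower().split():
        # Assumption: punctuation only matters at the edges of a word.
        word = word.strip(string.punctuation)
        if word in keyword_set and word not in found:
            found.append(word)
    return found


if __name__ == "__main__":
    # Made-up sample data, standing in for keywords.txt and texts.txt.
    sample_keywords = ["oxford", "cambridge"]
    sample_entries = [
        "Born in Oxford; later moved to Cambridge.",
        "No place names here.",
        "Oxford, Oxford, and Oxford again.",
    ]
    for entry in sample_entries:
        # One tab-separated output line per entry, as in the script above.
        print("\t".join(find_keywords(entry, sample_keywords)))
```

Returning a list instead of printing directly means the same function could feed a spreadsheet export or a count of matches per entry, while the loop at the bottom reproduces the one-line-per-entry output of the script.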