Created May 1, 2017 00:21
Answer to question #43711418 - MBasith created by metajoker - https://repl.it/H8r1/3
Apples: 5 items in stock
Pears: 10 items in stock
Bananas: 15 items in stock
Watermelon: 20 items in stock
Pears: 25 items in stock
Oranges: 30 items in stock
Apples: 0 items in stock
Pears: 0 items in stock
Bananas: 0 items in stock
Watermelon: 0 items in stock
Pears: 1 items in stock
Oranges: 0 items in stock
def _get_key(string, delim):
    #Split key out of string
    key=string.split(delim)[0].strip()
    return key

def _clean_string(string, charToReplace):
    #Remove garbage from string
    for character in charToReplace:
        string=string.replace(character,'')
    #Strip leading and trailing whitespace
    string=string.strip()
    return string

def get_matching_key_values(file_1, file_2, delim, charToReplace):
    #Open the files to be compared
    with open(file_1, 'r') as a, open(file_2, 'r') as b:
        #Create an object to hold our matches
        matches=[]
        #Iterate over file 'a' and extract the keys, one-at-a-time
        for lineA in a:
            keyA=_get_key(lineA, delim)
            #Iterate over file 'b' and extract the keys, one-at-a-time
            for lineB in b:
                keyB=_get_key(lineB, delim)
                #Compare the keys. You might need upper, but I usually prefer
                #to compare all uppercase to all uppercase
                if keyA.upper()==keyB.upper():
                    cleanedOutput=(_clean_string(lineA, charToReplace), _clean_string(lineB, charToReplace))
                    matches.append(cleanedOutput)
            #Reset file 'b' pointer to start of file and try again
            b.seek(0)
        #Return our final list of matches
        #--NOTE: this method CAN return an empty 'matches' object!
        return matches

if __name__=="__main__":
    def format_output(output):
        return '\n'.join(map(str, output))

    #Test of fn against provided file_1 and file_2
    print("###############################################\nTest case #1")
    print(format_output(get_matching_key_values('./file_1.txt', './file_2.txt', ':', ['\n', '\r'])))
    print("###############################################")
    print('\n')
    #Test of fn against provided file_1 and created file_3
    print("###############################################\nTest case #2")
    print(format_output(get_matching_key_values('./file_1.txt', './file_3.txt', ':', ['\n', '\r'])))
    print("###############################################")
Original post (http://stackoverflow.com/a/43712831/6476525):
I would actually strongly advise against storing data in 1 GB text files rather than in a database or a standard data storage format. If your data were more complex, I'd suggest CSV or some other delimited format at minimum. If you can split the data and store it in much smaller chunks, a structured format such as XML, HTML, or JSON would make navigating and extracting the data easier. These formats are far more organized and already well suited to what you're trying to do (locating matching keys and returning their values).
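As a rough illustration of the database suggestion, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names (stock_a, stock_b), the column names, and the parse helper are made up for this example and are not part of the original post; the sample lines are taken from the data files above.

```python
import sqlite3

def parse(line, delim=":"):
    # Hypothetical helper: split "Pears: 10 items in stock" into
    # an upper-cased key and the remaining text.
    item, _, detail = line.partition(delim)
    return item.strip().upper(), detail.strip()

lines_a = ["Apples: 5 items in stock", "Pears: 10 items in stock", "Bananas: 15 items in stock"]
lines_b = ["Watermelon: 20 items in stock", "Pears: 25 items in stock", "Oranges: 30 items in stock"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_a (item TEXT, detail TEXT)")
conn.execute("CREATE TABLE stock_b (item TEXT, detail TEXT)")
conn.executemany("INSERT INTO stock_a VALUES (?, ?)", [parse(l) for l in lines_a])
conn.executemany("INSERT INTO stock_b VALUES (?, ?)", [parse(l) for l in lines_b])

# Let the database find the matching keys with a join
rows = conn.execute(
    "SELECT a.item, a.detail, b.detail "
    "FROM stock_a AS a JOIN stock_b AS b ON a.item = b.item "
    "ORDER BY a.item"
).fetchall()
conn.close()
print(rows)
```

For data on the scale of 1 GB you would presumably point sqlite3.connect at a file on disk and add an index on the item column, but that is an assumption about the use case rather than something the post specifies.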
That said, you could use the "readline" method described in section 7.2.1 of the Python 3 docs to read a file one line at a time instead of loading it all into memory: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-file.
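A minimal sketch of the readline approach might look like the following. The iter_keys function is a hypothetical name chosen for this example, and io.StringIO stands in for a real open file so the snippet runs without any files on disk.

```python
import io

def iter_keys(handle, delim=":"):
    """Yield the key of each non-blank line, reading one line at a time."""
    while True:
        line = handle.readline()
        if not line:
            # readline() returns an empty string only at end of file
            break
        if line.strip():
            yield line.split(delim)[0].strip()

# io.StringIO is used here in place of open('./file_1.txt', 'r')
sample = io.StringIO("Apples: 5 items in stock\nPears: 10 items in stock\n")
keys = list(iter_keys(sample))
print(keys)
```

Only one line is held in memory at a time, which is the point when the file is very large.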
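Or, you could just iterate over the file, as get_matching_key_values in the Gist code above does.
This is not really the best/most efficient way to go about this: file 'b' is re-read from the start once for each of the lines in file 'a'. Ideally, you would only iterate over each file once.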
Even using only base Python, I'm sure there is a better way to go about it.
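One possible single-pass variant, sketched here under some assumptions, builds a dictionary index from one file and then looks up each key from the other, so each input is iterated exactly once. The match_by_key function is a hypothetical name and not the author's code; unlike the Gist code it skips blank lines, and it keeps all lines of the second input in memory, so for very large files you would probably want to index the smaller one.

```python
import io
from collections import defaultdict

def match_by_key(lines_a, lines_b, delim=":"):
    """Return (line_a, line_b) pairs whose keys match, ignoring case."""
    # Pass 1: index every line of 'b' by its upper-cased key
    index = defaultdict(list)
    for line in lines_b:
        line = line.strip()
        if line:
            index[line.split(delim)[0].strip().upper()].append(line)
    # Pass 2: look up each key of 'a' in the index
    matches = []
    for line in lines_a:
        line = line.strip()
        if line:
            key = line.split(delim)[0].strip().upper()
            matches.extend((line, other) for other in index.get(key, ()))
    return matches

# io.StringIO objects stand in for the two open files
file_a = io.StringIO("Apples: 5 items in stock\nPears: 10 items in stock\nBananas: 15 items in stock\n")
file_b = io.StringIO("Watermelon: 20 items in stock\nPears: 25 items in stock\nOranges: 30 items in stock\n")
result = match_by_key(file_a, file_b)
print(result)
```

The nested-loop version does work proportional to the product of the two line counts, while this sketch does work roughly proportional to their sum, at the cost of the memory used by the index.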
For the Gist: https://gist.github.com/MetaJoker/a63f8596d1084b0868e1bdb5bdfb5f16
I think the Gist also has a link to the repl.it I used to write and test the code if you want a copy to play with in your browser.