Skip to content

Instantly share code, notes, and snippets.

@kurianbenoy
Created April 30, 2020 05:25
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kurianbenoy/c0f9053f9b5a7f41f9faf6b84aeb7a35 to your computer and use it in GitHub Desktop.
Save kurianbenoy/c0f9053f9b5a7f41f9faf6b84aeb7a35 to your computer and use it in GitHub Desktop.
import random
def find_pairs(M1, M2, val):
s =list()
for i in range(0,1000000):
s.append(M1[i])
diff = {1000000 -n:n for n in M1}
pairs = []
# diff = []
# for i in range(0,1000000):
# diff.append(1000000 - M1[i])
# print(diff)
# print(M2[i])
# pairs = []
# for i in range(0,1000000):
# if(diff[i]==M2[i]):
# pairs.append((diff[i], M1[i]))
for n in M2:
if n in diff:
pairs.append((diff[n], n))
# print(pairs)
return pairs
def generate_files(filename,size):
random.seed()
with open(filename, 'w') as f:
for _ in range(size):
f.write(f"{random.randint(10,1000000)}\n")
def find_pairs_files(fname1, fname2):
M1 = []
M2 = []
with open(fname1) as f:
for line in f:
line = line.strip()
M1.append(int(line))
with open(fname2) as f:
for line in f:
line = line.strip()
M2.append(int(line))
pairs = find_pairs(M1,M2,1000000)
print(f'Found {len(pairs)} pairs')
print(len(M1), len(M2))
if __name__ == "__main__":
find_pairs_files('random1.txt','random2.txt')
@pythonhacker
Copy link

pythonhacker commented Apr 30, 2020

Much better. Still a few warts

find_pairs

  1. The list "s" is not used and that entire code is a time-waster. It should be removed.
  2. Remove all dead code. Dead code takes up space, contributes nothing and finds it difficult to read rest of the code.
  3. Again why have val as an argument if you are using 1000000 as a literal in the code ? Use either not both. I suggest changing the hard-coded 1000000 to val.
  4. Argument naming - Is val a good name ? Isn't total better for example ?
  5. Argument naming - In Python we rarely ever use capitalized variable names. It should be ideally small-case. Use snake_case and let the variables have some context in ther naming. So in this case, num_1, num_2 or nums_1, nums_2 are better than say even m1 or m2.

Refer here for different casing approaches: https://medium.com/better-programming/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841

find_pairs_files

  1. Same feedback on M1 and M2 apply here as well.
  2. Doing int(...) on a string drops its new-line as well. So no need to do strip() - you save some processing time and your code runs faster. Try it.

Overall

  1. Too much extra white-space between functions. Trim it to two or three newlines max.
  2. Add doc-strings to functions.

Right now your code rating is still about 6.5/10 for me - fixing the variable names and removing dead code would take it up to 8/10. Dropping extra white space would make it 9/10 and adding doc-strings and fixing the logic to increase speed will make it 9.5/10 .

For reference here is my solution - https://gist.github.com/pythonhacker/1cf74fb0f1ea3f6f99bc94566651958e

Timing

Here is your code timing (keeping the redundant "s" list) (For timing copy the function timer_test from my code and modify it accordingly.

$ python soln3.py 
Time per pass => 1.340240458900007

After commenting out the "s" code,

$ python soln3.py 
Time per pass => 1.2040603387000375

Hope the feedback makes you a better programmer. All the best.

@kurianbenoy
Copy link
Author

kurianbenoy commented May 1, 2020

$ python3 hashing.py
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
Found 632844 pairs
TIme per pass => 1.3160592321000877

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment