@joernhees
Created July 21, 2015 15:16
rdflib hashing test
from multiprocessing import Process, Manager, Lock
import random


def f(d, l):
    # import inside the worker so each process imports rdflib on its own
    from rdflib import URIRef
    # insert the keys 'foo0' .. 'foo999' into the shared dict in random order
    for i in random.sample(range(1000), 1000):
        k = URIRef('foo' + str(i))
        # the lock protects the non-atomic read-modify-write on the proxy dict
        with l:
            if k not in d:
                d[k] = 0
            d[k] += 1


if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = Lock()
    p1 = Process(target=f, args=(d, l))
    p2 = Process(target=f, args=(d, l))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(d)
    # every key should appear exactly once, counted once by each process
    assert len(d) == 1000
    for v in d.values():
        assert v == 2
@joernhees

This was an attempt to provoke a bug with unstable hashing of URIRef as in RDFLib/rdflib#500 by using URIRefs as keys of a dict from multiple processes.

The idea was that, since different processes invoking hash(URIRef('foo')) can return different hashes, sharing a dict between them could leave items in different buckets of the dict. It seems, though, that manager.dict() doesn't get tricked by this, presumably because keys are forwarded to the single manager process that holds the actual dict, so only that process's hashing matters...
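To observe the underlying instability directly, a minimal sketch like the one below (not part of the original gist; report_hash is a hypothetical helper) prints hash(URIRef('foo')) as seen by two freshly started worker processes. Whether the two values actually differ depends on the rdflib version, the process start method, and the interpreter's hash randomization settings.

from multiprocessing import Process, Queue

def report_hash(q):
    # import inside the worker so rdflib is imported by the child itself
    from rdflib import URIRef
    q.put(hash(URIRef('foo')))

if __name__ == '__main__':
    q = Queue()
    workers = [Process(target=report_hash, args=(q,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # two hash values for the same URIRef, one per process
    print([q.get() for _ in workers])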
