
@hinnerk
Created March 7, 2012 12:20
Timestamp collisions demo for this blog post: http://randnotizen.de/post/18898864701/will-openstack-swift-lose-data
import time


def test_time():
    collisions = 0            # number of collisions
    dist = {}                 # distinct values: time difference -> occurrences
    microsecond = 0.000001
    count = 0
    while count < 1000000:
        count += 1
        t1 = time.time()
        time.sleep(microsecond)
        t2 = time.time()
        # Five decimal places, i.e. a resolution of 10 microseconds.
        s1 = "%016.05f" % t1
        s2 = "%016.05f" % t2
        if s1 == s2:
            # Both readings format to the same timestamp string.
            collisions += 1
        else:
            diff = t2 - t1
            dist[diff] = dist.get(diff, 0) + 1
    percent = collisions / (count / 100.0)
    # Parenthesized single-argument print works on Python 2 and Python 3.
    print("Collisions:\t%s (%s%%)" % (collisions, percent))
    if dist:  # empty if every pair collided
        print("Min value:\t%s" % min(dist))
        print("Max value:\t%s" % max(dist))
    print("Dist values:\t%s" % len(dist))


if __name__ == '__main__':
    test_time()

hinnerk commented Mar 7, 2012

Sorry that there was no way to comment; I've just now enabled Disqus for the blog.

Regarding Swift: It's quite possible that I've drawn the wrong conclusion from my cursory glance at the code. Let me elaborate with an example: Assume that two (or more) different entries somehow get stored under the same id and timestamp (which, due to screwed-up clocks, may happen even without any concurrency). Does Swift preserve all entries?
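
To make the scenario concrete: the gist formats timestamps with "%016.05f", which keeps five decimal places, so any two clock readings that round to the same 10-microsecond value produce the same string. A minimal sketch with hand-picked values (the helper name `format_timestamp` is made up for illustration and is not Swift's API):

```python
def format_timestamp(t):
    # Same format string as the gist: zero-padded, five decimal places.
    return "%016.05f" % t


# Hand-picked example readings (assumed values, not measured ones).
t1 = 1331122800.123451
t2 = 1331122800.123454   # 3 microseconds after t1
t3 = 1331122800.123470   # 19 microseconds after t1

print(format_timestamp(t1))  # 1331122800.12345
print(format_timestamp(t2))  # 1331122800.12345 (same string as t1)
print(format_timestamp(t3))  # 1331122800.12347
```

Here t1 and t2 are different readings but yield identical timestamp strings, which is the kind of collision the script above counts.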


gholt commented Mar 7, 2012

Cool on the Disqus thing; we should probably move over to that. :)

What exactly happens depends somewhat upon the number of replicas Swift is configured for. If we assume the standard 3 replicas, you could theoretically end up with 3 different "answers" to the question: What is the data in this object? But, since all three were written at the exact same collision time, there's really no way to know which is most correct anyway.
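
As a toy illustration of that situation (a made-up in-memory model, not Swift's code or on-disk layout), three replicas can each hold different data under an identical timestamp string, and a newest-timestamp-wins rule alone cannot choose among them:

```python
# Toy model: each replica stores one (timestamp_string, data) pair.
# The values are invented for illustration.
replicas = [
    ("1331122800.12345", b"version A"),
    ("1331122800.12345", b"version B"),
    ("1331122800.12345", b"version C"),
]

# Newest timestamp across all replicas.
newest = max(ts for ts, _ in replicas)

# Every distinct payload that carries the newest timestamp.
candidates = sorted(set(data for ts, data in replicas if ts == newest))

print(len(candidates))  # 3: the timestamp cannot tell the versions apart
```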

Many systems use a content hash to resolve these conflicts at sync time, but early in Swift's development it was decided that highly concurrent writes to a single object weren't a use case that warranted such extra resolution steps.
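
For comparison, a content-hash tiebreak of the kind mentioned above might look like the sketch below. This is explicitly not what Swift does; `pick_winner`, the tuple layout, and the choice of SHA-256 are assumptions made only for illustration.

```python
import hashlib


def pick_winner(versions):
    """Pick one (timestamp_string, data_bytes) pair deterministically.

    The newest timestamp wins; among equal timestamps, the version whose
    SHA-256 hex digest sorts highest wins. Timestamps are assumed to be
    fixed-width, zero-padded strings, so string order matches time order.
    """
    return max(versions,
               key=lambda v: (v[0], hashlib.sha256(v[1]).hexdigest()))


versions = [
    ("1331122800.12345", b"version A"),
    ("1331122800.12345", b"version B"),
    ("1331122800.12345", b"version C"),
]

# Every node that sees the same set of versions picks the same winner,
# regardless of the order in which the versions arrive.
assert pick_winner(versions) == pick_winner(list(reversed(versions)))
```

The winner is arbitrary but consistent, which is all a tiebreak of this kind provides; it still says nothing about which write was "most correct".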

Swift would keep those multiple versions until the hardware holding one of them failed, at which point that replica would be restored from one of the other versions, and so on until there was just one version again.
