@timehaven
Created July 19, 2017 15:29
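import os


def file_path_from_db_id(db_id, pattern="blah_%d.png", top="/path/to/imgs"):
    """Return file path /top/yyy/xx/blah_zzzxxyyy.png for db_id zzzxxyyy.

    The idea is to spread files across 1,000 top-level dirs (000-999),
    chosen by the last three digits of the id, and then 100 second-level
    dirs (00-99), chosen by the two digits before those. The following
    database ids result in the associated file paths:

        1234567  /path/to/imgs/567/34/blah_1234567.png
        432      /path/to/imgs/432/00/blah_432.png
        29847    /path/to/imgs/847/29/blah_29847.png
        1432     /path/to/imgs/432/01/blah_1432.png

    Notice that changing pattern to pattern="blah_%09d.png" and
    top="" would result in:

        1234567  567/34/blah_001234567.png
        432      432/00/blah_000000432.png
        29847    847/29/blah_000029847.png
        1432     432/01/blah_000001432.png

    That gives 1,000 * 100 = 100,000 leaf directories. With sequential
    ids, 100 million images works out to about 1,000 files per leaf
    directory, which is still a decent spread. If you have more than
    100 million images, the function is easily modified to add another
    directory level or use more digits per level. Ids longer than nine
    digits need no change, since only the last five digits pick the
    directories.
    """
    # Zero-pad so that short ids still have five digits to slice,
    # e.g. 432 -> "000000432".
    s = '%09d' % db_id
    # Last three digits pick the top-level dir; the two before them
    # pick the second-level dir.
    return os.path.join(top, s[-3:], s[-5:-3], pattern % db_id)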
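The directory choice in `file_path_from_db_id` depends only on the last five digits of the id, so for non-negative ids the string slicing should be equivalent to `db_id % 1000` and `(db_id // 1000) % 100`. Below is a minimal sketch that checks this against the docstring examples and counts files per leaf directory for a contiguous block of ids. The helper names are hypothetical and not part of the original gist.

```python
from collections import Counter


def leaf_dirs_by_arithmetic(db_id):
    """Hypothetical helper: (top, second) dir names via integer arithmetic.

    Assumes a non-negative db_id.
    """
    return "%03d" % (db_id % 1000), "%02d" % ((db_id // 1000) % 100)


def leaf_dirs_by_slicing(db_id):
    # Same slicing as file_path_from_db_id, without building the full path.
    s = "%09d" % db_id
    return s[-3:], s[-5:-3]


# The two forms agree on the ids used in the docstring examples.
for db_id in (1234567, 432, 29847, 1432):
    assert leaf_dirs_by_arithmetic(db_id) == leaf_dirs_by_slicing(db_id)

# Spread check: ids 1..200,000 cover every residue mod 100,000 exactly
# twice, so each of the 1,000 * 100 leaf directories gets 2 files.
counts = Counter(leaf_dirs_by_arithmetic(i) for i in range(1, 200_001))
print(len(counts), min(counts.values()), max(counts.values()))
# prints: 100000 2 2
```

Because the leaf directory is a function of `db_id mod 100,000`, sequential ids fill the leaves evenly; ids that are not sequential (for example, all multiples of 1,000) would crowd into fewer top-level directories.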
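The docstring notes that the function is easily modified for more than 100 million images. One possible modification, sketched here as a hypothetical `file_path_from_db_id_levels` (not from the original gist), takes a tuple of digit widths, one per directory level, read from the right end of the id. With the default `widths=(3, 2)` it should reproduce the original layout for non-negative ids.

```python
import os


def file_path_from_db_id_levels(db_id, widths=(3, 2), pattern="blah_%d.png",
                                top="/path/to/imgs"):
    """Hypothetical generalization of file_path_from_db_id.

    Each entry in widths is the number of digits used for one directory
    level, taken from the right of the id: widths=(3, 2) matches the
    original layout, while widths=(3, 3) gives 1,000 * 1,000 leaf dirs.
    Assumes a non-negative db_id.
    """
    # Zero-pad so every level has enough digits to slice.
    s = str(db_id).zfill(sum(widths))
    parts = []
    end = len(s)
    for w in widths:
        # Peel off the next w digits, moving right to left.
        parts.append(s[end - w:end])
        end -= w
    return os.path.join(top, *parts, pattern % db_id)


print(file_path_from_db_id_levels(1234567))
print(file_path_from_db_id_levels(1234567, widths=(3, 3)))
```

On a POSIX system the two calls should give `/path/to/imgs/567/34/blah_1234567.png` and `/path/to/imgs/567/234/blah_1234567.png`. Changing `widths` changes where every existing file lives, so the layout is best fixed before the directory tree is populated.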