Skip to content

Instantly share code, notes, and snippets.

@julian-klode
Last active September 24, 2016 17:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save julian-klode/b97ac3e9f22f0b2805ece04b254ddc3f to your computer and use it in GitHub Desktop.
Save julian-klode/b97ac3e9f22f0b2805ece04b254ddc3f to your computer and use it in GitHub Desktop.
Generator for an order-preserving minimal perfect hash function
Package
Version
Architecture
Multi-Arch
Section
Source
Size
Installed-Size
Priority
Depends
Pre-Depends
Conflicts
Breaks
Recommends
Suggests
Replaces
Enhances
Optional
Description
Description-md5
Essential
Important
Status
Provides
SHA256
SHA512
MD5sum
Maintainer
Filename
Tag
Homepage
SHA1
Pin-Priority
Pin
Codename
import collections, sys
class Trie(object):
count = 0
def __init__(self):
self.key = None
self.value = None
self.children = dict()
def insert(self, key, k):
if not key:
self.key = key
Trie.count += 1
self.value = Trie.count
return
if key[0] not in self.children:
self.children[key[0]] = Trie()
self.children[key[0]].insert(key[1:], k)
def print_table(self, index=0):
print(index * " ", "switch(%d < length ? string[%d] : 0) {" % (index, index))
for key, sub in sorted(self.children.items()):
print(index * " ", """case '%c':""" % key.lower())
if key.upper() != key.lower():
print(index * " ", """case '%c':""" % key.upper())
sub.print_table(index = index + 1)
if self.value is not None:
assert self.value != 0
print(index * " ", 'case 0: return %d;' % self.value)
print(index * " ", 'default: return 0;')
print(index * " ", "}")
def print_words(self, sofar=None):
if sofar is None:
sofar = []
if self.key is not None:
key = "".join(sofar)
key = key.replace("-", "_")
print(key, "=", self.value, ",")
for key, sub in sorted(self.children.items()):
sub.print_words(sofar + [key])
words = Trie()
for line in open(sys.argv[1]):
words.insert(line.strip(), line.strip())
print("#include <stddef.h>")
print("static unsigned int PerfectHashMax = ", Trie.count, ";")
print("static unsigned int PerfectHash(const char *string, size_t length)")
print("{")
words.print_table()
print("}")
print("enum class PerfectKey {")
words.print_words()
print("};")
@julian-klode
Copy link
Author

julian-klode commented Sep 24, 2016

Performance evaluation of an APT Packages file recognizer, times in nanoseconds, averaged over 100,000,000 runs:

amd64:

word perfect gperf djbhash alphaha
Package 9 11 10 9
PACKAGE 9 11 10 9
NotExisting 2 5 16 9
Installed-Size 14 18 20 9

arm64:

word perfect gperf djbhash alphaha
Package 14 20 15 13
PACKAGE 12 20 14 13
NotExisting 4 8 17 12
Installed-Size 22 29 21 12

armhf:

word perfect gperf djbhash alphaha
Package 14 23 14 11
PACKAGE 12 23 14 11
NotExisting 4 7 18 13
Installed-Size 22 35 21 11

@julian-klode
Copy link
Author

julian-klode commented Sep 24, 2016

Complete set result, respecting the case (perfect & djb hash only)

word perfect gperf djbhash alphaha
Package 7 7 10 9
Version 6 7 10 9
Architecture 9 7 17 9
Multi-Arch 8 7 14 9
Section 6 7 10 9
Source 6 7 9 8
Size 5 7 6 6
Installed-Size 11 7 19 9
Priority 7 7 11 10
Depends 6 7 10 9
Pre-Depends 9 7 16 9
Conflicts 7 7 13 9
Breaks 5 7 9 8
Recommends 8 7 14 9
Suggests 7 7 12 10
Replaces 7 7 12 10
Enhances 6 7 12 10
Optional 6 7 12 10
Description 9 7 16 9
Description-md5 11 7 21 9
Essential 7 7 13 9
Important 8 7 13 10
Status 6 8 9 8
Provides 7 8 12 11
SHA256 6 7 9 8
SHA512 7 7 9 8
MD5sum 6 7 9 9
Maintainer 8 7 14 9
Filename 6 7 12 10
Tag 4 6 5 5
Homepage 6 7 12 10
SHA1 5 7 6 6
Pin-Priority 9 7 17 9
Pin 5 6 5 5
Codename 7 7 12 10

@julian-klode
Copy link
Author

Complete set results ignoring case

word perfect gperf djbhash alphaha
Package 10 11 10 9
Version 8 11 10 9
Architecture 13 16 17 9
Multi-Arch 10 14 14 9
Section 8 11 10 9
Source 7 11 9 8
Size 5 9 6 6
Installed-Size 14 18 20 9
Priority 9 12 12 10
Depends 8 11 10 9
Pre-Depends 12 15 16 9
Conflicts 10 13 13 9
Breaks 7 11 9 8
Recommends 11 14 14 9
Suggests 9 12 11 10
Replaces 9 12 12 10
Enhances 9 12 11 10
Optional 9 12 11 10
Description 12 15 16 9
Description-md5 17 19 21 9
Essential 10 13 13 9
Important 10 13 13 9
Status 7 11 9 8
Provides 9 12 12 10
SHA256 7 11 9 8
SHA512 7 11 9 8
MD5sum 8 11 9 8
Maintainer 11 14 14 9
Filename 9 12 11 10
Tag 4 7 5 6
Homepage 9 12 11 10
SHA1 6 9 6 6
Pin-Priority 13 16 17 9
Pin 6 7 5 5
Codename 9 12 12 10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment