
@gakhov
Created July 30, 2019 10:26
Example: How to use HyperLogLog from the pdsa Python library
```python
import json

from pdsa.cardinality.hyperloglog import HyperLogLog

# precision=10 gives 2^{10} = 1024 counters
hll = HyperLogLog(precision=10)

# visitors.txt holds one JSON object per line, each with an "ip" field
with open('visitors.txt') as f:
    for line in f:
        ip = json.loads(line)['ip']
        hll.add(ip)

# Approximate number of distinct IP addresses seen
num_of_unique_visitors = hll.count()
print('Unique visitors', num_of_unique_visitors)

size_in_bytes = hll.size()
print('Size in bytes', size_in_bytes)
```
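To show roughly what `precision=10` and `count()` do, here is a minimal from-scratch HyperLogLog in pure Python. It is an illustrative sketch, not pdsa's implementation: the class name `SimpleHyperLogLog`, the SHA-1-based 64-bit hash, and the allowed precision range (7 to 16) are assumptions made here. The top p bits of each item's hash select one of m = 2^p registers, and that register keeps the largest "rank" (position of the first 1-bit) seen in the remaining bits. `count()` combines the registers with a bias-corrected harmonic mean and falls back to linear counting for small cardinalities. With p = 10 (m = 1024), the standard error of HyperLogLog is about 1.04/√1024 ≈ 3.25%, so the result is an estimate, not an exact count.

```python
import hashlib
import math


class SimpleHyperLogLog:
    """Illustrative HyperLogLog with m = 2**precision registers (hypothetical class)."""

    def __init__(self, precision: int = 10) -> None:
        # Restricted to 7..16 so the single alpha formula below (valid for m >= 128) applies.
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.p = precision
        self.m = 1 << precision
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m for m >= 128 (Flajolet et al.).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: first 8 bytes of SHA-1, read as an unsigned 64-bit integer.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                 # top p bits select a register
        w = x & ((1 << width) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in w; width + 1 if w == 0.
        rank = width - w.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: linear counting while many registers are still empty.
        # (No large-range correction: a 64-bit hash makes it unnecessary in practice.)
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


if __name__ == "__main__":
    # Hypothetical data: 20,000 distinct IP-like strings, each seen twice.
    ips = [f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}" for i in range(20_000)]
    hll = SimpleHyperLogLog(precision=10)
    for ip in ips + ips:
        hll.add(ip)
    print("exact:", len(set(ips)), "estimate:", hll.count())
```

Because each register only ever keeps a maximum, adding the same IP again leaves the sketch unchanged, which is why repeat visitors are not double-counted; memory stays fixed at m registers no matter how many lines `visitors.txt` contains.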
gakhov (author) commented on Jul 30, 2019:

pdsa is a Python library that can be found at https://github.com/gakhov/pdsa
