@jmendeth jmendeth/parser.py
Last active Apr 10, 2019

Parsing Twitter IDs
"""
Simple parser for Twitter-generated IDs.
ID model taken from Snowflake, which is no longer used:
https://github.com/twitter-archive/snowflake/blob/b3f6a3c6ca8e1b6847baa6ff42bf72201e2c2231/src/main/scala/com/twitter/service/snowflake/IdWorker.scala
Twitter IDs are 64-bit, and have the following structure:
- Bits 63 - 22: timestamp (in ms)
- Bits 21 - 17: datacenter ID
- Bits 16 - 12: worker ID
- Bits 11 - 0: sequence number (reset to 0 for every timestamp)
Commonly observed datacenter IDs (unconfirmed guesses):
- 11: possibly US / Europe
- 10: possibly Asia
"""
from datetime import datetime
from urllib.parse import urlparse
import re

# Twitter epoch: the Unix time, in milliseconds, that corresponds to timestamp=0
epoch = 1288834974657
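

def parse(id):
    """Split a Twitter ID into its sequence, worker, datacenter, and timestamp fields."""
    # Return the low n bits of x together with x shifted right by n bits.
    extract = lambda x, n: (x & ((1 << n) - 1), x >> n)
    r = {}
    r["sequence"], id = extract(id, 12)
    r["worker_id"], id = extract(id, 5)
    r["datacenter_id"], id = extract(id, 5)
    # The remaining high bits are milliseconds since the Twitter epoch.
    r["timestamp"] = id + epoch
    return r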
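

def format(timestamp, datacenter_id, worker_id, sequence=0):
    """Build a Twitter ID from its fields; timestamp is Unix time in milliseconds."""
    timestamp = timestamp - epoch
    assert (1 << 42) > timestamp >= 0
    assert (1 << 5) > datacenter_id >= 0
    assert (1 << 5) > worker_id >= 0
    assert (1 << 12) > sequence >= 0
    # Pack the fields in the reverse order of parse().
    return (((((timestamp << 5) | datacenter_id) << 5) | worker_id) << 12) | sequence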


def from_url(x):
    """Extract the numeric tweet ID from a twitter.com status URL."""
    x = urlparse(x)
    m = re.match(r"/([^/]+)/status/(\d+)(/|$)", x.path)
    if not (m and x.netloc.lower() in ["twitter.com", "www.twitter.com"]):
        raise Exception("Not a tweet URL")
    return int(m.group(2))


# Convert a millisecond timestamp (as returned by parse) to a local-time datetime.
get_datetime = lambda timestamp: datetime.fromtimestamp(timestamp / 1000)
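
The bit layout described in the docstring can be checked with a small round trip. The following is a minimal, self-contained sketch and not part of the original gist: the names `TWITTER_EPOCH_MS`, `decode`, and `encode` are hypothetical, the field values are made up rather than taken from a real tweet, and the timestamp is read as UTC here (an assumption; `get_datetime` above uses local time).

```python
from datetime import datetime, timezone

# Assumed constants, copied from the layout described in the docstring.
TWITTER_EPOCH_MS = 1288834974657  # same value as `epoch` in the gist
SEQUENCE_BITS, WORKER_BITS, DATACENTER_BITS = 12, 5, 5


def decode(tweet_id):
    """Split an ID into (timestamp_ms, datacenter_id, worker_id, sequence)."""
    sequence = tweet_id & ((1 << SEQUENCE_BITS) - 1)
    tweet_id >>= SEQUENCE_BITS
    worker_id = tweet_id & ((1 << WORKER_BITS) - 1)
    tweet_id >>= WORKER_BITS
    datacenter_id = tweet_id & ((1 << DATACENTER_BITS) - 1)
    tweet_id >>= DATACENTER_BITS
    # Whatever remains is milliseconds since the Twitter epoch.
    return tweet_id + TWITTER_EPOCH_MS, datacenter_id, worker_id, sequence


def encode(timestamp_ms, datacenter_id, worker_id, sequence=0):
    """Inverse of decode(): pack the four fields back into one integer."""
    value = timestamp_ms - TWITTER_EPOCH_MS
    value = (value << DATACENTER_BITS) | datacenter_id
    value = (value << WORKER_BITS) | worker_id
    return (value << SEQUENCE_BITS) | sequence


# Hypothetical ID built from made-up field values, not a real tweet.
example_id = encode(TWITTER_EPOCH_MS + 1000, datacenter_id=11, worker_id=3, sequence=7)
fields = decode(example_id)
assert encode(*fields) == example_id  # the round trip is lossless

# Interpret the timestamp explicitly as UTC rather than local time.
created_at = datetime.fromtimestamp(fields[0] / 1000, tz=timezone.utc)
print(fields, created_at.isoformat())
```

Passing `tz=timezone.utc` makes the result independent of the machine's time zone, which may be preferable when comparing IDs generated in different datacenters.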