I've been using Cassandra a lot lately. It's very common to define ID fields in Cassandra as a time-based UUID type called timeuuid. In Python, a timeuuid is merely a uuid.uuid1 value from the built-in uuid module.
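To see what that means in practice, here's a quick look using only the standard library (the variable name is just for illustration). A uuid1 value reports version 1 and carries its creation timestamp in the time attribute as a plain integer:

```python
import uuid

# Generate a time-based (version 1) UUID, the Python counterpart of a timeuuid.
example_id = uuid.uuid1()

print(example_id.version)  # the UUID version; uuid1 values are version 1
print(example_id.time)     # integer count of 100-ns intervals since 1582-10-15
```

That integer is not a Unix timestamp, which is why it needs a conversion step before it is useful as a date.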
Using the cassandra-driver, you might end up with a model like:
import uuid
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model
class VideoComment(Model):
    video_id = columns.Text(primary_key=True)
    comment_id = columns.TimeUUID(primary_key=True, default=uuid.uuid1)  # second primary key component is a clustering key
    comment = columns.Text()
Converting the comment_id field into something more useful was a bit tricky, which is why I made this guide and code. Be sure to grab the uuid_cfe.py file to use the following examples:
import uuid_cfe
import uuid
example_time = uuid.uuid1().time
example_datetime = uuid_cfe.uuid1_time_to_datetime(example_time)
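For the curious, here's a minimal sketch of what a conversion like uuid1_time_to_datetime has to do. This is not the contents of uuid_cfe.py; the function name uuid1_time_to_datetime_sketch and the choice to return a timezone-aware UTC datetime are assumptions for illustration. The time field of a version-1 UUID counts 100-nanosecond intervals since 15 October 1582, so the conversion boils down to an epoch shift:

```python
import datetime
import uuid

# Version-1 UUID timestamps count 100-ns intervals from this date (RFC 4122).
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)


def uuid1_time_to_datetime_sketch(time_int):
    """Hypothetical stand-in for uuid_cfe.uuid1_time_to_datetime (an assumption)."""
    # Integer-divide by 10 to turn 100-ns ticks into whole microseconds,
    # the finest resolution a datetime can hold.
    return GREGORIAN_EPOCH + datetime.timedelta(microseconds=time_int // 10)


example = uuid.uuid1()
print(uuid1_time_to_datetime_sketch(example.time))
```

The real helper may differ, for example by returning a naive datetime or local time, so check uuid_cfe.py before relying on timezone behavior.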
import uuid
import pandas as pd
import uuid_cfe
example_data = [{"x": y, "uuid1": uuid.uuid1()} for y in range(20)]
df = pd.DataFrame(example_data)
df['datetime'] = df['uuid1'].apply(lambda x: uuid_cfe.uuid1_time_to_datetime(x.time))
To me, this is the most practical use case: query a Cassandra database, load the results into pandas (for all kinds of analysis), and extract the built-in timestamps from the timeuuid/uuid fields.
import pandas as pd
import uuid_cfe
# VideoComment defined above
# extract data from cassandra db
items = list(VideoComment.objects().values_list("video_id", "comment_id"))
# initialize a dataframe
df = pd.DataFrame(items, columns=['video_id', 'comment_id'])
# create a pandas column from the cassandra timeuuid column
df['time_int'] = df['comment_id'].apply(lambda x: x.time)
# convert the time from timeuuid/uuid1 into a datetime object
df["date"] = df['time_int'].apply(lambda x: uuid_cfe.uuid1_time_to_datetime(x))
# sort by our new datetime column, newest first
# (sort_values with inplace=True returns None, so assign the sorted copy instead)
df = df.sort_values('date', ascending=False)