@gioper86
Last active December 18, 2023 11:29
Get a Pandas DataFrame from a Cassandra query
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import pandas as pd


def pandas_factory(colnames, rows):
    # Row factory: build a DataFrame from the column names and row tuples of a result page.
    return pd.DataFrame(rows, columns=colnames)


cluster = Cluster(
    contact_points=['127.0.0.1'],
    auth_provider=PlainTextAuthProvider(username='cassandra', password='cassandra')
)
session = cluster.connect()
session.set_keyspace('giodevks')
session.row_factory = pandas_factory
# Needed for large queries; otherwise the driver paginates. The driver default is 5000.
session.default_fetch_size = 10000000
rows = session.execute("""select * from my_table""")
# _current_rows is a private attribute; it holds the DataFrame built by pandas_factory.
df = rows._current_rows
print(df.head())
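If raising default_fetch_size that high is not practical, one alternative is to let the driver paginate and concatenate the per-page DataFrames. The sketch below assumes the same pandas_factory row factory as above; fetch_all_pages is a hypothetical helper name, not part of the driver. It relies on ResultSet.has_more_pages and ResultSet.fetch_next_page(), plus the private _current_rows attribute, so it may break across driver versions.

```python
import pandas as pd


def pandas_factory(colnames, rows):
    # Same row factory as in the gist: one DataFrame per result page.
    return pd.DataFrame(rows, columns=colnames)


def fetch_all_pages(result_set):
    # Hypothetical helper: collect the DataFrame of every page, then concatenate.
    # Assumes session.row_factory = pandas_factory was set before execute(),
    # so each page stored in the private _current_rows is already a DataFrame.
    frames = [result_set._current_rows]
    while result_set.has_more_pages:
        result_set.fetch_next_page()  # blocks until the next page arrives
        frames.append(result_set._current_rows)
    # ignore_index=True renumbers rows 0..n-1 across pages.
    return pd.concat(frames, ignore_index=True)
```

With a live session this would be called as df = fetch_all_pages(session.execute("select * from my_table")), leaving default_fetch_size at its default.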
@omkarpbankar

rows = session.execute("""select * from my_table""")
df = rows._current_rows

These two lines were time savers for me.