@reddikih
Created August 5, 2016 06:38
Batch insert Python script. It uses the Python driver provided by DataStax.
#!/usr/bin/env python
import hashlib
import sys

from cassandra import WriteTimeout
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

KEYSPACE = 'test'
TABLE = 'user'
BATCH_SIZE = 1000

# With no arguments, Cluster() connects to a node on localhost.
cluster = Cluster()
session = cluster.connect(KEYSPACE)


def getHash(val):
    # Encode to bytes first so the hash call also works on Python 3.
    return hashlib.sha256(str(val).encode('utf-8')).hexdigest()


def batchInsert(start, num):
    # One batch is sent per user_id in range(start, num); num is an exclusive end.
    insert_q = session.prepare(
        "insert into {}.{} (user_id, fname, lname, number) "
        "values (?, ?, ?, ?)".format(KEYSPACE, TABLE))
    for i in range(start, num):
        batch = BatchStatement()
        for j in range(1, BATCH_SIZE + 1):
            hashval = getHash(j)
            batch.add(insert_q, (i, hashval, hashval, i + j))
        try:
            session.execute(batch)
        except WriteTimeout:
            print("write timeout occurred.")


if __name__ == "__main__":
    start = int(sys.argv[1])
    num = int(sys.argv[2])
    batchInsert(start, num)
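
To make the loop structure concrete, the sketch below reproduces only the row generation from batchInsert, with no Cassandra connection involved. The helper name rows_for_batch and its batch_size parameter are hypothetical and not part of the gist; the sketch assumes the same hashing and the same (i, hashval, hashval, i + j) tuple layout as the script.

```python
import hashlib

BATCH_SIZE = 1000  # same value as in the script


def get_hash(val):
    """SHA-256 hex digest of the string form of val, mirroring getHash."""
    return hashlib.sha256(str(val).encode("utf-8")).hexdigest()


def rows_for_batch(i, batch_size=BATCH_SIZE):
    """Hypothetical helper: yield the tuples one batch would bind for user_id i."""
    for j in range(1, batch_size + 1):
        hashval = get_hash(j)
        yield (i, hashval, hashval, i + j)


if __name__ == "__main__":
    rows = list(rows_for_batch(5, batch_size=3))
    print(len(rows))                      # 3
    print([r[3] for r in rows])           # [6, 7, 8]
    print(all(r[0] == 5 for r in rows))   # True
```

Every row in a batch carries the same user_id, so how many distinct rows end up in the table depends on the table's primary key, which the gist does not show.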
@reddikih
Author

Setup

See: Installation

Install the build dependencies:

sudo yum install gcc python-devel
sudo yum install libev libev-devel

To install the Cassandra driver, use pip:

pip install cassandra-driver
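
After installing, one quick way to confirm the driver is visible to the interpreter is to look up the `cassandra` package, which is the import name the script uses. This is a minimal sketch using only the standard library; the function name driver_installed is hypothetical.

```python
import importlib.util


def driver_installed(module_name="cassandra"):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if driver_installed():
        print("cassandra-driver is importable")
    else:
        print("cassandra-driver not found; try: pip install cassandra-driver")
```

This check only confirms that the package can be located; it does not verify that a Cassandra node is reachable.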