Last active March 31, 2021 18:04
How to count number of tombstones per partition key in one or multiple sstables.
# Counts number of tombstones per partition key in one or multiple sstables.
# Usage: ./ /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db
# Sample output:
# "40e6a9839bf44bdaa624cc53e96733fe" 8
# "8e177ab222c14f868bcb6d2922b18d2b" 8
# "28aaa9db0dad4ae78cabe8bcc25d14a3" 9
# "8367c6c14d8e4ccdbd14e85d4a7d3b1f" 9
# "ecaf2f2409b24fa990a18e79f05b4b30" 12
# "3294ffc4dad44853b675dfdb34911576" 13
# (partition keys without any tombstone(s) are not printed).
# Get `jq` here:
# ltrim taken from
# The various stages below:
# 1. Choose which file(s) you'd like to check tombstones for here.
# 2. Convert to JSON.
# 3. Count tombstones per partition key.
# 4. Convert from JSON to CSV.
# 5. Sum duplicates of partition keys (the same key can appear in several sstables).
# 6. Sort by tombstone count, so the partition keys with the most tombstones come last.
ls "$@" \
| xargs --verbose -L 1 sstable2json \
| jq '.[] | {key: .key, length: [.columns[] | select(.[3]=="t")] | length }' \
| awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
| awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
| sort -n -k 2
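
As a rough illustration of stages 5 and 6, the sketch below sums the counts for duplicate keys and sorts the totals. The function name `sum_tombstones` is hypothetical, and the keys and counts are made up rather than taken from a real sstable. It assumes its input has the `"key", count` shape that stage 4 emits.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): sums the counts of
# duplicate keys, then sorts ascending by the summed count.
# Input: lines of the form `"key", count`, as produced by stage 4.
sum_tombstones() {
  awk -F, '{ totals[$1] += $2 } END { for (k in totals) print k, totals[k] }' \
    | sort -n -k 2
}

# Made-up sample input: key "aaaa" shows up twice, as it would if the same
# partition had tombstones in two different sstables.
printf '%s\n' '"aaaa", 3' '"bbbb", 9' '"aaaa", 4' | sum_tombstones
# Prints:
# "aaaa" 7
# "bbbb" 9
```

Because the totals are sorted in ascending order, the partitions with the most tombstones end up at the bottom of the output, which is where they are easiest to spot in a terminal.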

sedulam commented Jun 20, 2018

This looks great. Unfortunately, it doesn't work with Cassandra 3.x, because sstable2json does not exist in that version. I changed the code to use sstabledump instead, but I'm getting the following error:

tombstone_count ~/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/*Data*
/home/pedro/cassandra/tools/bin/sstabledump /home/pedro/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/mc-30-big-Data.db
jq: error (at <stdin>:54): Cannot iterate over null (null)

@sedulam any luck with the tombstone count on 3.x?

fholzer commented Feb 25, 2019

Find an updated version for Cassandra 3.0.x at