@JensRantil
Last active March 31, 2021 18:04
How to count the number of tombstones per partition key in one or more sstables.
#!/bin/bash
#
# Counts number of tombstones per partition key in one or multiple sstables.
#
# Usage: ./tombstone-count.sh /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db
#
# Sample output:
# "40e6a9839bf44bdaa624cc53e96733fe" 8
# "8e177ab222c14f868bcb6d2922b18d2b" 8
# "28aaa9db0dad4ae78cabe8bcc25d14a3" 9
# "8367c6c14d8e4ccdbd14e85d4a7d3b1f" 9
# "ecaf2f2409b24fa990a18e79f05b4b30" 12
# "3294ffc4dad44853b675dfdb34911576" 13
# (partition keys without any tombstone(s) are not printed).
# Get `jq` here: http://stedolan.github.io/jq/download/
# ltrim taken from http://stackoverflow.com/a/27158086/260805
# The various stages below:
# 1. Choose which file(s) you'd like to check tombstones for here.
# 2. Convert to JSON.
# 3. Count tombstones per partition key.
# 4. Convert from JSON to CSV.
# 5. Sum the counts of partition keys that appear more than once.
# 6. Sort by tombstone count, ascending (partition keys with the most tombstones come last).
ls "$@" \
| xargs --verbose -L 1 sstable2json \
| jq '.[] | {key: .key, length: [.columns[] | select(.[3]=="t")] | length }' \
| awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
| awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
| sort -n -k 2
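
To see what stages 4 to 6 do without a Cassandra node at hand, the sketch below feeds the two awk stages and the sort a hand-written sample in the shape jq pretty-prints. The keys, the counts and the `summarize` wrapper are made up for illustration; only the awk and sort commands are copied from the script above.

```shell
#!/bin/bash
# Hypothetical jq output: "aaa" appears twice (as it would if the same
# partition key showed up in two sstables) and "ccc" has no tombstones.
sample='{
  "key": "aaa",
  "length": 2
}
{
  "key": "bbb",
  "length": 5
}
{
  "key": "aaa",
  "length": 1
}
{
  "key": "ccc",
  "length": 0
}'

# Stages 4 to 6 of the script, wrapped in a function for the demo.
summarize() {
  awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
  | awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
  | sort -n -k 2
}

result=$(printf '%s\n' "$sample" | summarize)
printf '%s\n' "$result"
# prints:
# "aaa" 3
# "bbb" 5
```

The two "aaa" entries are summed, "ccc" is dropped because its count is zero, and the largest count ends up on the last line.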
@AlexisWilke

I can see that I have tombstones in various tables, for example, a CQL command with TRACE ON gives me a line such as:

Read 0 live and 313437 tombstone cells | 13:56:29,225 | 10.0.0.2 |         499269

Yet, your code returns nothing. Looking at the data output by sstable2json, I can see some 3rd parameter set to "d", but none are equal to "t". Could that be a version change?
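
A possible explanation for the "d" entries, assuming the sstable2json layout the script targets: a deleted cell may be marked with "d" and a range tombstone with "t", in which case the gist's filter counts only range tombstones. The sketch below is a hypothetical variant of stage 3 that counts both markers. The function name `count_markers` and the sample row are invented for illustration, and the meaning of the markers is an assumption worth checking against your Cassandra version.

```shell
#!/bin/bash
# Hypothetical variant of stage 3: count entries of "columns" whose fourth
# element is "d" or "t". Reads sstable2json-style JSON on stdin.
# Assumption: "d" marks a deleted cell and "t" a range tombstone.
count_markers() {
  jq '.[] | {key: .key, length: ([.columns[] | select(.[3] == "d" or .[3] == "t")] | length)}'
}

# Hand-written sample, not real sstable2json output:
# one live cell, one "d" entry and one "t" entry.
sample='[{"key": "aaa", "columns": [["c1", "v", 1], ["c2", "5e0f", 2, "d"], ["c3", "c4", 3, "t", 4]]}]'

printf '%s\n' "$sample" | count_markers
```

The output has the same shape as the original stage 3, so the awk and sort stages can follow it unchanged.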

@saipotturi

I've got the same issue as AlexisWilke. It returns nothing, but CQLSH shows tombstones.

@saipotturi

Actually, it worked for me. The script is perfect. The issue I was facing was that all my tombstones were sitting in the memtable. Once it was flushed, I could read the tombstones. @AlexisWilke: you might be facing the same issue.
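
Since the script reads only on-disk sstables, flushing the table first makes memtable tombstones visible to it. A minimal sketch, assuming `nodetool` is on the PATH and the gist's script is saved as `./tombstone-count.sh`; the `flush_and_count` wrapper and the `TOMBSTONE_COUNT` variable are hypothetical helpers, and the keyspace, table and path are the placeholders from the script's usage line.

```shell
#!/bin/bash
# Path to the gist's script; the default location is an assumption.
TOMBSTONE_COUNT=${TOMBSTONE_COUNT:-./tombstone-count.sh}

# Flush one table's memtable to disk, then count tombstones in its sstables.
# Arguments: keyspace, table, data directory of that table.
flush_and_count() {
  local keyspace=$1 table=$2 datadir=$3
  nodetool flush "$keyspace" "$table" || return 1
  "$TOMBSTONE_COUNT" "$datadir"/*-Data.db
}

# Example with the placeholder names from the usage line:
# flush_and_count mykeyspace mytable /var/lib/cassandra/data/mykeyspace/mytable
```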

@sedulam

sedulam commented Jun 20, 2018

This looks great. Unfortunately, it doesn't work with Cassandra 3.x, because sstable2json does not exist in that version. I have changed the code to use sstabledump instead, but I'm getting the following error:

tombstone_count ~/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/*Data*
/home/pedro/cassandra/tools/bin/sstabledump /home/pedro/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/mc-30-big-Data.db
jq: error (at <stdin>:54): Cannot iterate over null (null)
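
The jq error is consistent with sstabledump emitting a different JSON layout: if each top-level element carries `partition` and `rows` instead of `key` and `columns`, then `.columns[]` iterates over null. Below is a rough sketch of an adapted stage 3, assuming that layout and assuming every deletion appears as a `deletion_info` object. The function name and the sample document are hypothetical, and real sstabledump output should be inspected before relying on this.

```shell
#!/bin/bash
# Hypothetical stage 3 for sstabledump-style JSON read from stdin.
# Assumes each element looks like {"partition": {"key": [...]}, "rows": [...]}
# and that every deletion is recorded as a "deletion_info" object.
# Composite keys are joined with "/" because the later awk stages split
# on ":" and ",".
count_deletion_info() {
  jq '.[] | {key: (.partition.key | join("/")),
             length: ([.. | objects | select(has("deletion_info"))] | length)}'
}

# Hand-written sample, not real sstabledump output:
# one deleted row and one deleted cell in the same partition.
sample='[{"partition": {"key": ["k1"], "position": 0},
  "rows": [
    {"type": "row", "clustering": ["a"], "deletion_info": {"marked_deleted": "x"}, "cells": []},
    {"type": "row", "clustering": ["b"], "cells": [
      {"name": "v", "deletion_info": {"local_delete_time": "y"}},
      {"name": "w", "value": 1}]}
  ]}]'

printf '%s\n' "$sample" | count_deletion_info
```

Counting every `deletion_info` object treats row, cell and partition deletions alike, which may or may not match what a read trace reports as tombstone cells.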

@madireddyr

@sedulam, any luck with the tombstone count on 3.x?

@fholzer

fholzer commented Feb 25, 2019

Find an updated version for Cassandra 3.0.x at https://gist.github.com/fholzer/d6b7f1ce98906b5730cae67c179e0dd2
