@derlin
Last active August 19, 2024 10:16
Dockerfile and entrypoint example in order to easily initialize a Cassandra container using *.sh/*.cql scripts in `/docker-entrypoint-initdb.d`

Initializing a Cassandra Docker container with keyspace and data

This gist shows how to create a Cassandra image with an initial keyspace and values already populated.

It is very generic: entrypoint.sh executes any CQL file located in /docker-entrypoint-initdb.d/, a bit like what you do to initialize a MySQL container.

You can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d, but note that:

  • *.sh files are sourced BEFORE Cassandra is launched (only files with the executable bit set are picked up)
  • *.cql files are executed (with cqlsh -f) AFTER Cassandra has started

Files of each kind are executed in name order (find ... | sort)
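
Because the order is a plain lexical sort, numeric prefixes only behave as expected when they are zero-padded. A small sketch you can run anywhere to see this; the file names are made up for the illustration:

```shell
#!/usr/bin/env bash
# Hypothetical file names, created in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/2-data.cql" "$demo_dir/10-indexes.cql" "$demo_dir/01-schema.cql"

# Same kind of listing the init script uses: find, then a lexical sort.
order=$(cd "$demo_dir" && find . -type f -name "*.cql" -print | sort)
echo "$order"
# prints:
# ./01-schema.cql
# ./10-indexes.cql
# ./2-data.cql

rm -rf "$demo_dir"
```

Here 10-indexes.cql runs before 2-data.cql, so a name like 02-data.cql is the safer choice.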

How to use

  1. download the Dockerfile and entrypoint.sh
  2. edit the Dockerfile in order to copy your init scripts inside /docker-entrypoint-initdb.d/
  3. build the image: docker build -t my-cassandra-image .
  4. run the image: docker run --rm -p 9042:9042 --name cassandra-container -d my-cassandra-image
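
Keep in mind that the *.cql scripts run in the background once Cassandra answers, so the container can be "up" before your keyspace exists. Below is a minimal sketch of a bounded wait you could run from the host. wait_for is a hypothetical helper (it is not part of this gist), and the attempt count and delay are arbitrary:

```shell
#!/usr/bin/env bash
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: retries COMMAND until it succeeds or attempts run out.
wait_for() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@" > /dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1    # give up: COMMAND never succeeded
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumes the container name used in step 4):
# wait_for 20 6 docker exec cassandra-container cqlsh -e 'describe cluster' \
#     && echo "Cassandra is accepting CQL connections"
```

The probe is the same one the init script uses, so success only means Cassandra accepts connections. Swapping it for a query against one of your own tables is one way to wait for the init scripts themselves, assuming your cqlsh version exits non-zero on errors.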

Note that the scripts in /docker-entrypoint-initdb.d are only run the first time the container is initialized. Once they have run, a marker file (/var/lib/cassandra/_init.done) is created, so if you decide to persist the data using a volume, this will work all right: the scripts won't be executed when you boot your container a second time. By using a volume, I mean, e.g.:

docker run --rm -d \
    -p 9042:9042 \
    -v $PWD/data:/var/lib/cassandra \
    --name cassandra-container \
    my-cassandra-image
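
The marker-file idea can be tried outside Docker. This is only a sketch: run_once is a hypothetical helper, and unlike the gist's script it creates the marker only when the command succeeds:

```shell
#!/usr/bin/env bash
# run_once MARKER COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND only if MARKER does not exist yet,
# and creates MARKER after COMMAND succeeds.
run_once() {
    local marker=$1
    shift
    if [ -f "$marker" ]; then
        echo "@@ Initialization already performed."
        return 0
    fi
    "$@" && touch "$marker"
}

# A throwaway directory stands in for the mounted /var/lib/cassandra volume.
data_dir=$(mktemp -d)
first=$(run_once "$data_dir/_init.done" echo "running init scripts")
second=$(run_once "$data_dir/_init.done" echo "running init scripts")
echo "$first"     # running init scripts
echo "$second"    # @@ Initialization already performed.
rm -rf "$data_dir"
```

Since the marker lives inside the data directory, deleting the local data folder also resets the initialization.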

Dockerfile

# NOTE: will also work with other cassandra version tags
FROM cassandra:3.11
# Fix UTF-8 accents in init scripts
ENV LANG=C.UTF-8
# Here, you can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d
# *.sh files will be executed BEFORE launching cassandra
# *.cql files will be executed with cqlsh -f AFTER cassandra started
# Files are executed in name order (ls * | sort)
COPY *.cql /docker-entrypoint-initdb.d/
# this is the script that will patch the already existing entrypoint from cassandra image
COPY entrypoint.sh /
# Override ENTRYPOINT, keep CMD
ENTRYPOINT ["/entrypoint.sh"]
CMD ["cassandra", "-f"]

entrypoint.sh

#!/usr/bin/env bash
##
## This script will generate a patched docker-entrypoint.sh that:
## - executes any *.sh script found in /docker-entrypoint-initdb.d
## - boots cassandra up
## - executes any *.cql script found in docker-entrypoint-initdb.d
##
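## It was written against cassandra:3.11; newer images may keep the original
## entrypoint at /usr/local/bin/docker-entrypoint.sh (see the comments below)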
##
## Create script that executes files found in docker-entrypoint-initdb.d/
cat <<'EOF' > /run-init-scripts.sh
#!/usr/bin/env bash
LOCK=/var/lib/cassandra/_init.done
# absolute path, so the script does not depend on the working directory
INIT_DIR=/docker-entrypoint-initdb.d
if [ -f "$LOCK" ]; then
    echo "@@ Initialization already performed."
    exit 0
fi
cd "$INIT_DIR" || exit 1
echo "@@ Executing bash scripts found in $INIT_DIR"
# source the executable *.sh scripts found in INIT_DIR, in name order
for f in $(find . -type f -name "*.sh" -executable -print | sort); do
    echo "$0: sourcing $f"
    . "$f"
    echo "$0: $f executed."
done
# wait for cassandra to be ready and execute cql in background
(
    while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
    echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
    for f in $(find . -type f -name "*.cql" -print | sort); do
        echo "$0: running $f"
        cqlsh -f "$f"
        echo "$0: $f executed"
    done
    # mark things as initialized (in case /var/lib/cassandra was mapped to a local folder)
    touch "$LOCK"
) &
EOF
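## Patch existing entrypoint to call our script in the background
# This has been inspired by https://www.thetopsites.net/article/51594713.shtml
# Locate the image's entrypoint: per the comments below, some newer images only
# have it at /usr/local/bin/docker-entrypoint.sh, so look it up on the PATH first
ORIG_EP=$(command -v docker-entrypoint.sh || echo /docker-entrypoint.sh)
EP=/patched-entrypoint.sh
# drop the last line of the original so our hook can be inserted before the final exec
sed '$ d' "$ORIG_EP" > "$EP"
cat <<'EOF' >> "$EP"
/run-init-scripts.sh &
exec "$@"
EOF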
# Make both scripts executable
chmod +x /run-init-scripts.sh
chmod +x "$EP"
# Call the new entrypoint (exec, so it replaces this shell and receives signals directly)
exec "$EP" "$@"
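
To see what the patching step produces without starting a container, here is a sketch that applies the same sed/append trick to a stand-in file. The stand-in's contents are invented; the only assumption carried over from the real entrypoint is that its last line is exec "$@":

```shell
#!/usr/bin/env bash
# Stand-in for the image's docker-entrypoint.sh (hypothetical contents).
work=$(mktemp -d)
cat > "$work/docker-entrypoint.sh" <<'EOF'
#!/bin/bash
echo "original setup"
exec "$@"
EOF

# Drop the last line, then append the init hook followed by the exec.
patched="$work/patched-entrypoint.sh"
sed '$ d' "$work/docker-entrypoint.sh" > "$patched"
cat >> "$patched" <<'EOF'
/run-init-scripts.sh &
exec "$@"
EOF

cat "$patched"
```

The patched file keeps everything the original did and only slips the background call in right before the final exec. If the original entrypoint did not end with exec "$@", the sed would delete some other line, so this is worth checking when you change the base image.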
-- Here, you can execute any CQL commands, e.g.
CREATE KEYSPACE some_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE some_keyspace.some_table (
    id int,
    month text,
    timestamp timestamp,
    value text,
    PRIMARY KEY ((id, month), timestamp)
) WITH CLUSTERING ORDER BY (timestamp ASC);
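
The sample above only creates the schema. To also populate values, one option is a second file whose name sorts after the schema file. Below is a sketch that writes such a file next to the Dockerfile, so that COPY *.cql picks it up. The file name 02-data.cql and the row are invented for the example, and it assumes the schema file has a name that sorts first (e.g. 01-schema.cql):

```shell
#!/usr/bin/env bash
# Run on the host before `docker build`.
# The first argument is the build context directory (defaults to the current one).
# File name and row values are hypothetical.
build_context="${1:-.}"
seed_file="$build_context/02-data.cql"

cat > "$seed_file" <<'EOF'
INSERT INTO some_keyspace.some_table (id, month, timestamp, value)
VALUES (1, '2024-01', '2024-01-15 10:00:00+0000', 'first value');
EOF

echo "wrote $seed_file"
```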

VadimRight commented Apr 24, 2024

Dude, I just want to say thank you for this github page, you are the best!
I wish there were more people like you who just share such helpful code with others

derlin (Author) commented Apr 27, 2024

@VadimRight thank you so much for taking the time to leave this comment. It is really nice to know it is worth continuing, I really appreciate it 😊


kamauz commented Aug 19, 2024

This works, but a few notes:

  • I had to remove the sed command on line 54 of entrypoint.sh; the file doesn't seem to exist in my Cassandra 4.0 image.
  • Also, it's best to put set -e at the beginning of entrypoint.sh so the container doesn't deploy with a failed init.

In the end, I just used the Bitnami docker image instead, since it does the same thing by default, and it provides a non-root user for prod deployment

I checked in Cassandra 5.0: the "docker-entrypoint.sh" file has been moved, and its path is now "/usr/local/bin/docker-entrypoint.sh".
If you comment out line 54 in the "entrypoint.sh" file, you cannot access Cassandra from other Docker containers.
