
@derlin
Last active April 27, 2024 18:29
Dockerfile and entrypoint example in order to easily initialize a Cassandra container using *.sh/*.cql scripts in `/docker-entrypoint-initdb.d`

Initializing a Cassandra Docker container with keyspace and data

This gist shows you how to easily create a Cassandra image with an initial keyspace and data already populated.

It is very generic: entrypoint.sh executes any *.sh or *.cql file located in /docker-entrypoint-initdb.d/, a bit like what you would do to initialize a MySQL container.

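You can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d, but note that:

  • *.sh files (only those marked executable) are sourced first, in a background process started just before Cassandra is launched, so they are not guaranteed to finish before Cassandra starts
  • *.cql files are executed (with cqlsh -f) AFTER Cassandra has started and accepts cqlsh connections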

Within each group, files are executed in name order (the output of find is piped through sort).
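Since ordering is purely by name, numbering the files is a simple way to control it (a convention assumed here, not something the scripts enforce). The sketch below uses hypothetical file names in a temporary directory and prints the order that the init script's find | sort pipeline would produce; LC_ALL=C is forced so the byte-wise order is reproducible:

```shell
#!/usr/bin/env bash
# Sketch: show the execution order that `find | sort` yields for *.cql init scripts.
# The file names below are hypothetical examples.
set -eu

init_order() {
  # List *.cql files the same way run-init-scripts.sh does,
  # with the C locale so the ordering is byte-wise and reproducible.
  (cd "$1" && find . -type f -name "*.cql" -print | LC_ALL=C sort)
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

touch "$demo_dir/01-keyspace.cql" "$demo_dir/02-tables.cql" "$demo_dir/10-data.cql"
# Without zero padding, "10-..." sorts before "2-...".
touch "$demo_dir/2-unpadded.cql"

init_order "$demo_dir"
# ./01-keyspace.cql
# ./02-tables.cql
# ./10-data.cql
# ./2-unpadded.cql
```

Zero-padding the prefixes avoids the trap illustrated by 2-unpadded.cql, which would run after 10-data.cql.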

How to use

  1. download the Dockerfile and entrypoint.sh (make sure entrypoint.sh is executable: chmod +x entrypoint.sh)
  2. edit the Dockerfile to copy your init scripts into /docker-entrypoint-initdb.d/
  3. build the image: docker build -t my-cassandra-image .
  4. run the image: docker run --rm -p 9042:9042 --name cassandra-container -d my-cassandra-image
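Because the container is started detached (-d) and the *.cql scripts only run once cqlsh can connect, a client may want to block until initialization has finished. A minimal sketch, assuming the container name from step 4 and relying on the lock file /var/lib/cassandra/_init.done that entrypoint.sh writes after the last *.cql script; the wait_for helper is hypothetical, not part of the gist:

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper (usage: wait_for ATTEMPTS DELAY_SECONDS CMD...).

wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # the command succeeded
    fi
    sleep "$delay"      # wait before the next attempt
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# Example usage with the container started in step 4 (requires Docker):
#   wait_for 30 5 docker exec cassandra-container test -f /var/lib/cassandra/_init.done
```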

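Note that the scripts in /docker-entrypoint-initdb.d are only run on the first startup: once the *.cql scripts have run, a lock file (/var/lib/cassandra/_init.done) is written, and later boots skip initialization. If you decide to persist the data using a volume, this works as expected: the scripts won't be executed again when you boot your container a second time. By using a volume, I mean, e.g.: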

docker run --rm -d \
    -p 9042:9042 \
    -v $PWD/data:/var/lib/cassandra \
    --name cassandra-container \
    my-cassandra-image
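The "run only once" behaviour comes entirely from that lock file. The sketch below reproduces just the guard from run-init-scripts.sh against a temporary directory standing in for the mounted data folder; run_init_once and the directory are illustrative names, and the real script sources *.sh files and runs cqlsh -f where this one only prints a message:

```shell
#!/usr/bin/env bash
# Sketch of the lock-file guard used by run-init-scripts.sh,
# with a temporary directory standing in for /var/lib/cassandra.
set -eu

data_dir=$(mktemp -d)
trap 'rm -rf "$data_dir"' EXIT

run_init_once() {
  lock="$data_dir/_init.done"
  if [ -f "$lock" ]; then
    echo "@@ Initialization already performed."
    return 0
  fi
  echo "@@ Running init scripts"   # the real script sources *.sh and runs cqlsh -f here
  touch "$lock"                    # mark as done so the next boot skips this step
}

run_init_once   # first boot: prints "@@ Running init scripts"
run_init_once   # second boot on the same data: prints "@@ Initialization already performed."
```

With the volume above, deleting data/_init.done on the host should make the scripts run again on the next start; the sample CQL would then fail on CREATE KEYSPACE unless it uses IF NOT EXISTS.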

File: Dockerfile

# NOTE: written for cassandra:3.11; other tags may need adjustments (Cassandra 4.0 is reported not to work as-is)
FROM cassandra:3.11
# Fix UTF-8 accents in init scripts
ENV LANG=C.UTF-8
# Here, you can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d
# *.sh files (executable ones only) are sourced in the background just before cassandra is launched
# *.cql files will be executed with cqlsh -f once cassandra accepts connections
# Files are executed in name order (find | sort)
COPY *.cql /docker-entrypoint-initdb.d/
# this is the script that will patch the already existing entrypoint from the cassandra image
COPY entrypoint.sh /
# make sure the entrypoint is executable, even if the exec bit was lost on download
RUN chmod +x /entrypoint.sh
# Override ENTRYPOINT, keep CMD
ENTRYPOINT ["/entrypoint.sh"]
CMD ["cassandra", "-f"]
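
File: entrypoint.sh

#!/usr/bin/env bash
##
## This script will generate a patched docker-entrypoint.sh that:
## - sources any executable *.sh script found in /docker-entrypoint-initdb.d
## - boots cassandra up
## - executes any *.cql script found in /docker-entrypoint-initdb.d once cassandra is ready
##
## It was written for cassandra:3.11 and relies on the image providing /docker-entrypoint.sh
##
# abort instead of starting a half-configured container if any step below fails
set -e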
## Create script that executes files found in /docker-entrypoint-initdb.d/
# '>' (not '>>') so a restarted container does not append a second copy of the script
cat <<'EOF' > /run-init-scripts.sh
#!/usr/bin/env bash
LOCK=/var/lib/cassandra/_init.done
INIT_DIR=/docker-entrypoint-initdb.d

if [ -f "$LOCK" ]; then
  echo "@@ Initialization already performed."
  exit 0
fi

cd "$INIT_DIR" || exit 1

echo "@@ Executing bash scripts found in $INIT_DIR"
# source the executable *.sh scripts found in INIT_DIR
for f in $(find . -type f -name "*.sh" -executable -print | sort); do
  echo "$0: sourcing $f"
  . "$f"
  echo "$0: $f executed."
done

# wait for cassandra to be ready and execute cql in background
(
  while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
  echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
  for f in $(find . -type f -name "*.cql" -print | sort); do
    echo "$0: running $f"
    # stop at the first failing script so the lock file is not written
    cqlsh -f "$f" || { echo "$0: $f failed"; exit 1; }
    echo "$0: $f executed"
  done
  # mark things as initialized (in case /var/lib/cassandra was mapped to a local folder)
  touch "$LOCK"
) &
EOF
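## Patch existing entrypoint to call our script in the background
# This has been inspired by https://www.thetopsites.net/article/51594713.shtml
# The last line of /docker-entrypoint.sh is expected to be: exec "$@"
# sed drops it, then we append the call to our script followed by a new exec "$@"
EP=/patched-entrypoint.sh
sed '$ d' /docker-entrypoint.sh > "$EP"
cat <<'EOF' >> "$EP"
/run-init-scripts.sh &
exec "$@"
EOF
# Make both scripts executable
chmod +x /run-init-scripts.sh
chmod +x "$EP"
# Call the new entrypoint, replacing this shell so that signals reach cassandra
exec "$EP" "$@"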
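The patching step relies on the last line of the image's /docker-entrypoint.sh being exec "$@": sed '$ d' drops that line, and the heredoc appends the call to /run-init-scripts.sh followed by a new exec "$@". The sketch below replays this on a short stand-in entrypoint in a temporary directory and adds a guard (an addition, not in the gist) that refuses to patch when the last line is something else; patch_entrypoint is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: replay the "drop last line, append hook" patch on a fake entrypoint.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

patch_entrypoint() {
  src=$1
  dst=$2
  # Guard (not in the gist): only patch if the last line is exactly exec "$@".
  if [ "$(tail -n 1 "$src")" != 'exec "$@"' ]; then
    echo "unexpected last line in $src, refusing to patch" >&2
    return 1
  fi
  sed '$ d' "$src" > "$dst"   # copy everything except the last line
  cat <<'EOF' >> "$dst"
/run-init-scripts.sh &
exec "$@"
EOF
}

# Stand-in for the image's entrypoint (the real one is much longer).
printf '%s\n' '#!/bin/bash' 'set -e' 'echo "configuring..."' 'exec "$@"' > "$work/docker-entrypoint.sh"

patch_entrypoint "$work/docker-entrypoint.sh" "$work/patched-entrypoint.sh"
cat "$work/patched-entrypoint.sh"
```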

File: example *.cql init script

-- Here, you can execute any CQL commands, e.g.
CREATE KEYSPACE some_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE some_keyspace.some_table (
    id int,
    month text,
    timestamp timestamp,
    value text,
    PRIMARY KEY ((id, month), timestamp)
) WITH CLUSTERING ORDER BY (timestamp ASC);
@derlin (author)

derlin commented Aug 25, 2020

@danpaldev

danpaldev commented Sep 22, 2020

Easy and concise! Thank you very much for this; it's hard to believe that a simple migration is so complicated in Cassandra.

When trying this for the first time, I got an error related to "entrypoint.sh". It was a permission issue, and I fixed it by adding a chmod command after the COPY command. My final Dockerfile now looks like this and works fine:

FROM cassandra:latest

ENV LANG C.UTF-8

COPY *.cql /docker-entrypoint-initdb.d/

COPY entrypoint.sh /

RUN ["chmod", "+x", "/entrypoint.sh"]

ENTRYPOINT ["/entrypoint.sh"]
CMD ["cassandra", "-f"]

Hopefully this will be useful for people encountering the same problem.

@EthanHaid

EthanHaid commented Aug 21, 2021

This works, but a few notes:

I had to remove the sed command on line 54 of entrypoint.sh; the file it reads (/docker-entrypoint.sh) doesn't seem to exist in my Cassandra 4.0 image.
Also, it's best to put set -e at the beginning of entrypoint.sh so the container doesn't deploy with a failed init.

In the end, I just used the Bitnami Docker image instead, since it does the same thing by default and provides a non-root user for production deployments.

@Cosmicoppai

Cosmicoppai commented Sep 8, 2021

I'm getting this error

 "standard_init_linux.go:228: exec user process caused: exec format error" while building image using cassandra:4.0.

My Dockerfile and entrypoint.sh are same as above

@Cosmicoppai

Cosmicoppai commented Sep 8, 2021

I made some changes mentioned by @EthanHaid.

Now, I'm getting this error:

"Running Cassandra as root user or group is not recommended - please start Cassandra using a different system user.
node-1    | If you really want to force running Cassandra as root, use -R command line option."

@derlin (author)

derlin commented Sep 8, 2021

Well, it seems Cassandra 4 changed some things. I will have a look. In the meantime, @Cosmicoppai, did you try adding the -R argument to the CMD?

@Cosmicoppai

@derlin after using -R, the script is executing, but I'm getting this error now:

Fatal configuration error node-1 | org.apache.cassandra.exceptions.ConfigurationException: Unable to bind to address /172.23.0.7:7000. Set listen_address in cassandra.yaml to an interface you can bind to, e.g., your private IP address on EC2

I've mounted cassandra.yaml in the docker-compose file to set the authenticator properties on the nodes, but Docker is not resolving the IP.

@VadimRight

VadimRight commented Apr 24, 2024

Dude, I just want to say thank you for this github page, you are the best!
I wish there were more people like you who just share such helpful code with others.

@derlin (author)

derlin commented Apr 27, 2024

@VadimRight thank you so much for taking the time to leave this comment. It is really nice to know it is worth continuing; I really appreciate it 😊
