Skip to content

Instantly share code, notes, and snippets.

@dbist
Last active January 19, 2024 06:20
Show Gist options
  • Save dbist/ec38494e3a886258679b30d1909705b4 to your computer and use it in GitHub Desktop.
Save dbist/ec38494e3a886258679b30d1909705b4 to your computer and use it in GitHub Desktop.

CockroachDB CDC using Minio as a cloud storage sink

This is the third in the series of tutorials on CockroachDB and Docker Compose.

You can find the first post here and second post here

  • Information on CockroachDB can be found here.
  • Information on Docker Compose can be found here
  • Information on Minio can be found here
  • Information on Changefeed can be found here
  1. Start the docker-compose if you haven't already
docker-compose up -d
  1. Browse to the Minio Web UI to create our first bucket

As covered in the previous post as well as noted in the minio container logs, you can access the Web UI using http://127.0.0.1:9000. Again, from our compose script, we set environment variables for access key, secret key and region name

     - MINIO_ACCESS_KEY=miniominio
     - MINIO_SECRET_KEY=miniominio13

Plug in those keys into the browser to access the UI.

access

Once you're in, you are greeted with Minio file browser.

minio_ui

  1. Create a bucket by clicking the plus sign in the lower-right hand corner.

Hit enter after typing out the name.

create_bucket

  1. Set up a changefeed in CockroachDB

Access the cockroach CLI

docker exec -it crdb-1 ./cockroach sql --insecure

Follow along in this tutorial to get an enterprise changefeed working. Specifically, set your organization name and license, then enable changefeed.

SET CLUSTER SETTING cluster.organization = '<organization name>';

SET CLUSTER SETTING enterprise.license = '<secret>';

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE DATABASE cdc_demo;

SET DATABASE = cdc_demo;

CREATE TABLE office_dogs (
     id INT PRIMARY KEY,
     name STRING);

INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');

UPDATE office_dogs SET name = 'Petee H' WHERE id = 1;
  1. Create Minio specific changefeed
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://miniobucket/dogs?AWS_ACCESS_KEY_ID=miniominio&AWS_SECRET_ACCESS_KEY=miniominio13&AWS_ENDPOINT=http://minio:9000&AWS_REGION=us-east-1' with updated;
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://miniobucket/dogs?AWS_ACCESS_KEY_ID=miniominio&AWS_SECRET_ACCESS_KEY=miniominio13&AWS_ENDPOINT=http://minio:9000&AWS_REGION=us-east-1' with updated;
        job_id
+--------------------+
  517728810522771457
(1 row)

Time: 13.6961ms

You can now monitor your changefeed with ID 517728810522771457 in the CRDB UI or with SHOW JOBS SQL command.

jobs_ui

Let's disect this statement a bit. First there's a scheme experimental-s3://, object storage sink functionality is still work in progress and is being actively hardened. Then we mention miniobucket and directory within the bucket dogs. We are using Minio as our S3 object store and will plugin the AWS_KEY_ID and AWS_SECRET_ACCESS_KEY with properties we set for Minio in the compose file. Additionally, we have to set AWS_ENDPOINT so that Minio can serve as our AWS endpoint. We're also adding an optional AWS_REGION but for the sake of this tutorial it can be left out. You can probably see a pattern, prepend any of the site-specific properties with &PROPERTY. Feel free to review changefeed documentation for additional features, like the one I just learned where changefeed can be set up for multiple tables, e.g. CREATE CHANGEFEED FOR TABLE table1, table2, table3 INTO.... It will maintain a single changefeed for multiple tables, which happens to be a best practice over single changefeed per table.

Now that changefeed is actually working, browse to the Minio UI and within the miniobucket you will find a directory called dogs.

changefeed_in_minio

Here you will find a timestamped directory and within it a json file. This file will include all changes to the table. This CDC can also be set up with Kafka sink, again, feel free to get acquinted with changefeed documentation linked above. Possibilities are endless!

Finally, let's browse to the file and peek into it using a file browser. Recall in the compose file, we started Minio with a docker volume mapped to current directory ${PWD}/data:/data. Navigating to the current directory where compose file resides, you will find a /data directory. Browse to the root of the child directory to find the json payload.

cd data/miniobucket/dogs/2020-01-03

This is where all of the CDC changes to the records in the table will be stored in json format.

artem@Artems-MBP 2020-01-03 % ls
202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
artem@Artems-MBP 2020-01-03 % head 202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
{"after": {"id": 1, "name": "Petee H"}, "key": [1], "updated": "1578069504198133400.0000000000"}
{"after": {"id": 2, "name": "Carl"}, "key": [2], "updated": "1578069504198133400.0000000000"}

From this point on, executing update on the table will create new files.

UPDATE office_dogs SET name = 'Beathoven' WHERE id = 1;
root@:26257/cdc_demo> select * from office_dogs;
  id |   name
+----+-----------+
   1 | Beathoven
   2 | Carl
(2 rows)

Time: 1.4273ms

And on the filesystem:

artem@Artems-MBP 2020-01-03 % ls
202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
202001031639259077295000000000001-8e419d24c147ac20-1-8-00000001-office_dogs-1.ndjson
artem@Artems-MBP 2020-01-03 % head *
==> 202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson <==
{"after": {"id": 1, "name": "Petee H"}, "key": [1], "updated": "1578069504198133400.0000000000"}
{"after": {"id": 2, "name": "Carl"}, "key": [2], "updated": "1578069504198133400.0000000000"}

==> 202001031639259077295000000000001-8e419d24c147ac20-1-8-00000001-office_dogs-1.ndjson <==
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578069607039550700.0000000000"}

This is a wrap for CDC sink to an object store with Cockroach and Minio. Hope you enjoyed this tutorial. Please share your feedback in the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment