@johnnyaug
Last active May 22, 2024 08:16

lakeFS with MinIO

lakeFS gives Git-like capabilities over your MinIO storage, allowing you to coordinate with colleagues when working on your data.

In the following example, we will use lakeFS to create a branch on your storage, commit changes to it, and then merge it into the master branch.

Prerequisites

  • Install MinIO Server from here.
  • Install mc from here.
  • Install docker-compose from here.

Installation

For this example, we will use a Postgres instance running in a Docker container. A production-suitable installation requires a persistent Postgres installation.

We will install lakeFS locally on your development machine. For more installation options, see lakeFS docs.

Create a docker-compose environment file for lakeFS, replacing <minio_access_key_id>, <minio_secret_access_key> and <minio_endpoint> with their values in your MinIO installation. Run the following commands:

LAKEFS_CONFIG_FILE=./.lakefs-env
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<minio_access_key_id>" > $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=<minio_secret_access_key>" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=<minio_endpoint>" >> $LAKEFS_CONFIG_FILE 
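
Before starting lakeFS, it can help to confirm that the environment file holds all three settings and that no placeholder was left unreplaced. The following is a minimal sketch; check_lakefs_env is a hypothetical helper name, not part of lakeFS or MinIO.

```shell
# Hypothetical helper (not part of lakeFS): report any required key that is
# missing from the env file or still set to an unreplaced <placeholder>.
check_lakefs_env() {
  env_file=$1
  missing=0
  for key in \
    LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID \
    LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY \
    LAKEFS_BLOCKSTORE_S3_ENDPOINT
  do
    # The value must start with a character that is neither "<" nor whitespace.
    if ! grep -Eq "^${key}=[^<[:space:]]" "$env_file" 2>/dev/null; then
      echo "missing or unset: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_lakefs_env "$LAKEFS_CONFIG_FILE" && echo "env file looks complete" prints the message only when all three keys have real values.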

Then start lakeFS:

curl https://compose.lakefs.io | docker-compose --env-file $LAKEFS_CONFIG_FILE -f - up
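
Because docker-compose up stays in the foreground, the remaining steps are run from a second terminal. If you script them, you may want to wait until lakeFS answers on its local address (http://127.0.0.1:8000/setup) before continuing. The following is a rough sketch that assumes curl is installed; wait_for_url is a hypothetical helper name, and the default of 30 attempts is arbitrary.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Any HTTP response counts as "answered", including error pages.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet, -o /dev/null: discard the body, --max-time 2: do not hang.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_url http://127.0.0.1:8000/setup && echo "lakeFS is up" blocks for up to roughly half a minute, longer if individual requests time out, and then reports whether the server responded.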

Configuration

Browse to lakeFS to create an admin user: http://127.0.0.1:8000/setup

Take note of the generated access key and secret.

We will use the lakectl binary to perform lakeFS operations. Find the distribution suitable for your operating system here, and extract the lakectl binary from the tar.gz archive. Place it somewhere in your $PATH and run lakectl --version to verify.
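
The rest of this walkthrough relies on three command-line tools: mc, docker-compose and lakectl. As a quick sanity check, the sketch below lists any of them that cannot be found on your PATH; missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every listed tool that is not on
# the PATH. Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running missing_tools mc docker-compose lakectl should print nothing if everything is installed.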

Then run the following command to configure lakectl, using the credentials generated during setup:

lakectl config
# output:
# Config file /home/janedoe/.lakectl.yaml will be used
# Access key ID: <LAKEFS_ACCESS_KEY_ID>
# Secret access key: <LAKEFS_SECRET_KEY>
# Server endpoint URL: http://lakefs.example.com:8000/api/v1

Verify that lakectl can access lakeFS with the command:

lakectl repo list

If no error is displayed, you are good to go. Now let's set an mc alias that points to lakeFS:

mc alias set lakefs http://s3.local.lakefs.io <LAKEFS_ACCESS_KEY_ID> <LAKEFS_SECRET_KEY>

Example

Create a bucket in MinIO. This bucket is created directly in your MinIO installation; later we will use lakeFS to enable versioning on it. The command below assumes you have already configured an mc alias named myminio that points to your MinIO server.

mc mb myminio/example-bucket

Create a repository in lakeFS:

lakectl repo create lakefs://example-repo s3://example-bucket

Create two example files:

echo "my first file" > myfile.txt
echo "my second file" > myfile2.txt

Copy the first file to your master branch, and commit:

mc cp ./myfile.txt lakefs/example-repo/master/
lakectl commit lakefs://example-repo@master -m "my first commit"

Now let's create a branch named branch1, and copy a file to it:

lakectl branch create lakefs://example-repo@branch1 --source lakefs://example-repo@master
mc cp ./myfile2.txt lakefs/example-repo/branch1/

List master and the branch to see that the new file is visible only in the branch, while the older file is visible in both the branch and master.

mc ls lakefs/example-repo/master
# only myfile.txt should be listed

mc ls lakefs/example-repo/branch1
# both files should be listed

Now let's commit the branch, and merge it back to master:

lakectl commit lakefs://example-repo@branch1 -m "my second commit"
lakectl merge lakefs://example-repo@branch1 lakefs://example-repo@master

Now both files are accessible through master:

mc ls lakefs/example-repo/master
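
If you want to check the result in a script rather than by eye, the sketch below reads a listing from standard input and succeeds only when every expected file name appears in it. It assumes that the file name is the last whitespace-separated field on each line of mc ls output, which may not hold for names containing spaces; listing_has is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every named file appears in the
# listing read from standard input.
# Assumption: the file name is the last whitespace-separated field per line.
listing_has() {
  listing=$(cat)
  for name in "$@"; do
    printf '%s\n' "$listing" |
      awk -v n="$name" '$NF == n { found = 1 } END { exit !found }' ||
      return 1
  done
}
```

For example, mc ls lakefs/example-repo/master | listing_has myfile.txt myfile2.txt && echo "both files are on master" prints the message only after the merge has made both files visible on master.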