@johnnyaug
Created October 22, 2020 12:14
# lakeFS with MinIO
lakeFS gives Git-like capabilities over your MinIO storage, allowing you to coordinate with colleagues when working on your data.
In the following example, we will use lakeFS to create a branch on your storage, commit changes to it, and then merge it to the master branch.
## Prerequisites
* Install MinIO Server from [here](https://docs.min.io/docs/minio-quickstart-guide).
* Install `mc` from [here](https://docs.min.io/docs/minio-client-quickstart-guide).
* Install docker-compose from [here](https://docs.docker.com/compose/install/).
## Installation
For this example we will use a Postgres instance within a docker container. A production-suitable installation will require a persistent Postgres installation.
We will install lakeFS locally on your development machine. For more installation options, see lakeFS [docs](https://docs.lakefs.io/deploying/install.html).
Create a docker-compose environment file for lakeFS, replacing `<minio_access_key_id>`, `<minio_secret_access_key>` and `<minio_endpoint>` with the values from your MinIO installation.
```bash
LAKEFS_CONFIG_FILE=./.lakefs-env
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<minio_access_key_id>" > $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=<minio_secret_access_key>" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=<minio_endpoint>" >> $LAKEFS_CONFIG_FILE
```
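Before starting lakeFS, it can help to confirm that no placeholder was left in the file. The following is a minimal sketch, not part of lakeFS: `check_lakefs_env` is a hypothetical helper name, and the check only looks for the three variables written above and for values still wrapped in angle brackets.

```bash
# Hypothetical helper (not part of lakeFS): checks that an env file defines
# the three blockstore variables and has no unreplaced <placeholder> values.
check_lakefs_env() {
  local file="$1" key status=0
  for key in \
    LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID \
    LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY \
    LAKEFS_BLOCKSTORE_S3_ENDPOINT; do
    # The key must be present and followed by at least one character.
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty: ${key}" >&2
      status=1
    fi
  done
  # A value still wrapped in angle brackets was never filled in.
  if grep -q '=<.*>$' "$file"; then
    echo "unreplaced placeholder found in ${file}" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_lakefs_env "$LAKEFS_CONFIG_FILE"` returns a non-zero status and prints the reason to stderr if the file is incomplete.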
Then start lakeFS:
```bash
curl https://compose.lakefs.io | docker-compose --env-file $LAKEFS_CONFIG_FILE -f - up
```
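The containers may need a few seconds before lakeFS accepts requests. If you script this step, a small retry loop can wait for it. This is a sketch under assumptions: `wait_until` is a hypothetical helper, and the commented example assumes lakeFS listens on `127.0.0.1:8000` and answers the setup page with a success status.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Run the given command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (run from a second terminal, or after starting compose with -d):
# wait_until 30 2 curl -fsS http://127.0.0.1:8000/setup && echo "lakeFS is up"
```

Taking the command as arguments keeps the helper independent of `curl`, so the same loop can be reused for `lakectl repo list` later on.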
## Configuration
Browse to lakeFS to create an admin user: `127.0.0.1:8000/setup`
Take note of the generated access key and secret.
We will use the `lakectl` binary to perform lakeFS operations. Find the distribution suitable for your operating system [here](https://github.com/treeverse/lakeFS/releases), and extract the `lakectl` binary from the tar.gz archive. Put it somewhere in your `$PATH` and run `lakectl --version` to verify.
Then run the following command to configure `lakectl`, using the credentials generated during setup:
```bash
lakectl config
# output:
# Config file /home/janedoe/.lakectl.yaml will be used
# Access key ID: <LAKEFS_ACCESS_KEY_ID>
# Secret access key: <LAKEFS_SECRET_KEY>
# Server endpoint URL: http://lakefs.example.com:8000/api/v1
```
Verify that `lakectl` can access lakeFS with the command:
```bash
lakectl repo list
```
If no error is displayed, you are good to go. Now let's set a MinIO alias for lakeFS:
```bash
mc alias set lakefs http://s3.local.lakefs.io <LAKEFS_ACCESS_KEY_ID> <LAKEFS_SECRET_KEY>
```
## Example
Create a bucket in MinIO. Note that this bucket is created directly in your MinIO installation; the command below assumes `myminio` is the `mc` alias of your MinIO server. Later we will use lakeFS to enable versioning on this bucket.
```bash
mc mb myminio/example-bucket
```
Create a repository in lakeFS:
```bash
lakectl repo create lakefs://example-repo s3://example-bucket
```
Create two example files:
```bash
echo "my first file" > myfile.txt
echo "my second file" > myfile2.txt
```
Copy the first file to your master branch, and commit:
```bash
mc cp ./myfile.txt lakefs/example-repo/master/
lakectl commit lakefs://example-repo@master -m "my first commit"
```
Now let's create a branch named `branch1`, and copy the second file to it:
```bash
lakectl branch create lakefs://example-repo@branch1 --source lakefs://example-repo@master
mc cp ./myfile2.txt lakefs/example-repo/branch1/
```
List master and the branch to see that the new file is visible only in the branch, while the older file is visible in both the branch and master.
```bash
mc ls lakefs/example-repo/master
# only myfile.txt should be listed
```
```bash
mc ls lakefs/example-repo/branch1
# both files should be listed
```
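To compare the two listings without reading them by eye, the outputs can be saved to files and filtered. This is a rough sketch: `only_in_second` is a hypothetical helper, and it assumes the object name is the last whitespace-separated field of each `mc ls` line, so names containing spaces are not handled.

```bash
# Hypothetical helper: given two files holding `mc ls` output, print object
# names that appear in the second listing but not in the first.
# Assumes the object name is the last whitespace-separated field on each line.
only_in_second() {
  local first="$1" second="$2"
  awk '
    FILENAME == ARGV[1] { seen[$NF] = 1; next }  # remember names from the first listing
    !($NF in seen)      { print $NF }            # print names new in the second listing
  ' "$first" "$second"
}

# Example with the aliases and repository used in this guide:
# mc ls lakefs/example-repo/master  > master.txt
# mc ls lakefs/example-repo/branch1 > branch1.txt
# only_in_second master.txt branch1.txt
```

At this point in the walkthrough, and provided the format assumption holds, the only name reported should be `myfile2.txt`.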
Now let's commit the branch, and merge it back to master:
```bash
lakectl commit lakefs://example-repo@branch1 -m "my second commit"
```
```bash
lakectl merge lakefs://example-repo@branch1 lakefs://example-repo@master
```
Now both files are accessible through master:
```bash
mc ls lakefs/example-repo/master
```