This post was originally featured as a 3-part series at collinprather.github.io/blog/.
In this tutorial, we will be assuming the following:
- You have a working Streamlit app ready to deploy
  - If you don't, no worries! The Streamlit docs have some great tutorials, but if you'd rather jump right in, you can go ahead and git clone my small example here.
- You have Docker installed
- You have a working knowledge of the command line
Streamlit is the framework featured in this post because it is designed for data scientists and machine learning engineers to quickly take their analysis from their laptops to deployment. Building useful, aesthetically pleasing web applications is difficult, and Streamlit has made great strides in enabling analysts with little web development experience to "create beautiful data apps in hours, not weeks."
So let's say that you've got your streamlit web app prepared in a directory that looks like this:
https://gist.github.com/a7aa44573b72163246160c932ad70716
In order to containerize this application with Docker, the first step will be to add a Dockerfile to the root of your directory. Your Dockerfile acts as a set of instructions (more specifically, a set of commands that could equivalently be called from the command line) from which Docker will build an image for your app. The image is built by running each line in the Dockerfile sequentially. Using this image, Docker will then create a container. If this is all new, I'd recommend taking a look at this Docker overview!
The Dockerfile for my small example looks like this:
https://gist.github.com/9db5a3aaf8d61c543bbfee7e56eaef97
It is worth mentioning that building off the ubuntu base image may be a bit of overkill for the scale of this small web app; however, I found it necessary to get a nice rendering of an svg file generated by the dtreeviz package. There is also a great example of a simpler Dockerfile on this blog. There is a lot to unpack here, so I'll do so line by line.
At the top, we build off the base ubuntu image with the following line:
https://gist.github.com/8bede1884720d9d4d0dae7f89ac63af6
This means that Docker pulls the ubuntu:18.04 image from Docker Hub to begin with.
Next, we update and install a few things we'll need for our web app.
https://gist.github.com/35a7fa5b0847cdf8fd4c647f6a6684dc
After that, we set up our app within the image. Since Streamlit's default port is 8501, we open up that port.
https://gist.github.com/28840bbde4a077ddca3f32bbe555059c
From there, we (optionally) define a working directory within the image and copy over all of our files, then install the necessary Python libraries (as defined in our requirements.txt).
https://gist.github.com/1ff52378b3ea71d08c64015e3f94ca64
Note: It is typically not recommended to copy all files to the image the way we've done above (particularly if you have large files). However, since this is a small example, it won't cause any issues for us.
Finally, we must include a few commands to ensure that Streamlit runs as expected. We define a command to launch our web app whenever our Docker container gets launched,
https://gist.github.com/61f50c911c03e833c664def147859c74
and we finish by including a few commands to configure streamlit correctly.
https://gist.github.com/66ac5ad2b735960cb1a4dacabb357b51
Now that we have our web app and our Dockerfile all set up, we're ready to build the image. We can do so with a single command.
Note: you must run this command from the same directory as your Dockerfile.
https://gist.github.com/a2e9370421f77a0142604918011b15e2
where -t tags our image as streamlit:app and the trailing dot (.) refers to the directory containing the Dockerfile. When you run this from the command line, you will see Docker moving through each step defined in the Dockerfile and installing many packages to the image. Once it is finished (it may take a few minutes the first time), you should see a verification like Successfully tagged streamlit:app, letting you know that the Docker image was successfully created. You can further verify that the image was created correctly by running docker image ls, where you should see something like
https://gist.github.com/8aea419b41e385d5d8e856d4481cd37a
At this point, our image has been successfully built and we are ready to run it as a container! (If the differences between an image and a container are confusing, this short post provides some helpful distinctions.) One command will do the trick,
https://gist.github.com/d8613bde0ff27a447909f4abd477b552
where -p publishes a container's port to a port on the host and -d runs the container in the background. You can then verify that it is running with a command like this,
https://gist.github.com/d8466d32c65810d33b125684001c1847
Better yet, pop open a web browser and you can view your web app, running in a Docker container, at http://localhost:8501/. If you're using my example, it should look something like this!
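For a scripted check instead of a browser, a minimal sketch like the one below could confirm that something is accepting TCP connections on the published port. The helper name port_is_open is hypothetical, and it only tests reachability; it does not verify that the process answering on the port is actually Streamlit.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("localhost", 8501) after starting the container should return True once the app is up; the same function can be pointed at a remote host later on.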
Deploying your web app to the cloud with AWS
Now, we will walk through how to deploy your web app to the cloud and make it publicly available! These instructions are tailored to my small example, but should work for any streamlit app you've built.
We'll start by heading over to aws.amazon.com/console. If you do not yet have an account, create one! After you're logged in, locate the Services tab in the upper left-hand corner, then select EC2.
Next, you'll want to use the tab on the left-hand side of the console to select Instances -> Launch Instance.
This will lead you to a screen prompting you to "Choose an Amazon Machine Image". There are many options to choose from here, but our life will be made simplest by choosing the Deep Learning AMI (Ubuntu). Using this image does introduce a bit of extra overhead; however, it guarantees that git and Docker will be pre-installed, so it will be our choice.
After this, we will choose the type of instance to use. To ensure that we'll have enough space to build and run our Docker image, it's a safe (and cheap) bet to pick a t2.medium instance.
Note: your AWS account will be charged when you launch this instance. The good news is that you'll only be charged about $0.0464 per hour (as of 3/11/20). Don't forget to terminate your instance when finished!
From here, you can skip all the way to step 6 in launching the instance, which is where you'll "Configure Security Group". By default, all ports on our EC2 instance, other than 22, are closed to the public. In order to make our Streamlit app publicly available, we need to open up port 8501. We can do so by creating a custom TCP rule, as pictured below.
With that set, you can click launch.
Lastly, we will need to ssh into the instance to get the code to run our app in the cloud. This requires a key pair. You should be prompted to choose an existing key pair or create one. If you do not have an existing one, choose "Create a new key pair", then download it. Now you're ready to click Launch Instances.
At this point, your EC2 instance is being built and configured. You can follow its progress back in the AWS console. Once you see a green "running" icon next to your instance, you can select it, then click the Connect button near the top of the console. Follow AWS's instructions (shown below) to ssh into the instance from your local terminal.
After ssh-ing into the instance, there are a few options to get our code into the cloud. This tutorial will assume your code is in a public GitHub repository; however, if necessary, you can scp your code directly from your local computer to your instance. For our purposes, we'll use git clone:
https://gist.github.com/cacb6131d85db52b98338f336763911c
Now that our code is on the instance, we can use the 2 commands featured in part 1 to build, then run the image (output removed for brevity).
https://gist.github.com/4b2ef87b7bb3b1ff206e1c98b20cd1e3
Now, the web app will be served at http://<EC2 public IP address>:8501! The public IP address can be found under "IPv4 Public IP" in the AWS console. Once you've located it, pull open a web browser and verify that your app is running as expected!
When you're done, don't forget to terminate your instance!
Thus far, we've covered how to build a Docker image for a Streamlit web app and how to move your code into the cloud. In this section, we will walk through how to connect other containerized services to your app. Specifically, we'll connect to a Postgres database, but this process should hold for any other service you'd like to employ.
We've had a running example of a bare-bones web app that could be used to deploy a machine learning model for use by non-technical employees. In some circumstances, we may want the users of this web app to have access not only to the predictions of this model, but also to certain subsets of the underlying data itself. In the example above, it would be nice to grant the users the ability to query specific rows from the dataset which fall into a given leaf of the decision tree. In other words, to give them snapshots of the data, like this (the example uses the famous Boston housing dataset, as provided by scikit-learn):
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24 |
| 1 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |
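To make "rows which fall into a given leaf" concrete, here is a minimal sketch that groups rows by leaf. It assumes you already have one leaf id per row (with scikit-learn, a fitted tree's apply method returns such ids). The function name rows_by_leaf and the leaf ids below are made up for illustration; only the RM, LSTAT and PRICE values come from the table above.

```python
from collections import defaultdict


def rows_by_leaf(rows, leaf_ids):
    """Group rows by the leaf each one falls into, preserving row order."""
    if len(rows) != len(leaf_ids):
        raise ValueError("rows and leaf_ids must have the same length")
    groups = defaultdict(list)
    for row, leaf in zip(rows, leaf_ids):
        groups[leaf].append(row)
    return dict(groups)


# (RM, LSTAT, PRICE) for the five rows shown in the table.
rows = [
    (6.575, 4.98, 24.0),
    (6.421, 9.14, 21.6),
    (7.185, 4.03, 34.7),
    (6.998, 2.94, 33.4),
    (7.147, 5.33, 36.2),
]
leaf_ids = [3, 3, 4, 4, 4]  # hypothetical leaf assignments, not real model output

snapshot = rows_by_leaf(rows, leaf_ids)[4]  # every row that landed in leaf 4
```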
The easiest way to add this feature to our web app would be to save the dataset as part of our source code and ensure it gets included in the Docker image. This approach, however, can quickly become infeasible as our dataset gets large, or when we want to update it. A more robust solution is to connect our web app directly to a database, which decouples the app from the data and lets us freely revise either one independently.
Docker has a fantastic tool called "docker-compose", which allows you to easily chain together containers, and takes care of many details under the hood so that things just work. This is the perfect tool for our use-case. Below, we'll walk through how to use it!
All the code used to add the database to our app can be found on the docker-compose+postgres branch of the repository. The beauty of Docker is that we do not have to make any structural changes to our app in order to interact with the database, only add a bit of functionality.
The first step in adding the database is creating a docker-compose.yml file in the root of our repository. Mine looks like this,
https://gist.github.com/853001221856d96d7787a0dd599bfcf5
Let's break it down piece by piece.
The first line,
https://gist.github.com/54270c0fddfe919f6480edda14a8fdfa
indicates which version of the Compose file format we are using.
The majority of our legwork falls under services, which is where we'll define how we want to connect our database and app. I had some help from this blog in constructing the postgres portions correctly.
https://gist.github.com/d6cf80eb20684435fa64201f31be952e
Here, we let Docker know that we want to use the postgres:12 image from Docker Hub and refer to it as postgres. Since 5432 is the default port for Postgres, we map it to the container's port 5432 so that the database will be accessible by the app. Next, we mount our database volume (db_data) at the location within the container where Postgres stores all of its data, /var/lib/postgresql/data (this blog explains why this is the preferred method to guarantee our data is persisted). Lastly, we point the database towards a .env file which contains the username, password, and default name of our database. This file enables us to programmatically query the database without leaving our password in the source code!
Note: the .env file should never be pushed to GitHub!
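As a rough illustration of how those values could become a connection string without hard-coding a password, here is a minimal sketch. It assumes the .env file uses the variable names that the official postgres image reads (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB); the actual file is not shown in this post, so those names, the helpers parse_env and postgres_url, and the sample credentials are all assumptions. The default host is the service name postgres from the compose file, and 5432 is the default port mentioned above.

```python
from urllib.parse import quote


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def postgres_url(env: dict, host: str = "postgres", port: int = 5432) -> str:
    """Build a postgresql:// URL, percent-encoding the credentials."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['POSTGRES_DB']}"


# Made-up credentials for illustration only; real ones stay in the untracked .env file.
sample = "# database credentials\nPOSTGRES_USER=app\nPOSTGRES_PASSWORD=p@ss word\nPOSTGRES_DB=housing\n"
url = postgres_url(parse_env(sample))
```

Percent-encoding the user name and password keeps characters such as @ or spaces from breaking the URL.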
All Docker containers are designed to be ephemeral, easily replaceable blocks that we can place together like legos. On the other hand, we want our data to persist even as our containers change. As Ranvir Singh said,
One of the most challenging tasks ... is separating data from software.
As a solution to this problem, we define a single volume that Docker will mount to the postgres container each time it is re-built or re-run.
https://gist.github.com/ec3e15859503061cad263698d507533e
With our database aligned, we can add our existing Streamlit app to the docker-compose.yml.
https://gist.github.com/88984ffd6e00cdbdadf239ad312ba792
The key setting is context: . (a single dot), which tells Docker that the streamlit image should be built from a Dockerfile residing in the same directory as the docker-compose.yml. Other than that, we open up Streamlit's default 8501 port, just as we did in part 1.
Most commonly, you will already have a Postgres instance storing your data. Since our app uses data retrieved via an API, there is one extra step we must take to get this data into the database. This can be handled with a script that retrieves the data, then loads it into the database each time the app's image is built. To do so, we must add the following line to our Dockerfile (see the full Dockerfile here).
https://gist.github.com/3b3a69361fbec33ab1363d3eb76576a3
An example can be found at ./scripts/load_docker_db.py in the example repository.
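The general shape of such a loading step can be sketched as follows. This is not the repository's script: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres so that it runs anywhere, and the function name load_rows and the table name housing are hypothetical. With a Postgres driver such as psycopg2, the placeholder style would be %s rather than ?.

```python
import sqlite3


def load_rows(conn, table, columns, rows):
    """Replace `table` with `rows` and return the number of rows stored.

    `table` and `columns` are interpolated into the SQL text, so they must come
    from trusted code, never from user input. Row values go through placeholders.
    """
    col_defs = ", ".join(f'"{name}" REAL' for name in columns)
    col_list = ", ".join(f'"{name}"' for name in columns)
    marks = ", ".join("?" for _ in columns)  # sqlite3 placeholder style

    # Dropping first keeps the load idempotent: re-running it does not duplicate rows.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(f'INSERT INTO "{table}" ({col_list}) VALUES ({marks})', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Stand-in for the real database: an in-memory SQLite connection.
conn = sqlite3.connect(":memory:")
columns = ["CRIM", "ZN", "PRICE"]
rows = [(0.00632, 18, 24.0), (0.02731, 0, 21.6)]  # values from the first two table rows
stored = load_rows(conn, "housing", columns, rows)
```

Because the table is dropped and recreated on every call, running the load again leaves the same rows in place instead of appending duplicates, which matters if the script runs on every build.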
Now, in the root of our repository, we have both a Dockerfile and a docker-compose.yml. With one command, we can build the image for our app, pull the Postgres image, connect them on one network, and run their containers together!
https://gist.github.com/bbaae384c96ad49daca31b49c4d62609
where -d runs it all in the background.
With all of this happening in the background, it is helpful to take a few diagnostic steps to verify that things are working as expected.
A quick way to validate that Postgres is working is to peek at the logs.
https://gist.github.com/91720184adac029db80810c74e962000
If you see a log stating that the database is ready to accept connections, you're good to go! (Press control-C to exit the logs.)
For further validation, you can even enter a psql shell within the Postgres container, then make a small query to test things out.
https://gist.github.com/9c96f753ddeb7f33b87c560ff835672f
With our containers running, we can view the app in a web browser at http://localhost:8501/. If you need to share the app with others, you can use the steps covered in part 2 to deploy this app to the cloud.
When you're finished, you can use this command to stop and remove the containers running your app and database.