
@MacGyverBass
Created October 17, 2020 17:35
My collection of tips for creating/organizing a Docker image

My Docker Tips

  1. When using alpine as a base image, use apk --no-cache add for packages... don't ever use apk update or apk upgrade — this just causes the repo database to be populated and adds wasted space to the image. Additionally, if using Debian/Ubuntu, since you have to do an apt-get update before installing, at the end of the package installation do a rm -rf /var/lib/apt/lists/* to clean up the apt lists. (There's a quick sketch of both patterns after this list.)
  2. If you're going to add something for one step and never use it again, use && to merge the steps. For instance: apk --no-cache add git && git clone ${giturl} && apk --no-cache del git
  3. Combine steps, but don't just throw EVERYTHING on one RUN command. It's ugly. If you do combine steps, make use of \ at the end of a line and && at the beginning of the next line (preferably indented to line up with the commands) to keep things neat. (See the layout sketch after this list.)
  4. If you have a weird sed -i "s/nothing.com/example.com/g" file.html line or similar that appears to do something that isn't a standard installation command, add a damn comment. You can comment everything, but at the minimum, comment things that aren't standard install steps.
  5. Avoid hard-coding things... the language, timezone, and even the app's ports should have defaults, but be able to be overridden via environmental variables. While ports can be remapped via the publish command, there is a chance an end-user may want to run the image bridged to a network and apply custom ports. Just try to be flexible when possible.
  6. Take advantage of ENTRYPOINT & CMD as much as possible. It's annoying to review a GitHub page to open a script only to find it has just one line in it to run the main binary. If your Docker image is designed for one binary, use that as the ENTRYPOINT; if your image has default flags/parameters, use CMD. For example: ENTRYPOINT ["/vlmcsdmulti", "vlmcsd"] and CMD ["-D", "-d", "-t", "3", "-e", "-v", "-T1"] — One defines the main entrypoint (thus the name) of the image, but the CMD defines the default arguments to be passed to the entrypoint. (Note that the entrypoint can still be overridden via --entrypoint when running the image).
  7. Avoid external files as much as conveniently possible. Just because your binary can read a config file doesn't mean you need one... if your binary can accept flags/switches/parameters, use them when launching it. For example, instead of having a single config file with nodaemon=true, pass --nodaemon on the command line if the binary supports the necessary flags.
  8. Try to use official images as your base image as often as possible. This makes it easier to keep your image up-to-date instead of waiting on a third-party image to be updated. Just because it's easy to use someone else's image as a base doesn't always make it wise... at least review the base image or fork it to your own GitHub, as you can either move some of their code into your Dockerfile OR at least be in control of keeping it up-to-date. It's also MUCH more secure to base your images off an official image, as a third-party base image may introduce potential security weaknesses to your image.
  9. Multi-stage images can be your friend. If you need to compile some files to be used in your image, do that in a separate stage. Even something simple like downloading/extracting and/or rewriting those files should be done in another stage to keep your overall final image size as low as possible. It also makes things more organized, as you are keeping major phases in separate stages. (See the download/extract sketch after this list.)
  10. Try to use HEALTHCHECK whenever possible. It's awesome for an image to show whether it's running "healthy" or not. Not all images require this, especially short-lived images that run & quit, but it is highly recommended for services such as web servers. (There's a small HEALTHCHECK sketch after this list.)
  11. If you host your image on DockerHub, connect it to your GitHub... even if you don't use the automated builds feature. The main reason is that it allows DockerHub to link to the GitHub, which makes it easier for users to find/view the Dockerfile and any other related files. Basically, wherever you host your image or talk about it, provide a link to the source files... transparency is also good, as users that are worried about the image can review the code that put it all together.
  12. If you're downloading a file just to extract it, do it within the same line without saving the file. For example: wget "http://host.com/file.tgz" -O- | tar zxv -C /output/ — This will download the file, output it to the pipe, and extract it to /output/ — Though your actual command will vary, you can use curl instead, or leave out the v to make the extraction less verbose, and so forth.
  13. When possible, use -v or --verbose to show the progress of the commands. While this output can only be viewed during the initial build or when reviewing the build log, it can help with debugging and doesn't add to the build size.
  14. Use ARG instead of ENV for any environmental variables that are ONLY required during the Docker build process. For instance, if using a Debian/Ubuntu base, you should put ARG DEBIAN_FRONTEND="noninteractive" at the top of your Dockerfile (but below your FROM). Note that ARG variables are cleared after each FROM, so if you need one for your FROM, place it right before the FROM, and if you need it again for something else in the build, you'll need to repeat the declaration below the FROM (but you don't need to provide the value again). (See the ARG sketch after this list.)
  15. Organize your Dockerfile steps so that the things most likely to change come last... for instance, after the base image, install all your necessary OS packages, then do any downloads, and if you have files to be added, put those near the end. Last is also a good place (though not required) for the "no-op" commands, such as ENTRYPOINT, LABEL, ENV, CMD, HEALTHCHECK, EXPOSE, and many others, since these commands only add to the metadata of the image, not to the actual build size. Also remember that by organizing things this way, you won't have to rebuild the entire image if only an item near the bottom is changed.
  16. Combine multiple ENV variables into a single command, but remember to use \ (though NOT && as these aren't RUN commands) and have each one on a separate line (with indentation to keep them organized). While ENV is a no-op command and doesn't add size to the build, it does add a new layer for each instruction, so by putting them all in one command, it only adds one layer for all of them. (See the ENV sketch after this list.)
  17. Provide your defaults IN your Dockerfile (for example, via ENV). While it may be fine to bake defaults into your code and override them with switches or a config file, for most things it's easier for the end-user to override an environmental variable.
  18. Have documentation of the most common way to run the image. Even if it's just one example, it's fine... but start with that, then try to go into detail for the other ways of running it and any other switches/flags/variables that can be used to modify the behavior of the image.
  19. Avoid using the ":latest" tag for a base image. While it may be safe for some builds, a major update may break your build. And while yes, the image hosted on DockerHub or similar will still have your old version, another developer may copy your Dockerfile and build a new version of the image and get confused as to why it doesn't work, when in truth it's because the base image has moved several versions past the "latest" you originally built against. Locking it to a major/minor version is good practice and allows you to easily test a newer version before updating it; plus, by just changing the tag, it will force the image to rebuild on DockerHub (when using the automated builds). However, there may be situations where the ":latest" tag is fine, such as using images that have no versions, or if you know for sure your Dockerfile won't break due to the simplicity of the commands in it. (Such as a Dockerfile that just adds a few files to the image, or a Dockerfile that just pulls specific files from another image that should never move, e.g. pulling /bin/sh from busybox:latest.)
  20. Use alpine as your go-to base image, but also check to see if your Dockerfile can also run from something smaller like busybox — While busybox is extremely lightweight and has no package manager, if all you are doing in an image (especially a stage within an image) is downloading/extracting a file, busybox can be a better choice, as it is smaller and faster than Alpine.
  21. Try to use scratch as a final base image. If you are going to run only one binary in your image, either get a static binary or compile the binary in another stage as a static binary, then use scratch and copy it into the main image layer. This will make the builds EXTREMELY small and also make them more secure, as there will be no other binaries inside the image to execute/abuse. (See the scratch sketch after this list.)
  22. Comments are great... and a Dockerfile is a great place for them. It's easy to skip comments, but a Dockerfile is usually small and it's easy to explain each step with a single comment. While a well-organized Dockerfile may not require one for each line, comments can help new developers understand each step of your file and help them learn to build their own files without blindly copy/pasting everything they see.
  23. Don't just blindly copy/paste something you saw online into a Dockerfile. Find the official documentation steps and use them; if you need additional tweaking, review the commands, test them, and make sure you comment them.
  24. Don't include pre-compiled binaries in your repository to be copied into the image. Either use your Dockerfile RUN command to download them or compile them (preferably as static binaries) in a separate stage. Having a binary included in your repo and copied in is just yet another thing to maintain and update separately. Also, if you ever plan on having a multi-arch image, it makes your overall process a ton more difficult for yourself... plus other users who want to build your image may be concerned about using a pre-compiled binary that could contain malicious code. If your only option is to download from a source, do exactly that as part of your Docker build. The only reason to ever include a binary is if it is old and you cannot find the source anymore at all... but if that is the case, you may want to find an alternate source for the binary or work around not using it.
  25. Avoid installing non-base packages if an alternate basic binary is already included in your image. For instance, Alpine does not include bash by default, but it does include sh, which should work in most cases with your script. Another example is wget — The busybox version of wget is included and is fine for most use cases. Note that sometimes a binary isn't included, but another binary can perform in a similar way, such as using wget instead of curl — Adding the -O - (or -O-) flag to wget tells it to output the contents of the download to stdout, like curl, so you can skip installing the curl package and keep your final build size just a bit smaller.
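
A few rough sketches of the tips above follow. They're minimal examples, not drop-in code: package names, URLs, ports, and version tags are placeholders to swap for your own.

For tip 1, the install/cleanup patterns on Alpine and on Debian/Ubuntu (the two FROM blocks are alternatives shown back to back, and curl/ca-certificates are just example packages):

```Dockerfile
# Alpine: --no-cache avoids writing the repo index into the image
FROM alpine:3.12
RUN apk --no-cache add curl ca-certificates

# Debian/Ubuntu: update, install, and clean the apt lists in one RUN step
FROM debian:buster-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```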
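
For tips 2 and 3, one way to lay out a combined RUN step (the repository URL is a placeholder):

```Dockerfile
# git is only needed for the clone, so install and remove it in the same layer
RUN apk --no-cache add git \
 && git clone "https://example.com/some/project.git" /src \
 && apk --no-cache del git
```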
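
For tips 9, 12, and 20, a download/extract done in a throwaway busybox stage (the URL and paths are placeholders):

```Dockerfile
# Throwaway stage: busybox is enough for a simple download + extract
FROM busybox AS fetch
RUN mkdir -p /extracted \
 && wget "https://example.com/app.tgz" -O- | tar zxv -C /extracted

# Only the extracted files are carried into the final image
FROM alpine:3.12
COPY --from=fetch /extracted/ /opt/app/
```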
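
For tip 10, a HEALTHCHECK for a service assumed to answer HTTP on port 8080, using the busybox wget so no extra package is needed:

```Dockerfile
# Mark the container unhealthy if the endpoint stops responding
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -q -O /dev/null "http://localhost:8080/" || exit 1
```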
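
For tip 14, how ARG behaves around FROM; DEBIAN_VERSION is a made-up variable name and tzdata is just an example of a package that would otherwise prompt interactively:

```Dockerfile
# An ARG declared before FROM is only visible to the FROM line itself
ARG DEBIAN_VERSION="buster-slim"
FROM debian:${DEBIAN_VERSION}

# Re-declare it (no value needed) to use it again inside this stage
ARG DEBIAN_VERSION

# Build-only variable; unlike ENV, it isn't set in the running container
ARG DEBIAN_FRONTEND="noninteractive"
RUN apt-get update \
 && apt-get install -y --no-install-recommends tzdata \
 && rm -rf /var/lib/apt/lists/* \
 && echo "Built from debian:${DEBIAN_VERSION}"
```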
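
For tip 16, several defaults set in a single ENV instruction (the variable names are just examples):

```Dockerfile
# One ENV instruction = one layer; continue lines with \ (no && here)
ENV LANG="en_US.UTF-8" \
    TZ="Etc/UTC" \
    APP_PORT="8080"
```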
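
For tip 21, compiling a static binary in a build stage and shipping it on scratch. This assumes a hypothetical Go program; the same idea applies to any statically linked binary:

```Dockerfile
# Build stage: produce a fully static, stripped binary
FROM golang:1.15-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/app .

# Final image: nothing but the one binary
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```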