Flogging Moby: 8 Docker Antipatterns to Stop Using NOW
Only YOU can prevent whale abuse!
If you search the internet for the words “Docker” or “containers”, some of the top hits are articles on what Docker is and how to get started using containers. Many of those articles (such as my own) are a good starting point for your container journey, but they only scratch the surface of what is possible with containers. As you dig deeper into this world, you will begin to notice patterns for building, running, securing, and managing containers.
And you will also begin to notice that some of those patterns are better than others.
In fact, some of those patterns are the opposite of optimal. We generally refer to them as “antipatterns.” To paraphrase Wikipedia, an antipattern usually has two components:
- It is a commonly used process, structure, or pattern of action that - despite initially appearing to be an appropriate and effective response to a problem - has more bad consequences than good ones.
- Another solution exists that is documented, repeatable, and proven to be effective.
Some of these container antipatterns stem from early patterns that were required due to the limitations of Docker and its ancestor container platforms. Others come from a lack of understanding of how the underlying container architecture works. Still others are copies of copies of copies of patterns floating around on the internet.
It doesn’t really matter where these container antipatterns came from; the simple fact is that, in the interest of security and maintainability, you should stop using them.
Why is your opinion about Docker relevant?
“Why should I stop this? My stuff still works!”
For now, maybe. But when it stops working, those container antipatterns will cost you in recovery time and other possible consequences.
“Why should I listen to you?”
Now this is the right question to ask, and I will take a moment to answer it.
I have worked with Docker on a daily basis since I started using it in 2015; for reference, Docker was released to the public in 2013. I’ve iterated on Dockerfiles as the technology changed, including multi-stage image builds. I’ve watched the CLI grow the docker exec and docker attach commands, which are invaluable for troubleshooting. I was there when Docker Swarm was announced at DockerCon 2016 in Seattle. I’ve deployed Docker containers into almost every type of environment you can imagine, including stand-alone cloud instances using the Docker CLI or docker-compose, Amazon ECS, on-prem DC/OS clusters, and cloud-based Kubernetes clusters. I’m intimately familiar with many of the challenges that Docker users face on a daily basis, and I’ve given talks and published numerous articles about containers and Docker for the Capital One Tech blog.
My overarching goal is to be an advocate for the usage of Docker and containers, but also to raise awareness of how easy it is to inadvertently introduce risks and vulnerabilities to a container environment.
Your mileage may vary – There are reasons to use some of these antipatterns
I’m about to give you a list of things you should not be doing with containers. Your gut reaction might be “But I have a good reason for doing <insert antipattern here>.” I agree. As my dad used to say, “There is an exception to every rule.”
That said, what I’m suggesting is that these are, for most users, bad patterns or practices to use. There are purposeful reasons to do some of these things - usually in niche use cases - and the flexibility of containers can make those use cases perfectly viable. Just be sure to ask yourself whether there is a better way to accomplish what you’re trying to do that is easier to troubleshoot and/or easier to maintain.
Making a list – 8 container antipatterns to avoid
So, without further ado, here’s my list of container antipatterns to avoid:
- Not managing your container supply chain
- Installing non-essential executables into a container image
- Cloning an entire git repo into a container image
- Building a Docker container image “on the fly” right before deployment
- Running as root in the container
- Running multiple services in one container
- Embedding secrets in an image
- Not installing package updates when building your images
Container antipattern deep dives
Now, that is not an exhaustive list by any means, but these are issues that I see crop up in container usage on a regular basis. So let’s talk about why these are antipatterns, and about possible solutions or alternatives - because this article wouldn’t be much use if I just told you what you are doing wrong without helping you figure out how to do it right.
Container Antipattern #1 – Weak container supply chain management
What is a container supply chain? It’s the chain of base images that your image is built from - you can see it when you run `docker history` on an image. It’s vitally important that you manage where you source your base container images. If you or your team is simply pulling pre-made, non-official Docker images and running them, you are most definitely not managing your container supply chain.
Why is this bad? – Potentially compromised containers
It is trivially easy to push malicious code into a Docker container image; it is considerably harder to detect that malicious code once it’s there. So while somebody may have already solved the problem you’re working on, don’t just blindly pull and run their images, or even base new images off theirs.
What to do instead – Contain your containers
Instead of just pulling somebody else’s images, you should instead look at their Dockerfile (or other container image build infrastructure) and mimic their work, tweaking here and there to meet your own needs. You should also ensure that any FROM lines in the Dockerfile are container images that you trust, either official base images from the suppliers or base images that you yourself have generated – preferably from scratch. Is that some extra work? Yes. But think of how much work it will be if you have a security breach because you didn’t do this.
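As a concrete sketch of that advice, you can pin a trusted base image to an exact content digest in your FROM line, so an upstream tag being re-pointed can never silently change what you build on. The image name below is real, but the digest is a placeholder - use the one `docker pull` or your registry reports for the image you actually vetted:

```dockerfile
# Pin the official image to an exact digest rather than a mutable tag.
# (The digest below is a placeholder - substitute the one reported by
# "docker pull python:3.12-slim" for the image you vetted.)
FROM python:3.12-slim@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

# ...the rest of your vetted build steps...
```

You can then audit exactly what you inherited at any time with `docker history` on the resulting image.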
Container Antipattern #2 - Installing non-essential executables into a container image
What is a non-essential executable? Anything that is not required by either the underlying container or the language interpreter that runs your app. For example, you probably don’t need a text editor in a container image destined for production. For a Python-based app, there may be several required executables to support the Python scripts, whereas for a Go-based app, you might be able to run your app in a container image created from the “scratch” special base image.
Why is this bad? – Size matters
Every executable you install into a container image beyond the bare minimum is a potential vulnerability. Non-essential executables also add bytes and bloat to your container image, slowing your container image pull times and causing more bits to be sent over the network.
What to do instead – Less is more
Start from a minimal base image. Make sure that image is an official image from the provider or a base image you generated yourself from scratch. This is especially important, as third-party images may contain dangerous or otherwise undesirable code. The most important thing here is to know what your app does and does not require, and to not install any executables you don’t need. Some of your app’s language dependencies may also require certain executables to be installed, so use care when removing them. But building a highly optimized container image will save you time and money.
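To make that concrete, here is a minimal sketch of the idea, assuming a Debian-based image and a self-contained binary called myapp (both are placeholders): install only the packages the app truly needs, and clean up the package cache so it doesn’t ship in the image.

```dockerfile
FROM debian:bookworm-slim

# Install only what the app requires - no editors, no debug tools.
# "ca-certificates" is just an example of a genuinely needed package.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

COPY myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
```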
Container Antipattern #3 - Cloning an entire Git repo into a container image
This is exactly what it sounds like, and it looks like this in your Dockerfile:
RUN git clone https://github.org/somerepo
Why is this bad? – Lots of extra files
First, you’re relying on a non-local source for the files that go into your Docker image, which means that you potentially haven’t had a chance to look at and vet these files before copying them into the nascent image.
Second, a git clone comes with a bunch of extra files - e.g. the `.git/` directory that gets created under the root of your repository. Extra files mean that your container image is that much larger. The `.git/` folder includes history, and could therefore include some sensitive files. Sure, you can `rm -rf ./.git`, but it’s easy to forget that step, and even that may not completely remove the files because of the way that Docker handles file system layers.
Third, you’re now dependent on your container engine’s virtual networking to retrieve the remote files. Container networking adds yet another layer of complexity to your build process. This can be especially error prone if there is a corporate proxy between your build and your remote repository.
Finally, it means you have the `git` executable installed in your Docker image, and unless your application is actually manipulating Git repositories, you do not need the `git` executable installed in your image.
What to do instead — Assemble your container puzzle
Instead of doing a `git clone` as part of your Dockerfile, do the `git clone` to a sub-directory of the Docker build context via a shell script, then only add the files you need from the cloned repository to your Docker image using the COPY directive. Additionally, add a .dockerignore file to your own repository so that files you do not want to copy to a Docker image never end up there.
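A minimal sketch of that flow, with the repository URL and paths as placeholders: clone into the build context from a wrapper script, then let the Dockerfile copy in only what it needs.

```shell
#!/bin/sh
# build.sh - clone outside the image so you can inspect and vet the files first.
git clone https://github.org/somerepo ./context/somerepo

# The Dockerfile inside ./context then copies only what it needs, e.g.:
#   COPY somerepo/src/ /app/src/
# A .dockerignore in ./context (containing e.g. "**/.git") keeps the
# repository metadata out of the image entirely.
docker build -t myapp:latest ./context
```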
The exception – Docker’s multi-stage build
This is one of those patterns that you don’t want to use for a “normal” image build. That said, for a multi-stage build, this could be a potentially acceptable pattern. In a multi-stage build you use a specially-crafted image to build and test your executable, then copy the resulting executable (and any dependencies) from the container created during the first stage of the build into another, production-hardened base image to create the final image for deployment.
Even in this use-case, I still prefer the idea of cloning the repository to a local folder, then copying that folder to the build-stage container, over running `git clone` in the Dockerfile.
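For reference, a multi-stage build looks roughly like this (Go is used purely as an example language; the paths and binary name are placeholders). Build tools and source live only in the throwaway first stage - nothing from it survives into the final image except what you explicitly COPY:

```dockerfile
# Stage 1: build environment - compilers and source never reach production.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/myapp .

# Stage 2: final image - only the compiled binary ships.
FROM scratch
COPY --from=build /out/myapp /myapp
ENTRYPOINT ["/myapp"]
```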
Container Antipattern #4 – Building a Docker container image “on the fly” right before deployment
This is somewhat similar to the above antipattern, but goes beyond just doing a git clone directly into an image. This involves cloning, building, and then running the newly created image without ever pushing the image to an intermediary Docker registry.
Why is this bad? – No security screening
This is an antipattern for several reasons. First off, pushing the image to a registry gives you a “backup” of the image. This confers several benefits, the most important of which is that you can easily do a “quick rollback” should your deployment fail. You simply pull that last functioning image and run that, then go fix the current deployment.
Additionally, many current container registries also offer the benefit of scanning your images for potential vulnerabilities. The value of this cannot be overstated – scanning a container image for vulnerabilities helps keep your data and your users safe.
Another reason to avoid this is that the newly created Docker image has not been tested at all. You should always test your images before deploying them, especially to a production environment.
What to do instead – Build and push to a registry
Preferably, build your images in a dedicated build environment (e.g. Jenkins, Bamboo, etc.), version them uniquely, and push them to a container registry. If available, let the registry scan your images for vulnerabilities. Test that they run properly. When deploying, use your deployment automation to pull the Docker image and run it.
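Sketched as pipeline steps (the registry hostname and versioning scheme are examples - use whatever your build system provides):

```shell
# Version the image uniquely, e.g. from the commit being built.
VERSION="$(git rev-parse --short HEAD)"

docker build -t registry.example.com/myteam/myapp:"$VERSION" .
docker push registry.example.com/myteam/myapp:"$VERSION"

# The registry can now scan the image; deployment later pulls the
# exact version that was built, tested, and scanned:
#   docker pull registry.example.com/myteam/myapp:"$VERSION"
```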
Container Antipattern #5 - Running as root in the container
This one gets many new container users, because most container engines build images with root as the default user - mainly because root privileges are needed to create the image in the first place.
Why is this bad? – Power corrupts
Running as root - or, more specifically, as a superuser with UID 0 in a Linux-based container - opens your system up to the potential of a takeover. One potential situation is a container breach that allows a bad actor at least some access inside your network. Another is a bad actor managing a container breakout, at which point they potentially have root access to the container host system.
What to do instead – Drop administrator privileges
To change the active user in a container, use the USER directive in your Dockerfile. This will tell Docker to execute all subsequent commands in the Dockerfile as that user. Before you do this, be sure to actually create the user in your image, or the subsequent commands will fail. Also, be sure to give that user adequate permissions to perform the various commands needed, including running your application.
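A minimal sketch, assuming a Debian-based image (the user name, IDs, and binary are placeholders):

```dockerfile
FROM debian:bookworm-slim

# Create the unprivileged user first, or the USER directive has nothing to switch to.
RUN groupadd --gid 10001 app \
    && useradd --uid 10001 --gid app --create-home app

# Give that user ownership of the files it must run.
COPY --chown=app:app myapp /usr/local/bin/myapp

# Everything from here on - including the running container - executes as "app".
USER app
ENTRYPOINT ["/usr/local/bin/myapp"]
```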
Container Antipattern #6 – Running multiple services in one container
This antipattern has begun to disappear as people get more used to the idea of containers, but I still see it on occasion. Basically, this is running multiple tiers of your application out of the same container; for instance, running the API and the database of your application out of the same container.
Why is this bad? – Containers should be minimal
Containers are designed to be minimalist instances. That is, the absolute basics of what you need to run a given tier of your application. When you stack the API server on top of your database instance in a given container, you have bypassed the minimalist concept in favor of something more complex.
Additionally, containers are designed to exit when the primary executable in the container exits and relay the exit code up the stack to the shell that launched the container. When you run multiple services out of a single container, now you also have to manage unexpected exceptions or executable errors yourself instead of depending on the container engine to handle it.
What to do instead – One service per container
One container per task. Set up a local virtualized container network (e.g. `docker network create`) for any containers on the same system that need to communicate with each other, so they can talk easily without leaving the host’s local virtual network.
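For example (the network and container names, and the application image, are placeholders):

```shell
# Create a user-defined bridge network for this application's containers.
docker network create backend

# Containers on the same user-defined network can reach each other by name...
docker run -d --name db --network backend postgres:16
docker run -d --name api --network backend -p 8080:8080 myapp:latest

# ...so "api" reaches the database at the hostname "db", while "db" is
# never published to the host at all.
```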
Container Antipattern #7 – Embedding secrets in an image
This antipattern is easy to fall into, as container images have many nooks and crannies (such as ENV directives in your Dockerfile) where it is easy to store things, and even easier to forget you put them there. Additionally, you could accidentally copy a .env or other file that contains local development secrets into your container image if you don’t take the proper precautions.
Why is this bad? – It’s (not) a secret
Once a secret is baked into an image, anyone who can pull that image can read it - your secrets have been compromised, and a third party could potentially access anything those secrets grant access to.
What to do instead – Retrieve at runtime
This one requires adopting some good practices. One of these is to have a .dockerignore that includes any local files where you might be keeping local development secrets. For example, you might use a .env file to store a set of environment variables that get populated with local secrets, but any file that could contain potentially sensitive information should be in your .dockerignore (and in your .gitignore as well).
Another is following good patterns for Dockerfiles. You should never put a secret anywhere in a Dockerfile. If you need a secret for the build or for some kind of test, find a secure way to get it into the container without passing it via --build-arg (build arguments are embedded in the image and are accessible via the `docker history` command). Docker has recently added functionality to the docker command line called BuildKit, which allows you to securely pass secrets to docker build.
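A sketch of the BuildKit approach (the secret id, file name, and what you do with the token are placeholders): the secret is mounted only for the duration of a single RUN step and is never written into an image layer.

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim

# The secret appears at /run/secrets/<id> only while this RUN executes.
RUN --mount=type=secret,id=mytoken \
    TOKEN="$(cat /run/secrets/mytoken)" \
    && echo "use $TOKEN here to fetch private build dependencies"
```

Build it with `DOCKER_BUILDKIT=1 docker build --secret id=mytoken,src=./token.txt .` - running `docker history` on the result will show no trace of the token.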
To keep your secrets safe at run time, I recommend retrieving them from a secret store, such as HashiCorp Vault or a cloud-based service that offers secure key management (such as AWS KMS). Docker also has a relatively new built-in secrets functionality, but it requires you to run a Docker swarm in order to use it.
Container Antipattern #8 – Failing to update packages when building images
This is an antipattern that falls under the category of being a former best practice. For various reasons, container image providers used to encourage you not to update or upgrade the packages in the base image. Not anymore. Today, any installed packages should be updated every time you build a new image.
Why is this bad? – Out of date packages
In a perfect world, base container images would always have the latest version of all installed packages. The reality is that even automated container image builds are generally done on some kind of cadence – daily, weekly or perhaps even longer – and updated versions of packages are being published daily. This is especially important for packages with security updates, as you most definitely do not want to push a container image into production that could have a vulnerability.
What to do instead – Update your packages
The good news here is that this is dead simple to fix: just use the underlying Linux distribution’s package manager to update and upgrade all packages as part of your Dockerfile.
I recommend doing this as early in the build as possible - ideally as the first commands in your first RUN directive.
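On a Debian- or Ubuntu-based image, that looks like this (substitute the matching package manager for other distributions, e.g. `apk upgrade` on Alpine):

```dockerfile
FROM debian:bookworm-slim

# First thing in the build: pull in every pending package update,
# then clear the package cache so it doesn't bloat the image.
RUN apt-get update \
    && apt-get -y upgrade \
    && rm -rf /var/lib/apt/lists/*
```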
If you’d like more information about why this is now considered to be the correct practice, Python Speed has a very good article on the history of this pattern.
“A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.” ― Douglas Adams, Mostly Harmless p. 113
As I said before, this is not an exhaustive list of container antipatterns, and the flexibility of containers means that users will certainly continue to create new ones. If you are using some of the above patterns, I hope I’ve given you a good argument to move away from them to a more sustainable and/or secure pattern, or at least given you food for thought.
Whatever you do, always remember that we are human and we all make mistakes - in fact, I have been guilty of a few of the above antipatterns myself. If you don’t take anything else away from this article, please take this to heart: never stop learning. Technology is always changing. Using the tools and becoming familiar with their quirks is the best way to learn, and experience is what helps us avoid becoming victims of some of the pitfalls that technology enables.