Docker Does Not Mean Slow
There are lots of guides out there about speeding up your Docker image builds. (Some of them are even written by folks other than ourselves!)
This blog post isn’t comprehensive but instead focuses on a couple of oft-neglected and lesser-known techniques.
First, we’ll cover some common pitfalls that .dockerignore can help you avoid. Then we’ll look at cache mounts, which let you reuse files between builds even after layers have been invalidated.
Now’s a good time to grab a coffee: we’re going to dive in head-first! ☕️
Create & Tune Your .dockerignore
Docker supports a special .dockerignore file, which excludes files from the image build context based on file patterns. For example, the pattern **/*.tmp will ignore any file with the .tmp extension in the root build context directory and any of its subdirectories, recursively.
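As a small illustrative sketch, a .dockerignore using these patterns might look like this (the entries are examples, not a recommendation for any particular project):

```
# Ignore editor temp files anywhere in the context
**/*.tmp

# But re-include one specific file (later rules win)
!keep-me.tmp
```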
You might be familiar with .gitignore, which excludes files from being committed to your repo. A common misconception is that .dockerignore is additive with .gitignore, but Docker does NOT use .gitignore! (Additionally, while the two files look very similar, their glob pattern syntax differs.)
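One concrete illustration of that syntax difference (a sketch based on the documented matching rules): a bare name in .gitignore applies at any depth, while .dockerignore matches patterns against paths relative to the context root, so recursion must be explicit.

```
# In .gitignore, this line ignores node_modules at any depth:
node_modules

# In .dockerignore, the same line only matches ./node_modules
# at the context root; to recurse into subdirectories, use **:
**/node_modules
```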
There are two big reasons an un-tuned .dockerignore file can cause performance woes: unnecessary layer cache invalidation and increased Docker context size.
Unnecessary Layer Cache Invalidation
Unnecessary layer cache invalidation can happen when unused files are added to an image.
For example, if your Dockerfile has COPY . /app, then whenever README.md changes, the layer is invalidated even though no source code changed!
This means the next image build will have to re-run that step and all subsequent ones.
There can be more insidious instances of this, such as generated files that change with every build or locally compiled build artifacts (e.g. target/ or out/ directories).
One thing to consider is whether you run tests as part of your image build or from a container using the built image. If you don’t, you might consider excluding test source files (e.g. *_test.go for Go, src/test for Java).
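For instance (a sketch; the right patterns depend on your project layout), the corresponding .dockerignore entries might be:

```
# Go test files anywhere in the context
**/*_test.go

# Java test sources under the standard Maven/Gradle layout
src/test/
```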
Bloated Docker Context Size
At the start of the build, Docker creates a tar archive from the build context (e.g. your project directory) for the daemon to use.
Even if you don’t reference files as part of your build via COPY, you still pay the cost of archiving them and transferring them to the Docker daemon!
This can be particularly slow on Docker for Mac/Windows or if using a remote Docker context.
Perhaps the most common case of this is your repository’s .git/ directory, which can generally be excluded.
(If your builds embed Git metadata such as a commit reference, consider using build arguments to pass it in; most CI systems provide this metadata as environment variables, and you can set placeholder defaults for local builds.)
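A sketch of that approach (GIT_COMMIT is an illustrative name; your CI system’s variable will differ):

```dockerfile
FROM alpine
# Placeholder default keeps local builds working when no value is passed
ARG GIT_COMMIT=unknown
# Record the commit in the image metadata instead of shipping .git/
LABEL org.opencontainers.image.revision=$GIT_COMMIT
```

In CI you would pass the real value, e.g. docker build --build-arg GIT_COMMIT=$(git rev-parse HEAD) .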
Large test or sample data files that are never used by the build or the resulting image are another common culprit. Similarly, some tools create Python virtual environments inside the project directory, and these can be excluded too.
Lastly, if you rely on installing npm/yarn dependencies from scratch in your Dockerfile for reproducibility, exclude node_modules/.
(As a bonus, this helps avoid platform & architecture mismatch issues that can result from transferring binaries meant for your host platform, like Windows or macOS, into a Linux container image!)
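Putting these culprits together, a starting point for a typical project’s .dockerignore might look like this (the directory names are common conventions or hypothetical examples; adjust to your repo):

```
.git/
**/node_modules/

# Common Python virtual environment locations
.venv/
venv/

# Hypothetical directory holding large sample data
sample-data/
```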
Explore Your Context
If you want to inspect your build context, you can use Docker itself to export a copy to /tmp/docker-context:
printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -f - -o /tmp/docker-context .
You can then browse /tmp/docker-context via the CLI or a file manager to confirm that irrelevant files are not present.
For more information on .dockerignore, read the official docs.
Leverage cache Mounts to Avoid Re-downloading Artifacts
You might be familiar with Docker volume mounts when running containers.
For example, these are commonly used to share a local directory with a container, as in docker run -v $(pwd)/awesome:/app/awesome my-image.
Docker also supports build mounts via the underlying BuildKit engine.
These are very powerful; for example, the ssh mount type allows sharing SSH keys with the build to access private resources.
Of interest for performance is the cache mount type. Files in a cache mount persist between builds but are NOT included in the resulting images.
We can use this for package manager and compiler caches to speed up subsequent builds without bloating the final image.
For example, imagine we have a Java project:
FROM maven
COPY pom.xml ./
RUN mvn dependency:resolve
COPY src/java ./src/java
RUN mvn package
We’ve already separated out our dependency download from copying in source files and compilation to optimize the layer cache, which is great!
However, if we change anything in pom.xml, the layer cache is invalidated and we have to re-download all dependencies from scratch.
We can prevent this with a cache mount on /root/.m2, which is where Maven stores its package cache:
FROM maven
COPY pom.xml ./
RUN --mount=type=cache,target=/root/.m2 \
mvn dependency:resolve
COPY src/java ./src/java
RUN --mount=type=cache,target=/root/.m2 \
mvn package
Note that we also need to attach the mount to the actual compilation step!
Now, when pom.xml changes, even though the layer cache has been invalidated, the /root/.m2 cache mount spares us from downloading everything from scratch when dependencies are resolved.
While the path(s) to cache can vary by language, the process remains the same.
For example, npm maintains a cache at ~/.npm, and Yarn lets you customize its cache location with the YARN_CACHE_FOLDER environment variable.
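For instance, the Maven example translates to npm like this (a sketch assuming the default root user and npm’s standard cache path):

```dockerfile
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# The npm download cache persists across builds but stays out of the image
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY . .
RUN npm run build
```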
Some languages, like Go, make it easy to share the compiler cache as well:
FROM golang
ENV GOMODCACHE=/cache/gomod
ENV GOCACHE=/cache/gobuild
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/cache/gomod \
go mod download
COPY main.go .
RUN --mount=type=cache,target=/cache/gomod \
--mount=type=cache,target=/cache/gobuild,sharing=locked \
go build -o /my-app main.go
Here, we created two caches: one for the package manager and one for the compiler.
We’ve additionally marked the compiler cache with sharing=locked to prevent multiple concurrent builds from using it simultaneously.
Before caching compiler output, be sure to consider the potential impact on image build reproducibility, which might depend on your particular language/compiler!
You can also use this technique to cache OS package manager downloads (e.g. packages installed with apt for Debian/Ubuntu-based images).
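For apt, that can look like the following sketch; note that Debian/Ubuntu base images ship an apt configuration (docker-clean) that deletes downloaded packages, which we disable so the cache actually accumulates:

```dockerfile
FROM debian:bookworm
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
    apt-get update && \
    apt-get install -y --no-install-recommends curl
```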
What Else?
There are many other tricks to speed up your Docker builds while keeping your images small, but a lot of it comes down to profiling your specific setup and iterating.
While optimizing your Dockerfile, try building with --progress=plain and/or --no-cache to get more traditional, linear log output that helps identify slow steps.
Try changing different types of files (source code, package manager manifest, README, etc.) and observe the effect on a re-build.
Finally, a popular technique, particularly for compiled languages, is to use multi-stage builds. You can use this in some creative ways: stay tuned for the next entry in this series that will focus on that!