Docker Multi-stage Build: Fast, Minimal and Secure Images
Introduced in Docker 17.05, the multi-stage build feature in Dockerfiles lets you create smaller container images with better caching and a smaller security footprint. Fundamentally, the new syntax lets you name each stage, reference it from later stages, and copy artifacts between them.
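In the following example, the second and third stages both start from the first. Because they do not depend on each other, a builder such as BuildKit can build them in parallel.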
    FROM ubuntu AS base
    # -y answers the install prompt, which would otherwise abort the build
    RUN apt-get update && apt-get install -y git

    FROM base AS src1
    RUN git clone ...

    FROM base AS src2
    RUN git clone ...
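The example above names its stages but stops short of copying anything between them. A minimal, self-contained sketch of the full pattern might look like the following. The echo commands are stand-ins for the elided git clone steps, and the /out and /data paths are made up for illustration.

```dockerfile
# Shared starting point for both source stages.
FROM ubuntu AS base
RUN mkdir /out

# Two independent stages; neither refers to the other.
FROM base AS src1
RUN echo "from src1" > /out/one.txt

FROM base AS src2
RUN echo "from src2" > /out/two.txt

# Final stage: reference the earlier stages by name and copy only their outputs.
FROM ubuntu
COPY --from=src1 /out/one.txt /data/one.txt
COPY --from=src2 /out/two.txt /data/two.txt
```

Because the final stage depends on both src1 and src2, a builder that schedules stages by dependency has a reason to build both, and is free to do so concurrently.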
Before multi-stage builds, reducing the final image size took quite some heavy lifting. The usual tricks aim to reduce the total number of layers and the size of each layer. A common optimization was to chain all commands with &&, as shown in the following example.
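    FROM debian:wheezy
    RUN apt-get update && apt-get install -y curl ca-certificates
    # Download, unpack, install, and clean up in one RUN so that the
    # archive and the unpacked files never persist in a layer.
    RUN curl -L https://dep.tar.gz -o /tmp/dep.tgz \
     && tar xzf /tmp/dep.tgz -C /tmp \
     && mkdir -p /workingdir/bin \
     && mv /tmp/dep/bin/dep /workingdir/bin \
     && rm -rf /tmp/dep*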
The reason is that each COPY, RUN, and ADD instruction creates a new layer. Layers are analogous to git commits: both represent the delta between two snapshots. If you add a file in one layer and remove it in a later one, the final filesystem looks unchanged, but the earlier layer still carries the file, so the image is no smaller. Hence, && not only collapses multiple layers into one but also merges deltas that cancel each other out. Without &&, the downloaded archive, or in other cases source code copied from your local machine, remains somewhere in the image layers even though the final image appears not to contain it.
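To make the layer arithmetic concrete, here is a small sketch that contrasts the two approaches. The 100 MB scratch files are invented purely for illustration and stand in for a downloaded archive.

```dockerfile
FROM debian:wheezy

# Layer A: writes a 100 MB file. The file is stored in this layer.
RUN dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=100

# Layer B: deletes the file. Only the deletion is recorded here;
# layer A still holds all 100 MB.
RUN rm /tmp/scratch.bin

# Layer C: create and delete inside a single RUN. The file is gone
# before the layer is committed, so it is never stored anywhere.
RUN dd if=/dev/zero of=/tmp/scratch2.bin bs=1M count=100 \
 && rm /tmp/scratch2.bin
```

Running docker history on the resulting image should show roughly 100 MB against the first RUN and close to nothing against the other two, even though none of the scratch files exist in the final filesystem.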
This trick works but is cumbersome. With multi-stage builds, small images come easily.
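    # Build stage: every RUN may leave files behind; none of it ships.
    FROM debian:wheezy AS dep
    RUN apt-get update
    RUN apt-get install -y curl ca-certificates
    RUN curl -L https://dep.tar.gz -o /tmp/dep.tgz
    RUN tar xzf /tmp/dep.tgz -C /tmp

    # Final stage: only the copied binaries are added to the base image.
    FROM debian:wheezy
    COPY --from=dep /tmp/dep/bin /workingdir/bin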
It becomes much easier to read and the final image is just one additional layer on top of the base image debian:wheezy.
Multi-stage builds also help ensure you do not accidentally push secret credentials along with the image. Imagine there is a private repo that your code needs as a third-party dependency. To pull its source, you need to mount your SSH key as part of the build. The last thing you want is to leave your key in the layers and ship it as part of the image. Multi-stage builds offer a clean solution: download and build the private source with your SSH key mounted in the first stage, copy only the binary output to the second stage, and push the second stage. You may argue that there is still a layer on your local machine that holds the key. That is true, but the key came from your local machine anyway.
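A rough sketch of that two-stage layout follows. Everything specific in it is an assumption made for illustration: the key file name id_rsa in the build context, the host git.example.com, the repository acme/private-dep, and a make build that writes /src/bin/app.

```dockerfile
# Stage 1: holds the SSH key. This stage stays on the build machine.
FROM ubuntu AS build
RUN apt-get update && apt-get install -y git openssh-client make
# Assumption: id_rsa has been placed in the build context for this build.
COPY id_rsa /root/.ssh/id_rsa
# Hypothetical host and repository; substitute your own.
RUN chmod 600 /root/.ssh/id_rsa \
 && ssh-keyscan git.example.com >> /root/.ssh/known_hosts \
 && git clone git@git.example.com:acme/private-dep.git /src
# Assumption: the repository builds with make and writes /src/bin/app.
RUN make -C /src

# Stage 2: the image that gets tagged and pushed. It contains only the binary.
FROM ubuntu
COPY --from=build /src/bin/app /workingdir/bin/app
```

Tagging and pushing the result of this build publishes only the second stage. The layers of the build stage, including the one that contains the key, are not part of the pushed image, although they remain in the local build cache as described above.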