
Building Images


Here is my personal checklist for building high-quality Docker images.

Further details about every topic can be found elsewhere in this document.

Image Size

Multi-Stage Builds

Use multi-stage builds to keep build tools and other clutter out of the final image, which keeps the final image small.

Chain RUN Commands

Chain RUN commands using the && syntax to reduce the number of layers and exclude ephemeral data / files from the final image.
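
As a minimal sketch of the idea, the download, extraction and clean-up below share a single RUN instruction, so the archive never lands in a layer. The base image tag, URL and paths are placeholders chosen for illustration, not taken from a real project.

```dockerfile
FROM alpine:3.19

# Hypothetical URL and paths: download, unpack and delete in ONE layer
RUN wget -O /tmp/tool.tar.gz https://example.com/tool.tar.gz && \
    mkdir -p /opt/tool && \
    tar -xzf /tmp/tool.tar.gz -C /opt/tool && \
    rm /tmp/tool.tar.gz
```

Had the rm been placed in a separate RUN, the archive would still be stored in the earlier layer and the image would be no smaller.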

Package Managers

Alpine: Use apk add --no-cache, which is the simple alternative to apk update && apk add <packages> followed by rm -fr /var/cache/apk/*. Both approaches keep the APK cache out of the image.

Debian / Ubuntu: Use apt-get update && apt-get install -y --no-install-recommends <packages> followed by rm -rf /var/lib/apt/lists/*.

Python: Use pip install --no-cache-dir <packages>.
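
The Debian / Ubuntu and Python commands above might sit in a Dockerfile roughly as follows. This is only a sketch: the base image tag and the package names (curl, beautifulsoup4) are stand-ins for whatever the image really needs.

```dockerfile
FROM python:3.12-slim

# Update, install and clean up in the same RUN so the package lists never reach a layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# pip equivalent: do not keep the download cache in the image
RUN pip install --no-cache-dir beautifulsoup4
```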

Image Management


Unique Tags

Ensure that images have unique tags, whether they take the form X.Y.Z (a version number), YYYYMMDD (a build date) or XXXXXXXXXXXX (a 12-character Git commit id).

Failure to use unique tags leads to old images being listed as <none> when they are superseded.

Environment Variables

The ENV directive is an effective way to pass important parameters into containers when they are started up.
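
A small sketch of how this can look, assuming two made-up variables (APP_MODE and APP_PORT) that are not part of any real project:

```dockerfile
FROM alpine:3.19

# Hypothetical parameters with sensible defaults
ENV APP_MODE=production \
    APP_PORT=8080

# The shell expands the variables when the container starts
CMD ["sh", "-c", "echo mode=$APP_MODE port=$APP_PORT"]
```

The defaults can be overridden at start-up, for example with docker run --rm -e APP_MODE=debug <image>.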


Volumes

Use volumes for folders that need to be persistent or shared with other containers. Ideally the container should be runnable as “read-only”, meaning that no data is written to the container’s own layer.
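
The following sketch declares a volume for the only path the container writes to. The /data path and the log file name are illustrative assumptions.

```dockerfile
FROM alpine:3.19

# Hypothetical data directory: the only location the container writes to
RUN mkdir -p /data
VOLUME /data

CMD ["sh", "-c", "date >> /data/log.txt && cat /data/log.txt"]
```

An image built this way can be started with docker run --rm --read-only -v mydata:/data <image>, where mydata is an arbitrary volume name. The write to /data succeeds because it goes to the volume rather than the container's own layer.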

Work Directory

Specifying WORKDIR is helpful as it will put the user into the appropriate directory by default.


BuildKit

Use BuildKit for faster builds (especially multi-stage), better caching and some additional features.

Container Security

Non-Root User

Add a dedicated user (and maybe a group) to the image and switch to it with the USER instruction.

Be sure to chown and chmod the contents of the home directory. Also note that the default permissions may be inconsistent between Debian / Ubuntu and Alpine.
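
A rough Debian-based sketch of these steps is shown below. The user name, group name and numeric IDs are hypothetical, and the literal app:app in --chown has to match the APP_USER default.

```dockerfile
FROM debian:bookworm-slim

# Hypothetical user name and IDs
ARG APP_USER=app
ARG APP_UID=1000
ARG APP_GID=1000

# Debian / Ubuntu use groupadd and useradd (Alpine uses addgroup and adduser)
RUN groupadd --gid ${APP_GID} ${APP_USER} && \
    useradd --uid ${APP_UID} --gid ${APP_GID} --create-home ${APP_USER}

# Copy files with the right owner, then set the permissions explicitly
WORKDIR /home/${APP_USER}
COPY --chown=app:app . ./
RUN chmod -R go-w /home/${APP_USER}

# Everything from here on, including the running container, uses the non-root user
USER ${APP_USER}
```

Setting the permissions explicitly avoids relying on distribution defaults, which is where Debian / Ubuntu and Alpine can differ.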

Container Reliability

Specify Version

Be sure to specify version numbers to avoid breaking changes in the future.

Debian / Ubuntu: apt-get install -y --no-install-recommends tini=0.18.*

Alpine: apk add --no-cache tini=~0.18

Python: pip install --no-cache-dir beautifulsoup4==4.8.*

Init Process

Use tini or dumb-init as the ENTRYPOINT to ensure containers can be gracefully shut down and avoid zombies which can exhaust PIDs and other resources on the host.
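
On Alpine this can be as small as the sketch below. The sleep command is just a stand-in for the real service.

```dockerfile
FROM alpine:3.19

# The Alpine package installs the binary as /sbin/tini
RUN apk add --no-cache tini

# tini runs as PID 1, forwards signals to the child and reaps zombies
ENTRYPOINT ["/sbin/tini", "--"]

# Placeholder for the real service
CMD ["sleep", "300"]
```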

Health Check

Incorporate a simple health check to ensure that services running in containers can be reliably monitored.
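
For a web service, a health check might look something like this. The nginx base image, the URL and the timings are assumptions picked for illustration.

```dockerfile
FROM nginx:1.25-alpine

# Mark the container unhealthy if the web server stops answering
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD wget -q -O /dev/null http://127.0.0.1/ || exit 1
```

The current state can then be read with docker inspect --format '{{.State.Health.Status}}' <container>.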


Multi-stage Builds

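Using a multi-stage build to convert a Notebook to a simple Python script is highly advantageous, because Jupyter is only needed in the builder stage and the final image contains nothing but the Python scripts and their runtime.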

I’ve used this approach in a number of my projects on GitHub:

For example:

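# Base image versions (no defaults are shown here; supply each with --build-arg)
ARG NOTEBOOK_VERSION
ARG PYTHON_VERSION
ARG ALPINE_VERSION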

# Jupyter notebook image is used as the builder
FROM jupyter/base-notebook:${NOTEBOOK_VERSION} AS builder

# Copy the required project files
WORKDIR /home/jovyan/work/wca-db
COPY --chown=jovyan:users python/*.*py* ./python/
COPY --chown=jovyan:users sql/*.sql ./sql/

# Convert Jupyter notebooks to regular Python scripts
RUN jupyter nbconvert --to python python/*.ipynb && \
    rm python/*.ipynb

# Ensure project file permissions are correct
RUN chmod 755 python/*.py && \
    chmod 644 sql/*.sql

# Create final image from Python 3 + Beautiful Soup 4 on Alpine Linux
FROM logiqx/python-bs4:${PYTHON_VERSION}-alpine${ALPINE_VERSION}

# Install MySQL client
RUN apk add --no-cache mysql-client=~10.3

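# Note: Jovian is a fictional native inhabitant of the planet Jupiter
ARG PY_USER=jovyan
# Assumed: the group name matches the --chown=jovyan:jovyan used below
ARG PY_GROUP=jovyan
# Numeric IDs are not shown here; supply them with --build-arg
ARG PY_UID
ARG PY_GID

# Create the Python user and work directory
RUN addgroup -g ${PY_GID} -S ${PY_GROUP} && \
    adduser -u ${PY_UID} -S ${PY_USER} -G ${PY_GROUP} && \
    mkdir -p /home/${PY_USER}/work && \
    chown ${PY_USER} /home/${PY_USER}/work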

# Environment variables used by the Python scripts

# Copy project files from the builder
WORKDIR /home/${PY_USER}/work
COPY --from=builder --chown=jovyan:jovyan /home/jovyan/work/ ./

# Define the command / entrypoint
CMD ["python3"]

Maximise Caching

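Counter to the normal policy of chaining RUN commands, try to split things up in the builder stage to maximise caching. Extra layers in a builder stage do not end up in the final image, so they cost nothing in image size.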
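
One common way to do this, sketched here with a hypothetical requirements.txt and /app directory, is to copy and install the dependencies before copying the source code:

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app

# Dependencies change rarely: keep them in their own cached layer
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Source changes often: only the layers from here down are rebuilt
COPY . ./
```

Editing a source file then reuses the cached dependency layer, and the slow pip install is only repeated when requirements.txt itself changes.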


BuildKit

BuildKit is faster than the traditional Docker build engine and has additional features.
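
One such feature is the cache mount. The sketch below assumes a hypothetical requirements.txt and uses the default pip cache location for the root user.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt ./

# The cache mount keeps pip's download cache between builds without adding it to a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```

The --mount option is only understood by BuildKit, so this Dockerfile will not build with the traditional engine.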

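It is also worth noting that .dockerignore matters less with BuildKit because the build context is handled intelligently: only the files that the build actually needs are transferred to the builder.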

BuildKit can be enabled in any of three ways:

  1. Set it as an environment variable with export DOCKER_BUILDKIT=1.

  2. Prefix your build command with DOCKER_BUILDKIT=1.

  3. Set the configuration in /etc/docker/daemon.json then restart Docker.

	{"features": {"buildkit": true}}

Clearing the Build Cache

docker builder prune ...

You can also specify the garbage collection policy in /etc/docker/daemon.json:

    "builder": {
        "gc": {
            "enabled": true,
            "policy": [
                {"keepStorage": "512MB", "filter": ["unused-for=168h"]},
                {"keepStorage": "30GB", "all": true}
            ]
        }
    }


The Git revision is useful for CI/CD builds because it is unique, which makes it a good image tag.

Here are two simple commands to get the Git commit id:

git describe --always --abbrev=12
git rev-parse --short=12 HEAD

The output of either command can then be used as an image tag during the Docker build:

docker build . -t petition:$(git rev-parse --short=12 HEAD)
docker tag petition:$(git rev-parse --short=12 HEAD) petition:latest


Dockerfile: ENTRYPOINT vs CMD is a nice article describing the differences.

TL;DR - ENTRYPOINT is always run (unless replaced with docker run --entrypoint) and CMD supplies default arguments that can be overridden on the command line.
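
A tiny illustration, using echo purely as a stand-in for a real program:

```dockerfile
FROM alpine:3.19

# Always executed
ENTRYPOINT ["echo", "Hello"]

# Default argument, replaced by anything given after the image name
CMD ["world"]
```

With this image, docker run <image> prints "Hello world" and docker run <image> Docker prints "Hello Docker".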

The PID 1 Problem

It took me a little while of using Docker before I realised the benefits of a lightweight init process (e.g. “tini” or “dumb-init”) in a Docker image.

TL;DR - Use tini or dumb-init as the ENTRYPOINT to ensure that containers can be gracefully stopped and do not cause issues relating to zombies on the Docker host.

Docker and the PID 1 zombie reaping problem provides a very thorough description of the problem.


What is advantage of Tini? is a thorough description by the author of why tini is required.


How to use --init parameter in docker run explains why --init is basically the same as running tini.


Introducing dumb-init, an init system for Docker containers is a great article that describes the problems that dumb-init (and tini) resolve.

How critical is dumb-init for Docker? includes comments from the author of tini, impartially comparing dumb-init and tini.


It is easy to automate builds on Docker Hub so that changes to a Git repository trigger an image build.

Docker Hub: Configure Automated Builds from GitHub and BitBucket

