Here is my personal checklist for building high quality Docker images.
Further details about every topic can be found elsewhere in this document.
Use multi-stage builds to keep build tools and other clutter out of the final image, keeping it small.
Chain `RUN` commands using the `&&` syntax to reduce the number of layers and exclude ephemeral data / files from the final image (see the sketch after this list).
Alpine: Use `apk add --no-cache <packages>`, which is the simple alternative to `apk update && apk add <packages>` followed by `rm -fr /var/cache/apk/*`. Both approaches remove the APK cache from the image.
Debian / Ubuntu: Use `apt-get update && apt-get install -y --no-install-recommends <packages>` followed by `rm -rf /var/lib/apt/lists/*`.
Python: Use `pip install --no-cache-dir <packages>`.
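As a minimal sketch of the chaining technique, assuming a Debian-based image (the package names are just placeholders):

```
FROM debian:10-slim

# Install packages and clean up within a single RUN command so that
# the APT cache never persists in a layer of the final image
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*
```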
Ensure that images have unique tags, whether they be of the form `X.Y.Z`, `YYYYMMDD` or `XXXXXXXXXXXX`. Failure to use unique tags leads to old images being listed as `<none>` when they are superseded.
The `ENV` directive is an effective way to pass important parameters into containers when they are started up.
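For example, a default baked into the image (mirroring the `MYSQL_HOSTNAME` variable used in the example later in this document) can be overridden when the container is started; the image name and hostname below are illustrative:

```
# Dockerfile - default database hostname, read by scripts at run time
ENV MYSQL_HOSTNAME=mariadb
```

```
# Override the default at start-up
docker run -e MYSQL_HOSTNAME=mariadb.example.com my-image
```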
Use volumes for folders that need to be persistent or shared with other containers. Ideally the container should be runnable as “read-only”, meaning that no data is written to the container’s own layer.
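A minimal sketch of this pattern, using a hypothetical image and named volume:

```
# The container filesystem is read-only; only the named volume is writable
docker run --read-only -v mydata:/home/jovyan/work my-image
```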
Specifying `WORKDIR` is helpful as it will put the user into the appropriate directory by default.
Use BuildKit for faster builds (especially multi-stage), better caching and some additional features.
Create a dedicated user (and maybe a group) in the image and switch to it with the `USER` instruction. Be sure to `chown` and `chmod` the contents of the home directory. Also note that the default permissions may be inconsistent between Debian / Ubuntu and Alpine.
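A sketch of user creation with illustrative names and IDs; note that Alpine uses the BusyBox `addgroup` / `adduser` syntax, whereas Debian / Ubuntu provide `groupadd` / `useradd`:

```
# Alpine (BusyBox) syntax - use groupadd / useradd on Debian / Ubuntu
RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup && \
    chown -R appuser:appgroup /home/appuser

# Subsequent instructions and the container itself run as this user
USER appuser
```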
Be sure to specify version numbers to avoid breaking changes in the future.
Debian / Ubuntu: `apt-get install -y --no-install-recommends tini=0.18.*`
Alpine: `apk add --no-cache tini=~0.18`
Python: `pip install --no-cache-dir beautifulsoup4==4.8.*`
Use tini or dumb-init as the `ENTRYPOINT` to ensure containers can be gracefully shut down and to avoid zombies, which can exhaust PIDs and other resources on the host.
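A minimal sketch, assuming tini has been installed via the package manager as shown above (the binary lives at `/sbin/tini` on Alpine and typically `/usr/bin/tini` on Debian / Ubuntu):

```
# tini runs as PID 1, forwarding signals and reaping zombies
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["python3"]
```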
Incorporate a simple health check to ensure that services running in containers can be reliably monitored.
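For example, a `HEALTHCHECK` for a hypothetical web service listening on port 8080 with a `/health` endpoint:

```
# Mark the container as unhealthy if the endpoint stops responding
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD wget -q --spider http://localhost:8080/health || exit 1
```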
Using a multi-stage build to convert a Jupyter notebook to a plain Python script is highly advantageous. I’ve used this approach in a number of my projects on GitHub. For example:
```
# Base image versions
ARG NOTEBOOK_VERSION=c39518a3252f
ARG PYTHON_VERSION=3.8
ARG ALPINE_VERSION=3.10

# Jupyter notebook image is used as the builder
FROM jupyter/base-notebook:${NOTEBOOK_VERSION} AS builder

# Copy the required project files
WORKDIR /home/jovyan/work/wca-db
COPY --chown=jovyan:users python/*.*py* ./python/
COPY --chown=jovyan:users sql/*.sql ./sql/

# Convert Jupyter notebooks to regular Python scripts
RUN jupyter nbconvert --to python python/*.ipynb && \
    rm python/*.ipynb

# Ensure project file permissions are correct
RUN chmod 755 python/*.py && \
    chmod 644 sql/*.sql

# Create final image from Python 3 + Beautiful Soup 4 on Alpine Linux
FROM logiqx/python-bs4:${PYTHON_VERSION}-alpine${ALPINE_VERSION}

# Install MySQL client
RUN apk add --no-cache mysql-client=~10.3

# Note: Jovian is a fictional native inhabitant of the planet Jupiter
ARG PY_USER=jovyan
ARG PY_GROUP=jovyan
ARG PY_UID=1000
ARG PY_GID=1000

# Create the Python user and work directory
RUN addgroup -g ${PY_GID} -S ${PY_GROUP} && \
    adduser -u ${PY_UID} -S ${PY_USER} -G ${PY_GROUP} && \
    mkdir -p /home/${PY_USER}/work && \
    chown ${PY_USER} /home/${PY_USER}/work

# Environment variables used by the Python scripts
ENV MYSQL_HOSTNAME=mariadb
ENV MYSQL_DATABASE=wca
ENV MYSQL_USER=wca

# Copy project files from the builder
USER ${PY_USER}
WORKDIR /home/${PY_USER}/work
COPY --from=builder --chown=jovyan:jovyan /home/jovyan/work/ ./

# Define the command / entrypoint
CMD ["python3"]
```
Counter to the normal policy of chaining `RUN` commands, try splitting them up to maximise caching; each command then gets its own cacheable layer.
BuildKit is faster than the traditional Docker build engine and has additional features.
It is also worth noting that `.dockerignore` is largely redundant because BuildKit handles the build context intelligently.
BuildKit can be enabled in several ways:
Set it as an environment variable with `export DOCKER_BUILDKIT=1`.
Prefix your `docker build` or `docker run` command with `DOCKER_BUILDKIT=1`.
Set the configuration in `/etc/docker/daemon.json` and then restart Docker:
```
{
    "features": {"buildkit": true}
}
```
The build cache can be pruned manually with `docker builder prune ...`.
You can also specify the garbage collection policy in `/etc/docker/daemon.json`:
```
{
    "builder": {
        "gc": {
            "enabled": true,
            "policy": [
                {"keepStorage": "512MB", "filter": ["unused-for=168h"]},
                {"keepStorage": "30GB", "all": true}
            ]
        }
    }
}
```
The Git revision is handy for CI/CD builds since it is unique, making it a good image tag.
Here are two simple commands to get the Git commit id:
```
git describe --always --abbrev=12
git rev-parse --short=12 HEAD
```
The above commands can then be used as an image tag during the Docker build:
```
docker build . -t petition:$(git rev-parse --short=12 HEAD)
docker tag petition:$(git rev-parse --short=12 HEAD) petition:latest
```
Dockerfile: ENTRYPOINT vs CMD is a nice article describing the differences.
TL;DR - ENTRYPOINT is always run and CMD can be overridden.
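A quick illustration of that behaviour, assuming a hypothetical image `my-image` built with `ENTRYPOINT ["/sbin/tini", "--"]` and `CMD ["python3"]`:

```
# Trailing arguments replace CMD; the ENTRYPOINT still runs
docker run my-image python3 script.py

# Overriding the ENTRYPOINT requires an explicit flag
docker run --entrypoint /bin/sh my-image
```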
It took me a while of using Docker before I realised the benefits of a lightweight init process (e.g. “tini” or “dumb-init”) in a Docker image.
TL;DR - Use tini or dumb-init as the ENTRYPOINT to ensure that containers can be gracefully stopped and do not cause issues relating to zombies on the Docker host.
Docker and the PID 1 zombie reaping problem provides a very thorough description of the problem.
What is advantage of Tini? is a thorough description by the author of tini, explaining why it is required.
How to use --init parameter in docker run explains why `--init` is basically the same as running tini.
Introducing dumb-init, an init system for Docker containers is a great article that describes the problems that dumb-init (and tini) resolve.
How critical is dumb-init for Docker? includes comments from the author of tini, impartially comparing dumb-init and tini.
It is easy to automate builds on DockerHub so that changes to a git repository trigger an image build.
Docker Hub: Configure Automated Builds from GitHub and BitBucket
This section lists some of the images that I have created using Docker.
The base image for my Python deployments is on DockerHub:
The hardest bit of the python-bs4 build relates to lxml but it is fully documented on GitHub.
I use Jupyter notebooks for ad hoc projects written in Python.
I have documented the way I have set up Jupyter using Docker Compose.
I have used MySQL for WCA data analysis and as part of an AMP / EMP stack.
I have documented the way I have set up MySQL using Docker Compose.
I have used MariaDB for WCA data analysis and as part of an AMP / EMP stack.
I have documented the way I have set up MariaDB using Docker Compose.
TODO
TODO
TODO - Dockerizing Wordpress with Nginx and PHP-FPM
Cloudreach - Containerize This: PHP/Apache/MySQL - Code on GitHub
Stack Overflow - Alpine variants of PHP and Apache/httpd in Docker - Apache/NGINX and PHP with FCGI
New Media Campaigns - Docker for PHP Developers - Nginx, PHP and MySQL
Java Spring
Cloudreach - Containerize This! How to Dockerize Your Java Spring Application - Uses multi-stage builds
Cloudreach - Containerize This! How to build Golang Dockerfiles - Uses multi-stage builds
Docker Docs - Docker development best practices
Docker Docs - Best practices for writing Dockerfiles
Docker Docs - Isolate containers with a user namespace
Red Hat Developer - 10 things to avoid in docker containers
Red Hat Developer - Keep it small: a closer look at Docker image sizing
Project Atomic - Guidance for Docker Image Authors
Ivan Krizsan - Time in Docker Containers
A few nice articles relating to Alpine Linux:
The latest packages for Alpine Linux can be viewed online:
Note: Image size is not the primary consideration when choosing a base OS but it is still quite cool.