We will start with the basics of Docker, then look at some of the more interesting topics and explore a few of the ways we can build images and containers.
A Dockerfile is a plain text file that defines a Docker image. A line beginning with a hash (#) is a comment. The first instruction is usually FROM, and the last ones are often CMD or ENTRYPOINT.
The FROM line tells Docker which image you intend to base this one on: the image that your Dockerfile adds to. You can base your image on any other image, such as ‘scratch’ (an empty image), ‘ubuntu’, ‘centos’ or ‘debian’ (base Linux distributions), or images that provide something you need, like ‘ruby’ or ‘java’. These values are known as tags.
The most basic possible Dockerfile is:
FROM scratch
Unfortunately, this produces a blank, empty image that won’t do much. A more useful starting point is a base Linux distribution; for example, to use Ubuntu 14.04 as the basis of your image you would use:
FROM ubuntu:14.04
A FROM line by itself is rarely useful; you will usually want to add files to the image, install tools, and set a command to run at start-up.
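As a rough sketch of those three steps, a small Dockerfile might look like the following. It assumes a script called hello.sh (a hypothetical name) sits next to the Dockerfile in the build context; the package installed is only an example.

```dockerfile
# Lines starting with # are comments
# Base image, pinned to a specific version
FROM ubuntu:14.04

# Install a tool; the base image has no package lists, so update them first
RUN apt-get update && apt-get install -y curl && apt-get clean

# Add a file from the build context (hello.sh is a hypothetical script)
COPY hello.sh /usr/local/bin/hello.sh

# The command to run when a container is started from this image
CMD ["/bin/sh", "/usr/local/bin/hello.sh"]
```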
About image tags
In Docker, the tag is the name of a Docker image. At its most basic, a tag takes the form creator/image:version. Examples include ‘ruby:1.9.3’ and ‘antonyh/cucumber’. It’s recommended that you always include the version; otherwise Docker assumes ‘latest’, with hopefully obvious side-effects.
The RUN directive runs a command while the image is being built. You might want to create a directory or a user account, install software with apt-get or yum, or download a file with wget or curl. The options are limitless, since RUN can use whatever is installed inside the image at the point when it is invoked.
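A minimal sketch of the kinds of RUN steps described above. The user name, directory and download URL are hypothetical. Note that wget is installed before it is used: a RUN step can only call tools that already exist in the image at that point.

```dockerfile
FROM ubuntu:14.04

# Create a directory and an unprivileged user account (hypothetical names)
RUN mkdir -p /opt/app && useradd --create-home appuser

# Install software with the distribution's package manager
RUN apt-get update && apt-get install -y wget ca-certificates && apt-get clean

# Download a file with the wget installed by the previous step (hypothetical URL)
RUN wget -O /opt/app/data.tar.gz https://example.com/data.tar.gz
```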
ADD and COPY
These directives place files into the image. ADD has other ‘benefits’ depending on what you are trying to achieve, including automatically extracting local tar archives such as .tar.gz files.
The ONBUILD modifier tells Docker not to run the instruction now, but to run it when this image is used in the FROM line of another Dockerfile. You’ll only use this in base images, but it’s good to be aware of it so you understand what is happening when you encounter it in other people’s files.
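A sketch of how this plays out, assuming a hypothetical Ruby base image tagged my-ruby-base:1.0. The parent Dockerfile only records the ONBUILD instructions:

```dockerfile
# Parent Dockerfile, built and tagged (hypothetically) as my-ruby-base:1.0
FROM ruby:2.2
WORKDIR /app
# These are recorded as triggers, not run, while building this image
ONBUILD COPY Gemfile Gemfile.lock /app/
ONBUILD RUN bundle install
ONBUILD COPY . /app
```

A child Dockerfile that builds on it then gets those steps run for it, using the child’s own build context (app.rb is a hypothetical file name):

```dockerfile
# The parent's ONBUILD triggers run right after this FROM: the Gemfiles are
# copied, bundle install runs, then the rest of this build context is copied
FROM my-ruby-base:1.0
CMD ["ruby", "app.rb"]
```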
The ENTRYPOINT is the thing that runs the CMD statement, normally ‘sh -c’.
This may be an over-simplification, but it holds for most uses. Conceptually, the ENTRYPOINT is the runtime for the container, and CMD is the command you wish to invoke.
The CMD directive defines the command to run when a container starts. It’s important that the command is not a daemon or a background process, because the container stops as soon as this command finishes. The process should run in the foreground until it has completed.
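For example, a web server such as nginx normally forks into the background. A sketch of keeping it in the foreground so the container stays up, using nginx’s own ‘daemon off;’ directive:

```dockerfile
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y nginx && apt-get clean
EXPOSE 80
# Without "daemon off;" nginx would detach, the main process would exit,
# and the container would stop immediately
CMD ["nginx", "-g", "daemon off;"]
```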
Docker build from a Git repo
A Docker image is often built from a local Dockerfile. Alternatively, the ‘docker build’ command will accept a Git repository URL. Docker will clone the repository and use the Dockerfile inside it.
Package management inside containers
The official base images ship without package lists, which keeps them smaller. This is why there is a call to ‘apt-get update’ in pretty much any Dockerfile based on Ubuntu, and why it’s followed up with ‘apt-get clean’ to remove the downloaded package files again.
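A common pattern, sketched here, is to update, install and clean up in a single RUN instruction. Each RUN produces its own layer, so files deleted in a later RUN still take up space in the earlier layer. Removing /var/lib/apt/lists/* as well as running apt-get clean is a widely used convention rather than a requirement.

```dockerfile
FROM ubuntu:14.04
# One RUN, one layer: the package lists and downloaded .deb files never
# end up in the final image
RUN apt-get update \
    && apt-get install -y curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```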
When a layer is built, it ‘freezes’ the versions of the software installed in it. This has two effects. Firstly, that layer may remain in use for a long time and miss vital security updates and bug fixes. Secondly, if the image is rebuilt without the build cache, it may end up with different versions. Because of this, it’s important to be as explicit as possible when specifying the versions of packages, whether they are installed via tools such as apt-get or yum, or downloaded in the Dockerfile using wget or curl.
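A sketch of what being explicit can look like. The version string is illustrative only: the versions actually available depend on the distribution’s archive at build time (check with apt-cache policy), and a pinned version can stop installing once it is superseded. The download URL is hypothetical.

```dockerfile
FROM ubuntu:14.04

# Pin the package version with apt-get's package=version syntax
# (illustrative version string; verify it exists before relying on it)
RUN apt-get update \
    && apt-get install -y curl=7.35.0-1ubuntu2 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# For downloads, fetch a specific versioned file rather than a "latest" link
# (hypothetical URL); -f makes curl fail the build on an HTTP error
RUN curl -fsSL -o /tmp/tool-1.2.3.tar.gz https://example.com/releases/tool-1.2.3.tar.gz
```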
Minimal base images
The smallest possible base image is ‘scratch’, an empty image listed on Docker Hub (Docker treats the name as reserved, so FROM scratch downloads nothing). You can find a ‘hello world’ example that uses it here: https://github.com/docker-library/hello-world
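The general shape of such an image, sketched under the assumption that you already have a statically linked executable called hello (a hypothetical file) in the build context:

```dockerfile
# Start from the empty image: no shell, no libc, no package manager
FROM scratch
# The binary must be statically linked, since nothing else is in the image
COPY hello /
# Exec form is required here: there is no /bin/sh to run a shell-form command
CMD ["/hello"]
```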
The smallest useful base image is more likely to be busybox, the so-called ‘Swiss Army knife of embedded Linux’.
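By contrast with scratch, a busybox-based image has a shell and the usual small utilities, so a shell-form command works. A minimal sketch; the version tag is illustrative:

```dockerfile
# Illustrative version tag; pin whichever busybox release you actually test with
FROM busybox:1.24
# busybox provides /bin/sh, echo and ls, so a shell-form CMD can run
CMD echo "Hello from busybox" && ls /
```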
Most base images are built on a Linux distribution that has a package management system, such as Ubuntu, CentOS or Debian.
FROM ubuntu
FROM centos
FROM debian
As with the advice on package versions, the same applies to the version of the base image. Docker FROM declarations use tags to identify the version they reference.
FROM ubuntu:14.04
FROM centos:7
FROM debian:wheezy
FROM defaults to latest
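Tags without versions in the FROM line of your Dockerfile are assumed to mean ‘latest’. As I’m sure you can imagine, this creates a great deal of uncertainty and the possibility of unexpected consequences: ‘latest’ can be moved to a newer release at any time, so the same Dockerfile may produce a different image next month. Wherever possible, always set a version.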
Adding to the base image
Running commands in the dockerfile
You can set commands to run in the Docker container using the keywords RUN and CMD. RUN executes its command at build time and commits the result as a new layer of the image. CMD is used when the container is run. The command itself can be either a simple string such as "mvn install" (the ‘shell form’) or an array of the command and its parameters such as [ "mvn", "install" ] (the ‘exec form’). The difference between the two is that a simple string is executed using ‘/bin/sh -c’, whereas the array runs the executable directly with no shell involved, so shell features such as variable substitution and pipes are not available. Whether this makes any real difference depends on your needs and use-case, but it’s useful to know about the alternative and to understand it when you find it in Dockerfiles others have created.
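A sketch showing both forms side by side in a Maven project. The base image tag and the jar name target/app.jar are assumptions for illustration; the real jar name depends on the project’s artifactId and version.

```dockerfile
FROM maven:3-jdk-8
WORKDIR /usr/src/app
COPY . /usr/src/app

# Shell form: executed as /bin/sh -c "mvn install" at build time,
# and the result becomes a layer of the image
RUN mvn install

# Exec form: a JSON array (double quotes required), started directly without
# a shell when the container runs; "$HOME" here would be passed literally
CMD ["java", "-jar", "target/app.jar"]
```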
The difference between ADD and COPY keywords
COPY is a simple tool for copying files and folders without modification. ADD does extras such as extracting local tar archives (including compressed ones like .tar.gz), and it can fetch remote files by URL; files fetched this way are downloaded but not extracted. If in doubt, use COPY.
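The three behaviours side by side, as a sketch. All file names, paths and the URL are hypothetical.

```dockerfile
FROM ubuntu:14.04

# COPY: files and folders from the build context, copied as-is
COPY config/ /etc/myapp/

# ADD with a local tar archive: extracted automatically into /opt/myapp/
ADD myapp-1.0.tar.gz /opt/myapp/

# ADD with a URL: the file is downloaded but NOT extracted, even if it is an archive
ADD https://example.com/files/readme.txt /opt/myapp/readme.txt
```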
Working directory and environment variables
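WORKDIR is somewhat akin to the shell’s cd command: it sets the working directory for the instructions that follow it and for the running container. ENV sets environment variables (again, similar to exporting shell variables), and these persist into the running container.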
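A short sketch of the two together; the variable name and paths are hypothetical.

```dockerfile
FROM ubuntu:14.04
# Like `export APP_HOME=/opt/app`; also visible inside the running container
ENV APP_HOME /opt/app
# Like `mkdir -p $APP_HOME && cd $APP_HOME`: WORKDIR creates the directory if
# needed, and later RUN, CMD, ENTRYPOINT, COPY and ADD instructions use it
WORKDIR $APP_HOME
# Runs in /opt/app, so this creates /opt/app/notes.txt
RUN echo "hello" > notes.txt
```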
The container runtime command
Conceptually, the ENTRYPOINT is the runtime for the container, and CMD is the command you wish to invoke. If that sounds a little cryptic, consider that the default ENTRYPOINT is effectively ‘sh -c’, so the CMD is a command that you might type into a console, such as:
curl -I http://server:port/path/to/page.html
Redefining the ENTRYPOINT to something else allows great flexibility. What if you set it to ‘curl -I’? All you would need to do is define CMD to be the URL to use, or run the container as
docker run curly http://server:port/path/to/page.html
Our CMD becomes the parameter, and all ‘curly’ can do is show HTTP headers using curl. This isn’t the most useful example, but I think it helps illustrate the point.
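A sketch of what the ‘curly’ image might contain. The image name comes from the example above; the base image and the default URL in CMD are assumptions.

```dockerfile
FROM ubuntu:14.04
RUN apt-get update \
    && apt-get install -y curl ca-certificates \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# The container's "runtime": exec form, so CMD (or docker run arguments)
# are appended as extra arguments. A shell-form ENTRYPOINT would ignore them.
ENTRYPOINT ["curl", "-I"]
# Default argument, used only when no URL is given to docker run
CMD ["http://example.com/"]
```

Built with docker build -t curly ., running docker run curly with no arguments would show the headers for the default URL, while docker run curly http://server:port/path/to/page.html replaces CMD with that URL.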
Use ADD as late as you can
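Another way to word this: do the most static things first, to take advantage of the build cache as much as possible. Docker reuses a cached layer only while the instruction (and, for ADD and COPY, the files it adds) is unchanged; once one layer changes, every layer after it is rebuilt. Also be careful with ‘.’ and other wildcards: they may add files that change frequently, which invalidates the cache and adds significant time to your build, especially if you need to do a post-ADD action such as ‘bundle install’.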
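A sketch of this ordering for a Ruby project. COPY is used here, following the earlier advice, but the caching behaviour is the same for ADD. The base image tag and app.rb are assumptions.

```dockerfile
FROM ruby:2.2
WORKDIR /app

# Rarely changes: add only the dependency list first. This layer and the
# bundle install below stay cached until Gemfile or Gemfile.lock change.
COPY Gemfile Gemfile.lock /app/
RUN bundle install

# Changes often: the rest of the source. Editing application code only
# invalidates the layers from here on, not the bundle install above.
COPY . /app

CMD ["ruby", "app.rb"]
```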