# A Beginner's Guide to Docker

## Purpose for this guideline

This guide provides an overview of what Docker is, how it is used, and the basics of running Docker containers. It does not go into depth on creating Docker images or on the more nuanced aspects of using Docker. For a more in-depth introduction, read the official Docker docs.

## Overview

Docker is a tool for containerizing code. You can think of a container as a lightweight virtual machine. Docker works by defining an image that includes everything you need to run your code. You start with a base image, which is a pre-made Docker image, then install your dependencies on top. Python? Java? Fortran libraries? Almost anything you can install on a normal computer, you can install in a Docker image.

There are plenty of base images available. You can start with something as basic as [Arch Linux](https://hub.docker.com/_/archlinux), or as complicated as a [Windows base image with Python already installed](https://hub.docker.com/r/microsoft/windows-cssc-python).

Once you have created your Docker image, it can be uploaded to LASP's internal registry for other people or machines to use.

Every machine runs the Docker image in the same way. The same image can be used for local development, for running tests in Jenkins or GitHub Actions, or for running production code in AWS Lambda. This creates a standard environment, so new developers can get started quickly and everyone can keep their local environments clean.

Docker also makes it possible to archive the entire environment, not just the code. Code is only useful as long as people can run it. Finally, unlike many virtual machines, Docker is lightweight enough to be run only when needed and updated frequently.

## Basics of Docker

If you've used virtual machines in the past, the basic uses of Docker will be familiar to you. A few terms are defined below.
For a more in-depth explanation, see the [official Docker overview](https://docs.docker.com/get-started/).

**Docker Image:** The Docker image contains all the information needed to run the Docker container. This includes the operating system files, file system, and dependencies.

**Docker Container:** A Docker container is a specific instance of a Docker image. A Docker container is used to run commands within the environment defined by the Docker image.

**Dockerfile:** The Dockerfile is what defines a Docker image. It contains the commands for building a Docker image, including things like the base image to use, the installation steps to run, and the directories to create.

**Docker Compose:** A Docker Compose file is an optional file that defines how to run Docker images. This can be useful if you will be running multiple images in tandem, attaching volumes or networks to the containers, or repeatedly typing the same commands to create containers and want to avoid that.

**Docker Registry:** A registry or archive store is a place to store and retrieve Docker images. This is one way to share already-built Docker images. LASP has a private registry, in the form of the LASP Docker registry.

So, you define a Docker *image* using a *Dockerfile*, and optionally describe how to run it using a *Docker Compose* file. Running this image produces a Docker *container*, which runs your code and environment. An image can be pushed up to a *registry*, where anyone with access can pull the image and run the container themselves without needing access to the Dockerfile.

## Getting Started

This section outlines some basic commands and use cases for Docker. First, you need to [install Docker](https://docs.docker.com/get-started/get-docker/) on your computer.

Next, start by creating a Dockerfile. This example Dockerfile starts from an `alpine` base image and installs Python. Traditionally, Dockerfiles are named `Dockerfile`, although you can extend the name if needed (e.g., `dev.Dockerfile`).
The `docker build` command will look in the current directory for a file named `Dockerfile` by default, but you can specify a different file through command line arguments or through your Docker Compose file.

Generally, each Docker image should be as small as possible. Each Dockerfile should only do one thing at a time. If you need two extremely similar Docker containers, you can also use [Multi-stage builds](multi_stage_builds). You can orchestrate multiple Docker containers that depend on each other using [Docker compose](docker_compose_examples).

To start, your Dockerfile should specify the base image using `FROM <base image>`. Then, you can set up the environment by using `RUN` instructions to run shell commands. Finally, you can finish the Dockerfile with a `CMD` instruction. This is optional, and sets the command that runs when a container is started from the image.

Here is our example Dockerfile:

```dockerfile
# Starting with alpine as our base image
FROM alpine

# Install python
RUN apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python
RUN python3 -m ensurepip
RUN pip3 install --no-cache --upgrade pip setuptools
```

In the same folder, we run the `build` command to build our image:

```bash
docker build --platform linux/amd64 -f Dockerfile -t docker_tutorial:latest .
```

The flag `--platform linux/amd64` is optional unless you are [running on a Mac with an M1 chip](running_docker_with_m1). The `-f` flag indicates the name of the Dockerfile; in this case it is also optional, since `Dockerfile` is the default value. The `-t` flag gives the image a name and a tag, which makes it easier to track the Docker images and containers on our system. `latest` is the default tag, conventionally used to indicate the latest version of a Docker image. Additional useful flags include `--no-cache` for a clean rebuild, and you can find a full list of flags in the [`docker build` reference](https://docs.docker.com/reference/cli/docker/buildx/build/).

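
As a rough sketch of the `-f` and `--no-cache` flags described above, the script below builds an image from a Dockerfile that does not use the default name. The file name `dev.Dockerfile` reuses the naming example from earlier; the tag `docker_tutorial:dev` and the variable names are assumptions made for illustration, not part of the tutorial. The script skips the build when Docker or the Dockerfile is missing.

```bash
# Hypothetical names, chosen only for this sketch.
DOCKERFILE="dev.Dockerfile"
IMAGE="docker_tutorial:dev"

# Build only when Docker is installed and the Dockerfile exists.
if command -v docker >/dev/null 2>&1 && [ -f "$DOCKERFILE" ]; then
    # -f selects the non-default Dockerfile; --no-cache rebuilds every layer.
    docker build --no-cache -f "$DOCKERFILE" -t "$IMAGE" .
else
    echo "Skipping build: Docker or $DOCKERFILE not found"
fi
```

Using a separate tag such as `dev` keeps this image from overwriting `docker_tutorial:latest`.
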
Now that we have built the image, we can see all the Docker images that are built on our system by running the `docker images` command:

```plaintext
$ docker images
REPOSITORY        TAG       IMAGE ID       CREATED         SIZE
docker_tutorial   latest    71736be7c555   5 minutes ago   91.9MB
```

> **Info**: If you prefer to use a GUI, the Docker Desktop application can also be used to view, run, and delete Docker
> images.

If we wanted, we could now push that image up to a registry by using the `docker push` [command](https://docs.docker.com/reference/cli/docker/image/push/). Alternatively, instead of building the image, you could pull an existing image using the `docker pull` [command](https://docs.docker.com/reference/cli/docker/image/pull/).

Now that we have an image locally, we can run a container from that image using the `docker run` command:

```bash
docker run --platform linux/amd64 -it --name tutorial docker_tutorial:latest
```

Once again, the platform is optional unless you are on an M1 Mac. The `-it` flags open an interactive `tty` session, so you can interact with the container via the command line. The `--name` flag gives the container a name. Another key flag to know is `-d`, which runs the container in detached mode. This lets the container run in the background without attaching to your terminal.

You can see all currently running Docker containers with `docker ps`, and all existing Docker containers, including stopped ones, with `docker ps -a`.

Running the `docker run` command will start your container and connect to it, so you can interactively run commands. If you run `which python` in this container, you should see that Python is successfully installed. Press `Ctrl+D` to exit the shell, which also stops the container.

With that, you have successfully run the Docker container! This is a good way to debug and run code inside a container for development purposes.

If you want the Docker image to automatically execute code when you run it, you can use the `CMD` instruction.
For example, this can be used to run tests or the main application for a Lambda container. To do this, add a line with a `CMD` at the bottom of your `Dockerfile`:

```dockerfile
CMD echo "Hello world"
```

Once you rebuild the image, you can run it without the interactive session:

```bash
docker run --platform linux/amd64 docker_tutorial:latest
```

The container starts, executes the command given in `CMD`, and then exits. You can confirm that the container has exited successfully with `docker ps -a`.

`CMD` is how most Docker containers that run code without human intervention work. For an example of a system where that's operating, you can read the documentation on the [TIM tests in Docker](https://confluence.lasp.colorado.edu/display/DS/Containerize+TIM+Processing+-+Base+Image).

## Docker Cheat Sheet

Here is a list of Docker commands that might be useful to have as a shorthand:

```bash
# build locally
docker build --platform linux/amd64 -f <dockerfile> -t <image_name>:latest .

# Run in interactive mode
docker run --platform linux/amd64 -it --name <container_name> <image_name>:latest

# Login to docker registry
docker login

# View docker images
docker images

# View docker containers
docker ps -a

# Remove stopped containers
docker container prune

# Remove dangling images (run after container prune)
docker image prune
```

## Useful Links

* [Official Docker documentation](https://docs.docker.com/)
* [Installing Docker engine](https://docs.docker.com/engine/install/)
* [Installing Docker Desktop for Mac](https://docs.docker.com/desktop/install/mac-install/)
* [Docker CLI cheatsheet](https://docs.docker.com/get-started/docker_cheatsheet.pdf)

## Acronyms

* **apk** = Alpine Package Keeper
* **amd64** = 64-bit Advanced Micro Devices
* **AWS** = Amazon Web Services
* **pip** = Pip Installs Packages
* **ps** = Process Status
* **tty** = TeleTYpe (terminal)

*Credit: Content taken from a Confluence guide written by Maxine Hartnett*