Multi-stage Docker Image Build for Java Applications

May 25, 2017

A few days back, I discovered a new Docker feature: multi-stage builds. It lets you divide the image building process into multiple stages, where artifacts produced in one stage can be reused by a later stage. This is especially useful for languages like Java, where building a Docker image takes several steps. The main advantage is a smaller final image, because build-time tools do not have to ship in it. At the time of writing, the feature is not yet available in stable versions of Docker; it will become available in Docker 17.05. Until then, you have to use the edge version of Docker CE.

To build a Docker image for a Java application, you first need to build the Java project. The Java build process needs a JDK and a build tool like Maven, Gradle, or Ant. Once a Java binary artifact is produced, you can package it in a Docker image. To run the binary, you only need a JRE, so the runtime image does not have to pay the cost of bundling the whole JDK.

One way people have handled multi-stage builds is the Builder pattern. This pattern requires you to maintain two Dockerfiles: Dockerfile and Dockerfile_build. Dockerfile_build is used to build the Java binary, and Dockerfile is used to create the final runtime image. Below is the content of the Dockerfile_build file for a Spring Boot Gradle project.

FROM openjdk:8
ENV APP_HOME=/root/dev/myapp/
RUN mkdir -p $APP_HOME/src/main/java
WORKDIR $APP_HOME
COPY build.gradle gradlew gradlew.bat $APP_HOME
COPY gradle $APP_HOME/gradle
# Download dependencies first so Docker can cache this layer
RUN ./gradlew build -x :bootRepackage -x test --continue
COPY . .
RUN ./gradlew build

The above Dockerfile first downloads all the dependencies and then builds the project. Note that we copy the source code only after downloading the dependencies. This allows Docker to reuse the cached layer that holds the Gradle dependencies as long as the build files are unchanged. The source code changes more often, so it is copied later in the Dockerfile.

To create the Java artifact, we will first build a Docker image and then create the container using the commands mentioned below.

$ docker build -t myapp_build -f Dockerfile_build .
$ docker create --name myapp-build-container myapp_build

Now, you have to copy the artifact from the myapp-build-container using the docker cp command and create the final image that will be used for execution.
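The runtime Dockerfile for the Builder pattern is not shown in this post. A minimal sketch of what it might look like follows. It assumes the jar ends up at build/libs/myapp.jar under $APP_HOME (the same path the multi-stage example later in this post copies from) and that it has already been copied into the current directory with docker cp. The file is an illustration, not the author's original.

```dockerfile
# Dockerfile: runtime image for the Builder pattern (illustrative sketch).
# Assumes myapp.jar was already copied out of the build container into the
# build context, for example with:
#   docker cp myapp-build-container:/root/dev/myapp/build/libs/myapp.jar .
FROM openjdk:8-jre
WORKDIR /root/
# Copy the jar from the build context on the host, not from another image
COPY myapp.jar .
EXPOSE 8080
CMD ["java", "-jar", "myapp.jar"]
```

With this file in place, docker build -t myapp . would produce the runtime image, and docker rm myapp-build-container would clean up the stopped build container afterwards.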

As you’ll see, it is a tedious process requiring you to maintain multiple Dockerfiles.

Docker Multi-stage Build Feature to the Rescue

With multi-stage builds, a Dockerfile can contain multiple FROM instructions, each starting a new stage. The output of the last FROM stage is the resulting image. So now you only have to maintain a single Dockerfile, which builds the Java artifact in the first stage and then creates the final image in the second stage using the artifact produced in the first.

# Stage 1: build the application with the full JDK
FROM openjdk:8 AS BUILD_IMAGE
ENV APP_HOME=/root/dev/myapp/
RUN mkdir -p $APP_HOME/src/main/java
WORKDIR $APP_HOME
COPY build.gradle gradlew gradlew.bat $APP_HOME
COPY gradle $APP_HOME/gradle
# Download dependencies first so this layer is cached across source changes
RUN ./gradlew build -x :bootRepackage -x test --continue
COPY . .
RUN ./gradlew build

# Stage 2: runtime image with only the JRE and the built jar
FROM openjdk:8-jre
WORKDIR /root/
COPY --from=BUILD_IMAGE /root/dev/myapp/build/libs/myapp.jar .
EXPOSE 8080
CMD ["java","-jar","myapp.jar"]

In the Dockerfile shown above,

  1. We have two FROM commands. A FROM command can also take an alias (AS name) so that you can reference the stage later. If you don’t provide an alias, you can refer to the stage by its index, starting from 0. There can be more than two stages; the example above uses two.
  2. The first FROM command uses the openjdk:8 image as its base because it needs to build the project. The second FROM command uses openjdk:8-jre because it only needs to run the binary.
  3. We used the --from option of the COPY command to copy the artifact produced by the first stage into the second image. The --from option accepts either the alias given in the FROM command or a stage index starting from 0. For example, if we remove AS BUILD_IMAGE from the first FROM command, we can write the COPY command as COPY --from=0 /root/dev/myapp/build/libs/myapp.jar .
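To make the index form concrete, here is a condensed sketch of the same two-stage build with the alias removed. The dependency-caching steps are left out purely for brevity, so this variant downloads the Gradle dependencies again whenever any file in the build context changes; treat it as an illustration of stage indexing rather than a recommended layout.

```dockerfile
# Stage 0: build the jar (no alias, so later stages address it by index)
FROM openjdk:8
ENV APP_HOME=/root/dev/myapp/
# WORKDIR creates the directory if it does not exist yet
WORKDIR $APP_HOME
COPY . .
RUN ./gradlew build

# Stage 1: runtime image
FROM openjdk:8-jre
WORKDIR /root/
# --from=0 refers to the first FROM in this file
COPY --from=0 /root/dev/myapp/build/libs/myapp.jar .
EXPOSE 8080
CMD ["java", "-jar", "myapp.jar"]
```

Indexes are positional, so inserting a new stage at the top of the file silently changes what --from=0 points to. That is one reason a named alias can be easier to maintain once a Dockerfile has more than two stages.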

To build the image and run the application, you can use standard commands as shown below.

$ docker build -t myapp .
$ docker run -d -p 8080:8080 myapp

The final image is drastically smaller than the image used for building the Java binary, as shown below.

$ docker images
myapp               latest              45f3dfc8c0bc        10 minutes ago    325MB
myapp_build         latest              b2115749abff        32 minutes ago    857MB

Conclusion

You should certainly give the Docker multi-stage build feature a try for your applications. It can streamline your image building process, leaving you with fewer Dockerfiles to maintain and smaller Docker images.


Shekhar Gulati

About the Author

Shekhar Gulati is a polyglot programmer and technology evangelist for XebiaLabs.

  • Anthony Whitford

    The biggest shortcoming is that docker build does not support mounting a volume. For Maven (and other build tools), artifact caching is critical to performance. I need to be able to say something like: docker build -v maven_repo:/usr/share/maven/repository

    The other breakdown is that local builds (on a desktop/workstation during development) should behave a little differently than builds on a build server. The former should not be allowed to deploy to Artifactory, for example, whereas the latter should.

    Local builds will typically run ‘maven install’ vs ‘maven deploy’ — but that can be controlled by a build argument with a default, so that is ok. However, local builds typically don’t run ‘mvn site-deploy’ whereas the CI/CD server should.

    • John

      Profoundly agree — I tend to resolve your latter point by keeping maven out of the business of deployment, but the point about maven caching is bothering me too. I want to use multistage builds to streamline developer onboarding, but not at the expense of downloading every dependency every time.