@rtfpessoa
Last active March 18, 2022 00:54
Scala - faster and slimmer with GraalVM
#!/usr/bin/env bash
#
# Simple Wrapper to build a native binary for your sbt application.
#
# Configuration:
#   * Target (-t):
#     * `native` - Builds the binary for the machine arch (Requires GraalVM and Oracle JDK 8)
#     * `docker` - Builds the binary for Linux using Docker (Requires Docker); this is the default
#
# Setup GraalVM
#   * Install GraalVM (https://www.graalvm.org/docs/getting-started/).
#   * Add GraalVM to your PATH.
#
# Example:
#   ./build-native-image.sh -n my-app -m my.main.App -t docker 1.0.0
#
set -e

TARGET="docker"
OS_TARGET="$(uname | awk '{print tolower($0)}')"

function usage() {
  echo >&2 "Usage: $0 -n <app-name> -m <main-class> [-t <docker|native> (default: docker)] [version]"
}
while getopts :t:n:m:h opt
do
  case "$opt" in
    t)
      TARGET="$OPTARG"
      ;;
    n)
      APP_NAME="$OPTARG"
      ;;
    m)
      APP_MAIN_CLASS="$OPTARG"
      ;;
    h)
      usage
      exit 0
      ;;
    *)
      # Unknown option or missing option argument
      usage
      exit 1
      ;;
  esac
done
shift $((OPTIND-1))

# Docker always produces a Linux binary, also when docker is only the default target.
if [[ "${TARGET}" == "docker" && "${OS_TARGET}" == "darwin" ]]
then
  echo >&2 "Target docker can only build binaries for linux."
  OS_TARGET="linux"
fi
if [ -z "$APP_NAME" ]; then
  echo >&2 "App name was not provided."
  usage
  exit 1
fi

if [ -z "$APP_MAIN_CLASS" ]; then
  echo >&2 "Main class was not provided."
  usage
  exit 1
fi
# Prints the runtime classpath of the sbt project (the last line of the sbt output).
function app_classpath() {
  sbt 'export runtime:fullClasspath' < /dev/null | tail -n 1
}

# Prints the native-image command line for a binary name, main class and classpath.
function build_cmd() {
  local BINARY_NAME=$1
  local APP_MAIN_CLASS=$2
  local APP_CLASSPATH=$3
  local FLAGS="--static -O1"
  local NATIVE_IMAGE_FLAGS="-H:+JNI -H:IncludeResourceBundles=com.sun.org.apache.xerces.internal.impl.msg.XMLMessages"
  # Drop --static when the binary is built for macOS.
  if [[ "${OS_TARGET}" == "darwin" ]]
  then
    FLAGS="-O1"
  fi
  echo "native-image -cp ${APP_CLASSPATH} ${FLAGS} ${NATIVE_IMAGE_FLAGS} -H:Name=${BINARY_NAME} -H:Class=${APP_MAIN_CLASS}"
}
echo "Publishing ${APP_NAME} for ${OS_TARGET}"
BINARY_NAME="${APP_NAME}-${OS_TARGET}"
BUILD_CMD="$(build_cmd "${BINARY_NAME}" "${APP_MAIN_CLASS}" "$(app_classpath)")"

echo "Compiling code"
sbt compile

echo "Going to run ${BUILD_CMD} ..."
mkdir -p build

case "$TARGET" in
  native)
    # BUILD_CMD is left unquoted on purpose so it splits into the command and its arguments.
    ${BUILD_CMD} && mv "${BINARY_NAME}" build/
    ;;
  docker)
    docker run \
      --rm=true \
      -it \
      -e JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS}" \
      --user=root \
      --entrypoint=bash \
      -v "$HOME/.ivy2:$HOME/.ivy2" \
      -v "$PWD:$PWD" \
      findepi/graalvm:1.0.0-rc7-all \
      -c 'cd /tmp && '"${BUILD_CMD}"' && mv '"$BINARY_NAME $PWD/build/$BINARY_NAME"
    ;;
  *)
    echo >&2 "Could not find command for target $TARGET"
    exit 1
    ;;
esac

Scala - faster and slimmer with GraalVM

A small journey on how to make your Scala application faster and slimmer taking advantage of GraalVM native-image.

Intro

Scala development is notorious for its long compilation times and large deployment artifacts. This gets even worse if you are using Docker, since the image also needs a Java runtime, which can take almost 100 MB.

In the first quarter of 2018 Oracle publicly announced GraalVM, a project they had been working on for six years.

Graal is a polyglot virtual machine that provides high performance for individual languages and interoperability with zero performance overhead for creating polyglot applications.

It now supports languages from Java (JVM: Scala, Kotlin, ...) to Node.js, Python or R and adds some out-of-the-box zero configuration improvements to the runtime.

Graal has two flavors, the Community Edition and the Enterprise Edition. The former is open source on GitHub and available for Linux, while the latter is not open source and requires a license for production use, but is also available for macOS. The Enterprise Edition has further benefits, such as a smaller footprint, better performance and sandbox capabilities.

In this blog post we will use the Community Edition on Linux, but you can follow the same steps if you install the Enterprise Edition on your Mac.

Problems

Now the question is, can GraalVM solve or even help with our problems?

Regarding compilation time, Vojin Jovanovic already wrote a nice article called Compiling Scala Faster with GraalVM and left some ideas for improvements. Christian Schmitt, from the Play Framework team, also wrote Running Play on GraalVM, which may interest some Scala developers.

But what about size?

GraalVM features

Of all the features GraalVM provides, one got my attention: native-image. GraalVM native-image is a tool that enables ahead-of-time (AOT) compilation of JVM applications into native executables or shared libraries.

While regular Java Virtual Machine (JVM) code is just-in-time (JIT) compiled at runtime, GraalVM native-image compiles it ahead of time. As described by Codrut Stancu in Instant Netty Startup using GraalVM Native Image Generation, AOT has two main advantages:

First, it improves the start-up time since the code is already pre-compiled into efficient machine code. Secondly, it reduces the memory footprint of Java applications, since it eliminates the need to include infrastructure to load and optimize code at run time.

Still, there are additional advantages such as more predictable performance, less total CPU usage, and the ability to remove unused parts from the runtime binary thus making it smaller (much smaller).

The technology behind GraalVM native-image is called Substrate VM. In addition to your application code and its dependencies, the generated image also contains some parts of the Java runtime, including the garbage collector. As Codrut Stancu explains in Instant Netty Startup using GraalVM Native Image Generation:

To achieve the goals of a self contained executable, the native-image tool performs a static analysis to find ahead-of-time the code that your application uses. This includes the parts of the JDK that your application uses, third party library code, and the VM code itself (also written in Java). Therefore, at run time the native executable depends on your system native libraries, like libc, but even this dependency can be removed if you choose to fully link the native dependencies statically.

In the end you should get a native application binary that is both smaller and faster.

Scala, SBT and Docker

At Codacy, most of our Scala applications are deployed using the amazing sbt-native-packager directly from our SBT projects.

This makes our setup minimal and the deployment quick and simple. The problem with this deployment method is that the generated Docker image is around 100 MB (or larger), even for small applications.

This is due to the size of the Java runtime itself, but also to all the dependencies and Scala boilerplate that are included even when they are not used.

Reducing this to the minimum required should give us an image whose size matches the amount of code in the artifact, instead of starting from 100 MB for single-file applications.

Scala, SBT, Docker and GraalVM native-image

GraalVM native-image to the rescue.

The main input native-image needs is the application classpath. In SBT, you can obtain it by running sbt 'export runtime:fullClasspath'.

Once we have it, we invoke native-image with the classpath, a name for the binary and the entrypoint (main class) of our application.

In the end you might need some more flags and some tweaks to the build for your specific application, but if your app is simple enough, something like native-image -cp <APP_CLASSPATH> -H:Name=<BINARY_NAME> -H:Class=<APP_MAIN_CLASS> should do it.
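
The two steps above can be sketched as a pair of shell functions. This is only a minimal sketch, not the wrapper script itself: print_native_image_cmd is a hypothetical helper name, it only prints the command so you can inspect it before running it, and actually running the result assumes sbt and native-image are on your PATH.

```shell
# Ask sbt for the runtime classpath; the last line of its output is the classpath.
# Assumes sbt is on the PATH and that this runs from the project root.
app_classpath() {
  sbt 'export runtime:fullClasspath' < /dev/null | tail -n 1
}

# Hypothetical helper: print (do not run) the native-image invocation.
# Arguments: <binary-name> <main-class> <classpath>
print_native_image_cmd() {
  local binary_name="$1"
  local main_class="$2"
  local classpath="$3"
  printf 'native-image -cp %s -H:Name=%s -H:Class=%s\n' \
    "$classpath" "$binary_name" "$main_class"
}

# Usage in a real project (commented out because it needs sbt):
# print_native_image_cmd my-app my.main.App "$(app_classpath)"
```

Printing the command first makes it easy to check the classpath and main class before spending time on a native-image build.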

Example

One of the biggest overheads we have at Codacy is in our tools. Since we run a lot of open source tools to do static analysis, we have a Scala wrapper that invokes each tool and prepares the results in a nice format for us. These projects are composed of only a couple of source files, but always lead to Docker images between 100 MB and 200 MB, if not 1 GB when we depend on other platforms like Swift or Haskell (but that is a different problem).

Using GraalVM, my objective was to slim down the Java runtime and get a smaller Docker image containing my statically linked binary.

The first example I tried was codacy-eslint, since it has other details that I am also currently working on (see codacy/codacy-eslint#native-image).

For the sake of this tutorial, I extracted the scripts so they can be used as standalone steps in any build.

build-native-image.sh is a simple native-image wrapper that allows you to build your binary for Linux and macOS if you have GraalVM configured on your machine, or for Linux if you have Docker.

The script should support generic SBT apps. It requires a name for the target binary and the main class of your application. The example invocation also passes the app version, but the script as listed does not use it: the binary is named <app-name>-<os>. By default it builds the binary for Linux using Docker, but you can specify the target as native and it will use the locally configured GraalVM to build it for your platform.

After adding the code to a file named build-native-image.sh and giving it execute permission, you can run ./build-native-image.sh -n my-app -m my.main.App -t docker 1.0.0 and it will produce a binary inside a build directory in your project root.

Improvements

The following tables summarize the results obtained by using a standard jar versus using the GraalVM binary (both inside a docker container) for the above case of the codacy-eslint tool.

Size

| | Docker w/ Jar (MB) | Docker w/ GraalVM binary (MB) |
|---|---|---|
| Node + Yarn + Dependencies | 151 | 166 |
| Application | 92 (w/ JVM) | 24 |
| Alpine base | 4 | 4 |
| Total | 247 | 194 |
| Improvement | | ~ -53 |

These values reflect the big improvement in size of the binary, where AOT reduced the required libraries versus the full JVM and dependencies.

In this specific case, since we require more than 150 MB of JavaScript dependencies, the reduction might seem small, but in most cases the improvement can be more than half of the size of the Docker image.
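
As a quick check of the arithmetic, the totals and relative savings can be recomputed from the table values. This is just a sketch using shell integer arithmetic, so the percentages are rounded down; the variable names are made up for the example.

```shell
# Sizes in MB, copied from the size table above.
jar_total=$((151 + 92 + 4))      # Docker image with the jar and a JVM
native_total=$((166 + 24 + 4))   # Docker image with the GraalVM binary

saved=$((jar_total - native_total))
# Integer percentages, rounded down.
saved_pct=$((saved * 100 / jar_total))
app_saved_pct=$(((92 - 24) * 100 / 92))

echo "total: ${jar_total} MB -> ${native_total} MB (saved ${saved} MB, ~${saved_pct}%)"
echo "application layer alone: 92 MB -> 24 MB (~${app_saved_pct}% smaller)"
```

The whole image shrinks by roughly 21%, while the application layer on its own shrinks by roughly 73%, which is the number that matters for images without a large Node dependency tree.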

Speed

| | Docker w/ Jar (s) | Docker w/ GraalVM binary (s) |
|---|---|---|
| Total | 6 | 3 |
| Improvement | | ~ 3 |

This speed improvement (3 s) does not seem to vary significantly with the total run time; it is a roughly constant improvement in startup time. It is, therefore, particularly helpful for short-running applications. In our specific case, since our analyses entail running several small tools, this small improvement quickly adds up and has a high impact on the total analysis time.
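
The post does not say how these timings were taken. One simple way to compare the two images yourself is to measure wall-clock seconds around each run. The sketch below assumes a hypothetical helper name (elapsed_seconds) and placeholder image tags; it has one-second resolution, which is enough for a difference of this size.

```shell
# Hypothetical helper: print the wall-clock seconds a command takes.
# The command's own output is discarded and its exit status is ignored.
elapsed_seconds() {
  local start
  local end
  start=$(date +%s)
  "$@" > /dev/null 2>&1 || true
  end=$(date +%s)
  echo $((end - start))
}

# Usage (image tags are placeholders, not real images):
# elapsed_seconds docker run --rm my-tool:jar
# elapsed_seconds docker run --rm my-tool:native
```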

Problems and Limitations

  • If your application or any of its dependencies uses the Java Native Interface (JNI), you need to enable it with the flag -H:+JNI.
  • Due to AOT and the removal of unused elements, some runtime dependencies might be missing and you need to force their inclusion. For example, when using some XML features I had to add -H:IncludeResourceBundles=com.sun.org.apache.xerces.internal.impl.msg.XMLMessages.
  • Lazy values are also a problem because they depend on static initializers. Substrate VM has some limitations regarding what you can do in a static initializer, and in some cases you might need to remove lazy values. In most cases that will also improve your code, since AOT will do the work at compile time: resources are read and structures are prepared before the binary runs, which reduces startup overhead.
  • Since you might still be bringing in some features that you are not using, adding -H:+ReportUnsupportedElementsAtRuntime will help you get past compilation; unsupported elements are then only reported at runtime (which can be dangerous).
  • HTTPS support is available since release candidate 7, but you need to distribute the binary together with libsunec.so, since it contains parts of the implementation.
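
The flags mentioned in this list can be collected in one place. The following is a minimal sketch, assuming a hypothetical helper named native_image_flags; it only uses flags that already appear in this post and in the wrapper script, and it leaves -H:+ReportUnsupportedElementsAtRuntime off unless you ask for it.

```shell
# Hypothetical helper: assemble native-image flags from the workarounds listed above.
# Arguments: <os: linux|darwin> [report-unsupported: yes|no (default: no)]
native_image_flags() {
  local os="$1"
  local report_unsupported="${2:-no}"
  local flags="-O1 -H:+JNI -H:IncludeResourceBundles=com.sun.org.apache.xerces.internal.impl.msg.XMLMessages"
  # Like the wrapper script, only link statically when not building for macOS.
  if [ "$os" != "darwin" ]; then
    flags="--static $flags"
  fi
  # Opt-in only: unsupported elements then fail at runtime instead of at build time.
  if [ "$report_unsupported" = "yes" ]; then
    flags="$flags -H:+ReportUnsupportedElementsAtRuntime"
  fi
  echo "$flags"
}

# Usage:
# native_image_flags linux
# native_image_flags darwin yes
```

Keeping the risky flag behind an explicit argument makes it harder to ship a binary that silently defers build errors to runtime.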

