Not just the JVM: a lot of libraries build thread pools based on the number of processors, and some of them even hard-code the multipliers (N times the number of processors, etc.).
Setting up each one becomes a lot of work fast, so we wrote an LD_PRELOAD[1] hook that overrides the sysconf _SC_NPROCESSORS* calls to report the number of processors available/online as a specified value; it's baked into the Docker image by default during builds.
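A minimal sketch of the sizing pattern in question (the 2x multiplier is a made-up example of the hard-coding described above):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Many libraries size pools from the visible core count. Inside a
        // container this reflects the host's cores, not the container's
        // CPU quota -- which is exactly what the LD_PRELOAD hook fakes out.
        int cores = Runtime.getRuntime().availableProcessors();

        // Hypothetical hard-coded "2x cores" multiplier:
        ExecutorService pool = Executors.newFixedThreadPool(2 * cores);
        System.out.println("cores=" + cores + " poolThreads=" + (2 * cores));
        pool.shutdown();
    }
}
```

For what it's worth, JDK 10 (and the 8u191 backport) added -XX:ActiveProcessorCount=&lt;n&gt;, which overrides what availableProcessors() reports without needing an LD_PRELOAD hook.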
Can we expect that kind of horror to fade away as Java evolves?
In the .Net world there's the ThreadPool class to manage the single process-wide thread-pool, and the Task class to enqueue and orchestrate concurrent jobs (it uses ThreadPool and almost completely abstracts away the thread management).
(You could write your own thread-pool, of course, but for most code that wouldn't make sense.)
As I understand it, the JVM is rather behind in this department. (Not to mention async/await.)
> In the .Net world there's the ThreadPool class to manage the single process-wide thread-pool, and the Task class to enqueue and orchestrate concurrent jobs (it uses ThreadPool and almost completely abstracts away the thread management).
> (You could write your own thread-pool, of course, but for most code that wouldn't make sense.)
Java has these concepts too (ExecutorService since at least Java 5, circa 2004). The problem is not with the JDK libraries; it's the JVM's assumption that it's running on bare metal or in a VM.
Linux containers leak information about the "true" environment in a way that upset the JVM's assumptions before Java 9 and 10.
.NET has the same problem: Each process would create one thread pool thread per CPU core, and you could end up with excessive context switching.
Java possibly makes it a bit easier to work around, really, in that Java forces you to initialize your thread pool explicitly, so it wouldn't feel quite so weird to add a "number of threads in thread pool" setting to an app config file. I'm guessing that's not the Docker way of doing it, though.
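Such a setting could be as simple as a system property with a core-count fallback; a sketch (the property name app.pool.size is made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConfiguredPool {
    public static void main(String[] args) {
        // Hypothetical override, e.g. "java -Dapp.pool.size=4 ..." set in
        // the container's launch command; falls back to the core count.
        int size = Integer.getInteger("app.pool.size",
                Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("pool size = " + size);
        pool.shutdown();
    }
}
```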
IMO these are a great improvement but one still needs to be aware of the underlying container limits. E.g., understanding why setting MaxRAMFraction=1 might not be a good idea as the container might get killed...
This. I've seen container builds for Java applications, broken in the ways described here, fail in production over and over again. Just because Docker makes building an image very easy doesn't mean that you will end up with a production-ready image using three lines of code in a Dockerfile. Most of the time people don't even bother to use a non-root user to execute the JVM in their container...
That's why I feel platforms like Cloud Foundry are a much better fit for teams that don't have tons of container experience but want to get the benefits of a containerized runtime. The CF java buildpack[1] for example automatically handles OOM and heap settings calculation while building your application container.
Disclaimer: co-founder at meshcloud, we offer a public cloud service for Cloud Foundry and K8s hosted in German datacenters.
I wrote up my experience[0] on containerizing JVM based applications a bit ago using the cloud foundry java buildpack’s memory calculator. Fortunately the JVM now has a way to respect cgroup memory[1] making it a bit simpler.
In particular, the Java buildpack uses a component developed entirely to calculate JVM settings.
I am not sure what the plans are for Java 9 and 10 yet. Ben Hale works on JBP pretty much full-time, and the Spring team tend to experiment pretty early on new JDKs. So I can't see it falling far behind.
Fabric8 has really good base Java images[1] with a script that simply sets environment variables with the right GC and CPU parameters before launching Java, with nice sane defaults.
I heavily encourage anyone running Java in containers to use their base image, or for larger organizations to create standard base-image Dockerfiles that set these JVM env var parameters. A simple contract is: ENTRYPOINT belongs to the base image, CMD belongs to downstream application images (unless something else is essential).
Just don't use vanilla "FROM openjdk:8-jre" and expect it to work. That's the worst way to kill application performance and reliability in a container.
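The ENTRYPOINT/CMD contract described above might look roughly like this (image names and the script path are made up for illustration):

```dockerfile
# Base image, built once per organization; the entrypoint script
# computes sane GC/heap/CPU defaults before exec-ing the CMD:
FROM openjdk:8-jre
COPY java-entrypoint.sh /opt/java-entrypoint.sh
ENTRYPOINT ["/opt/java-entrypoint.sh"]

# Downstream application image (a separate Dockerfile):
#   FROM my-org/java-base:8
#   COPY app.jar /opt/app.jar
#   CMD ["java", "-jar", "/opt/app.jar"]
```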
I disagree. They seem mostly targeted at low-end cloud providers who overcommit on memory and ignore application response time (gc latency). And they don't even do a good job at this.
Their configuration uses the ParallelOld GC and tunes it to aggressively shrink the heap. What that means is they don't care about frequent and long GC pauses (unless you're running small heaps below 1 GB); they just care about reducing the memory footprint of the application. On multi-gigabyte heaps you accept full GCs that take several seconds. They also increase the number of concurrent GC threads to the number of cores. This defeats the whole purpose of the concurrent GC threads, which are supposed to run concurrently with your application without stopping it; that value should be below the number of cores.
GC logging does not work on Java 9 or Java 10.
If you really care about reducing memory usage you probably should do this in addition:
-XX:-TieredCompilation hurts startup performance, but we just established that we don't care about performance so that's fine, and it easily takes 35% out of the code cache.
-Xss512k cuts the thread memory usage by half, this can usually be done without any issues, often -Xss256k works as well. We run Spring inside a full profile Java EE application server with -Xss256k.
And finally, the most important option of them all, -XX:+HeapDumpOnOutOfMemoryError, is missing. You absolutely, positively need this, always. It's the only way to debug OutOfMemoryErrors.
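Pulling the suggestions above into one launch command (values illustrative; -XX:HeapDumpPath is an addition here so dumps land somewhere persistent, e.g. a mounted volume):

```
# -XX:-TieredCompilation           smaller code cache (slower startup)
# -Xss512k                         halve per-thread stack usage; often 256k works
# -XX:+HeapDumpOnOutOfMemoryError  essential for debugging OOMEs
# -XX:HeapDumpPath=/dumps          write dumps somewhere persistent
java -XX:-TieredCompilation -Xss512k \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps \
     -jar app.jar
```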
We tried those flags in the beginning, since they were introduced in the Docker openjdk image[1].
When we dug in further, we found it's just not trouble-free (i.e., experimental). The default is to use 1/4th of RAM, which is entirely inefficient [2]. The "MaxRAMFraction" parameter only lets you specify a 1/n fraction, so it's not possible to efficiently use 65% or 75% of memory. The only place to start is MaxRAMFraction=2, and that already means only 50% of memory is used for heap. That produces a lot of wastage. A lot of resource efficiency is gained by starting with 65% or 80%.
OpenJDK 10 is introducing a new option, "MaxRAMPercentage" [3], which comes closer to making a script unnecessary.
TL;DR - The default flags are still experimental in JDK 8/9, and deemed to be better on Java 10. A script is just better for consistency.
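A sketch of the two styles for reference (exact values illustrative):

```
# JDK 8u131-9, experimental: heap = RAM / MaxRAMFraction, so only
# 100%, 50%, 33%, 25%, ... of the container limit are expressible
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
     -XX:MaxRAMFraction=2 -jar app.jar

# JDK 10+: arbitrary percentages become possible
java -XX:MaxRAMPercentage=75.0 -jar app.jar
```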
+1 for Fabric8. It makes running Java in containers much more pleasant, and provides a complete opinionated ecosystem, if that's desired. I work at an enterprise shop that is a Red Hat customer (so we get commercial support on this stuff), and it has made our lives much easier in many respects.
Anyone here actually use Java containers in production?
Sadly the article mentions very little in terms of practical advice. We've tried running some small Java 8 Spring Boot containers in Kubernetes which are configured to use max ~50M heap and ~150M total off-heap, yet they use in excess of double that memory, so we end up with either a lot of OOMKills or overly large memory limits.
Yes, we do. It's actually a rather small setup (13 dedicated servers, about 100 containers).
The absolutely most basic advice is probably: "-Xmx" does not represent the actual upper limit for memory usage. We most often set the JVM heap to only 50% of the assigned memory.
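A rough back-of-the-envelope model of where the rest goes (a sketch; categories vary in size by workload):

```
total ≈ heap (-Xmx)
      + metaspace
      + thread count × stack size (-Xss)
      + code cache
      + direct/NIO buffers
      + GC structures and other native overhead
```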
You may be experiencing the same bug we did: memory freed by the GC of a containerized JVM could not be returned to the host machine.
IIRC it was due to behavior in glibc >= 2.10: mallocs are pooled into per-thread arenas. You need to tune MALLOC_ARENA_MAX to be <= the number of physical threads; usually people advise 4 or 2.
# openjdk 8 bug: HotSpot leaking memory in long-running requests
# workarounds:
#  - disabling HotSpot's JIT completely with -Xint slows the
#    memory growth, but does not halt it
#  - cap the glibc malloc arenas:
MALLOC_ARENA_MAX=4
So, ensure that your java process is launched with that environment var (so, export it in the same shell, or precede your java command with it).
If you happen to be using Tomcat, I recommend putting:
export MALLOC_ARENA_MAX=4
into:
/usr/local/tomcat/bin/setenv.sh
As for how much memory you allocate to your containers: as of JRE 8u131 you can make this far more container-friendly:
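Likely referring to the experimental cgroup support that shipped in 8u131; roughly:

```
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar
```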
We use Java containers in production on a very large scale system. We are actively migrating stuff away from Java to Go, because the JVM is a nightmare in containers. Sure we could make do and configure the JVM to hell and hack things to get it to work...but why bother? We have to allocate hundreds of MB of memory for a simple HTTP server. The tons of configuration and hacks are maintenance nightmares. It's been a terrible experience all around. Java is bloated, it's a resource hog, and the ecosystem doesn't care about cold start times. It's just a terrible fit for a containerized environment.
Spring Boot is, by design, eager to load everything it senses you might need. Most of the memory usage is not Spring itself; it's the libraries it's pulling in on your behalf.
In Spring Boot 2 I'm told you can use functional beans to cut down on RAM usage. Not sure how it works.
But really, it comes down to deciding if you need a dependency or not.
Where I work most of our dockerized services are Spring Boot in Kubernetes, they do need more memory than what you've posted and generally run with about 300M ~ 600M usage depending on what they need to do.
You can also use smaller frameworks (Vertx? Javalin? possibly Spring Boot 2). I hope that with Java 9 we won't see this amount of memory usage anymore, however our organisation isn't there yet though.
Yeah, we deploy containerized Jenkins environments for almost 100 teams on VMs running Docker. These are massive-heap containers (20+GB in some cases). Probably not the best use of Docker, but we actually are doing pretty well. Working towards migrating to an OpenShift environment and then evaluating some new tech from CloudBees in this area.
Honestly I don't feel there is any need to run Java in containers. The war/jar file is its own container with its own dependencies. The JVM still makes the same syscalls as it would inside a Docker/Kubernetes container.
In fact I would rather look at serverless architecture before considering docker/Kubernetes.
when you run a polyglot stack with java/python/go/node on top of a cluster of machines, you will love to have them containerized and uniform. It makes scripting and CI so much easier.
or, when you have a legacy app that relies on java 6, but you want everything else to run on java 8, the ability to drop everything into a container with its runtime is a life saver.
source: I'm the devops person that's responsible for making this work
We already run a polyglot stack at our company, and we use Docker(nvidia-docker) for our Python environment. With Java there is no need, and it is a lot less work updating and upgrading the JVM and our Java applications. I would use Docker for Java 6 though.
The real killer app would be the ability to fully containerize all JVM instances running on any given box.
It will be a setup where one jvm instance on the host basically serves the role of "master" in terms of class data and shared object loading while each container instance uses its memory allotment only for running computations specific to the application in that container while sharing memory objects with other containers as much as possible.
It is possible to do something similar at the moment but it requires going through a hodge-podge of painful hacks. A seamless solution to this would basically make the jvm an out of the box poor man's polyglot PaaS platform.
> It will be a setup where one jvm instance on the host basically serves the role of "master" in terms of class data and shared object loading while each container instance uses its memory allotment only for running computations specific to the application in that container while sharing memory objects with other containers as much as possible.
Eclipse OpenJ9 does something like this.
> A seamless solution to this would basically make the jvm an out of the box poor man's polyglot PaaS platform.
Trying to recreate OS resource-allocation guarantees without the OS has been a bit of a fool's errand, historically. The OS has privileged access to hardware -- it has a view of activity and an ability to enforce guarantees that processes lack.
Add to that the amount of time and effort that has gone into operating systems to cover allocation of so many different kinds of resource under so many different conditions. It is really expensive to re-engineer that capability.
I've seen a lot of attempts at trying to share ostensibly multiuser systems above the OS level and they have mostly been unhappy once there is heavy usage. Databases, queues, the JVM, everything eventually needs to be isolated from unrelated workloads at some point. Containers and VMs are much better at providing that capability.
My OS internals knowledge is rusty so I am not sure they are quite the same but likely similar.
If you think of a typical JVM application (true for non-JVM apps as well), a significant chunk of class data will be shared, since apps typically use the same libraries (with deltas in versions). Allowing easy reuse of class data across all container instances on a host would be a major scalability advance.
Similar to the goals of the multi-tenant virtual machine (MVM)?
I feel like all this cloud, container stuff is incrementally, painfully evolving towards grid computing and agents. Kinda like reinventing LISP- or Prolog-like features with your own 'C' runtime, instead of just using LISP or Prolog.
Have to say I really liked this article, but most of it was spent on how containers actually work on the backend. Which I think is fantastic as it's one of the most concise and easy to understand articles I've seen on the subject. I forwarded it to my team because a few of them still think Docker is some black-magic voodoo.
Wasn't the original promise of the JVM to provide container like services to applications? This field is way outside where I normally work, I would love to hear some expert commentary about this.
It kind of does: Java EE was all about having a single Java application server running on each host, and deploying applications/services on top. In practice every service was built by the same company though, so focus moved toward shared libraries in containers and things like that: search for OSGi.
The issue is both that the security concerns were way softer than they are today, and that all your dependencies had better be deployable in a JVM too. Modern ideas of having databases deployed along with the services that need them don't work quite as well as just using OS-level virtualization. That said, many a crufty old company still deploys hundreds of services to production by loading .war files into a cluster of servers running JBoss or WebLogic.
You are effectively correct, and to take it a bit further, the "VM" in JVM does mean the same thing as the VM in VMware, even if it feels a lot different in practice. If all your dependencies are JVM-based and bundled into a JAR/WAR, OS containers aren't doing much for you.
The JVM is a container by itself, and it has been able to isolate dependencies and memory requirements for many years before docker was conceived. The JVM doesn't do everything that docker does, for example it doesn't provide overlay file systems, but this is trivial to add via standard tools (e.g. aufs), and it also doesn't limit CPU usage per container but this is also trivial to add without having to deal with the extra complications of using docker. I think docker is great for non-JVM applications that run into conflicts with other applications, dependency-wise or resource-wise, but I don't see the point of using docker for pure Java applications. It just adds an unnecessary layer that provides no benefit.
Edit: If the point is to encapsulate and isolate a Java application as much as possible, I would consider using a unikernel like OSv before considering Docker. This would be more efficient and also more secure.
It is perhaps doable by getting very intimate with Java's classloaders and security policies, but most people don't want to deal with that stuff--it's complex, not fun, not portable.
That said, there are practical examples to be found in common servlet containers, with some shared libraries, but individual servlets mostly firewalled from each other.
Servlet containers have two levels of isolation: individual apps are mostly firewalled from each other as you say, but when running multiple servlet containers they are as isolated from each other as multiple docker processes are when it comes to shared libraries, security policies etc. JVM processes are easy to reason about at the OS level, while servlet applications are more complex. But we are comparing Docker with JVM, not Docker with e.g. Tomcat webapps.
The PR for memory-protected preemptive multitasking operating systems still sounds remarkably similar to the container sales pitch today (at least the first half of it).
I wonder what we’ll be championing in another 10 years.
One of the authors here: First of all, thanks for the great discussion here; it really motivated us to write another follow-up article.
There were some comments asking for more practical advice which is totally fair as this was supposed to be more informational and creating problem awareness.
What else would you like to see covered in a follow-up?
A base Java image and a set of recommendations. Most of us here (I assume) are running containers in K8s or Swarm. Our issue is that there is no base Docker image which the Java community can embrace. I have pods which restarted 35 times in the last month.
We use shaded/fat jars where I work because it's easier to ensure we never run into issues with missing dependencies. It does however come at the cost of longer build times. I've been hoping that Docker might be able to help with this by allowing us to keep all of the dependencies in the container base image and then just add the new class files in the build process. Is that a reasonable assumption?
I wonder if it might be a good general rule to never give just 1 CPU to any multi-threaded application (JVM or otherwise).
Often, there's a mix of threads, some of which are doing CPU-intensive stuff and some which just need to do some quick thing to unblock something else, like start some new IO when some IO completes. With 1 CPU, any time any CPU-intensive stuff is happening, these quick things have to wait their turn until the next time slice. With 2 CPUs, you have twice the work and twice the likelihood of having to deal with CPU-intensive stuff, but CPU-intensive moments don't necessarily happen at the same time, so you have better odds of having one of those CPUs immediately available to do the small, quick stuff.
>> But in the case of containers with a hard memory limit, the entire container will simply be killed without warning.
At least in my experience that is not true. I quite often run into this issue and the OOM-killer will only kill one of the processes inside the container, not the entire container.
>> The same is true for default memory limits. The JVM looks at the host overall memory and uses that to set its defaults.
Well, I guess if you launch a JVM anywhere without setting appropriate memory settings, you are doing something fundamentally wrong.
>> At least in my experience that is not true. I quite often run into this issue and the OOM-killer will only kill one of the processes inside the container, not the entire container.
Hence one should not run multiple processes in a container, and/or should handle crashes of potential child processes correctly.
What a great read! I'm mostly in application land all day, and even though I have daily interaction with docker containers I don't have a good understanding what is going on under the hood. Now I partially understand why my coworker who manages our K8s cluster added some of the VM arguments to our java application images.
It would be nice to have some kind of cooperative API for runtimes like Java, Go etc to where the OS or container manager can provide watermark hints for the heap size and when to run collections.
Based on request traffic or maximum memory? For the JVM there is -Xmx512m to set the max heap size (512MB in this case). Or are you thinking of something more organic so individual containers could overcommit on memory, and the OOM killer would first send a "collect" signal to all the containers that could GC (and also give the memory back).
Yes when I say cooperative I mean nice processes are opting in to being told what and when to do something (change heap size, run a light or heavy collection, etc) apart from their own internal view of affairs.
Yeah I see that in the post, but to extrapolate my comment, I mean something running in a separate address space that coordinates all garbage collectors across processes/containers. It would necessarily have to be cooperative.
I don't know the precise answer offhand, but probably in an OS-dependent code path in OpenJDK. I know, for instance, OpenJDK uses the global sysctl API to get total RAM on a FreeBSD machine. To use jails as a multi-tenancy system you therefore want to fake that API out to the resource-controlled max for the jail.
[1] http://man7.org/linux/man-pages/man3/sysconf.3.html