In today’s cloud-centric IT landscape, right-sizing compute resources has emerged as a critical challenge for developers, DevOps and site reliability engineers worldwide. The main driver for right-sizing is typically reducing wasted resources and lowering cloud spending. But these considerations need to be balanced with other business requirements, such as performance and operational complexity. The goal is finding the sweet spot: lean enough to be cost-effective, yet robust enough to ensure reliable service delivery.
While most cloud optimization strategies are programming language agnostic, Java applications present unique challenges for right-sizing. Moreover, a new class of Java runtimes, called high-performance Java platforms, has emerged to provide relief for some of the problems specific to Java. These next-generation platforms offer improved startup times, reduced memory footprints and more predictable performance characteristics — fundamentally changing the way we approach Java application sizing in cloud environments.
Let’s explore the specific challenges and strategies for right-sizing Java application fleets in cloud environments. We’ll examine key optimization areas, diving into Java-specific considerations for each and demonstrating how high-performance Java platforms can streamline these efforts.
In a perfect world, we would only provision, and pay for, the resources needed to meet traffic at any given time and spin those resources down when they are not needed. Most importantly, this elastic scaling would happen without compromising response times, service-level agreements (SLAs) or user experience.
The main areas of work in right-sizing your fleet are typically:
With all cloud technologies, there is friction between the goals of saving money and providing maximum performance. While demand can spike immediately, it takes time to provision the server infrastructure (K8s nodes, EC2 instances, etc.) to meet that demand. These cold-start latencies force organizations into a challenging trade-off: either maintain excess capacity at higher cost to handle potential spikes or risk service degradation during demand surges. Most companies err on the side of over-provisioning, prioritizing performance and reliability over cost efficiency.
Examples of sacrificing cost for performance are:
With Java, the friction between cost and performance is even more pronounced. Even after the infrastructure is provisioned, Java applications need additional time and CPU resources to get to full speed. While Java delivers superior performance, security and maintainability compared to interpreted languages like JavaScript, you pay a penalty at startup and occasionally even during steady state.
The life cycle of a Java application starting up in an elastic environment looks like this:
A high-performance Java platform consists of two key components: an enhanced JDK and supporting infrastructure services. The enhanced JDK maintains full compatibility with Java SE specifications for long-term support (LTS) releases while delivering significant improvements over standard OpenJDK distributions in three critical areas:
Beyond the JDK, high-performance Java platforms provide centralized services that work with client JVMs to achieve levels of performance and operational efficiency impossible with stand-alone JDK distributions.
In this article, the main technologies we cover are:
Vertical scaling is the process of adjusting the CPU and RAM available to a server to ensure there is enough capacity to handle traffic spikes while avoiding wasting unused capacity. While traditional virtual machines and physical servers require resource allocations in coarse increments, containers enable the allocation of computing resources with surgical precision.
One popular approach is to use a Vertical Pod Autoscaler (VPA) in Kubernetes. VPAs monitor your usage and then adjust the resources available to the pod and restart it to make the adjustments take effect.
So why not just use a VPA on your Java fleet and be done with it? Well, when resizing Java containers, you often need to adjust the command-line Java heap parameters as well as the pod size, which a VPA can’t do. Also, since JVMs can “reserve” memory that isn’t being used, it is difficult for a VPA to measure actual usage correctly and adjust to it. In many cases, a VPA does not work well for Java applications, and you need to set resource limits for Java containers manually.
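To make the coupling between pod size and heap size concrete, here is a minimal sketch of deriving a heap flag from a container memory limit. The `HeapSizing` class, the `heapFlagFor` helper and the 75% heap fraction are assumptions for illustration, not something a VPA or any particular platform provides; `-Xmx` and the `Runtime` calls are standard Java.

```java
// Hypothetical helper (not part of any VPA or JDK tooling): derives an -Xmx
// flag from a container memory limit so the two can be changed together.
final class HeapSizing {

    /** Returns an -Xmx flag giving the heap the stated fraction of the container limit. */
    static String heapFlagFor(long containerLimitMiB, double heapFraction) {
        if (containerLimitMiB <= 0 || heapFraction <= 0.0 || heapFraction >= 1.0) {
            throw new IllegalArgumentException("limit must be positive and fraction in (0, 1)");
        }
        // Leave the remainder for metaspace, thread stacks, code cache and other native memory.
        long heapMiB = (long) Math.floor(containerLimitMiB * heapFraction);
        return "-Xmx" + heapMiB + "m";
    }

    public static void main(String[] args) {
        // What the running JVM currently believes it has to work with.
        Runtime rt = Runtime.getRuntime();
        System.out.println("JVM max heap (MiB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Available processors: " + rt.availableProcessors());

        // Assumed example: a 2 GiB pod with 75% of its memory given to the heap.
        System.out.println(heapFlagFor(2048, 0.75));
    }
}
```

If the pod limit changes and the flag does not, the heap either stays too small to benefit from the extra memory or grows past what the container allows, which is why resizing the pod alone is often not enough.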
The problem for Java applications is the period of high just-in-time (JIT) compilation CPU activity at the beginning of the run, while the JVM warms up your application.
Typically, you have to reserve CPU capacity for that compilation spike even though that capacity will sit idle during steady state. In other words, you’re paying forever for a spike that only lasts a few minutes at the beginning of your application’s run.
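The cost of that trade-off is simple arithmetic. The sketch below estimates how many reserved core-hours sit idle once warm-up is over; the class name, the method and all of the numbers (4 reserved cores, 1.5 cores at steady state, a 30-minute warm-up, 24 hours of uptime) are made-up assumptions, not measurements.

```java
// Illustrative arithmetic only; every input value below is an assumption.
final class WarmupReservation {

    /** Core-hours that stay reserved but unused after the warm-up spike has passed. */
    static double idleCoreHours(double reservedCores, double steadyStateCores,
                                double warmupMinutes, double uptimeHours) {
        if (reservedCores < steadyStateCores || warmupMinutes < 0 || uptimeHours < warmupMinutes / 60.0) {
            throw new IllegalArgumentException("inconsistent sizing inputs");
        }
        double steadyStateHours = uptimeHours - warmupMinutes / 60.0;
        return (reservedCores - steadyStateCores) * steadyStateHours;
    }

    public static void main(String[] args) {
        // 4 cores reserved for the JIT spike, 1.5 cores actually needed afterwards.
        double idle = idleCoreHours(4.0, 1.5, 30.0, 24.0);
        System.out.println("Idle core-hours per pod per day: " + idle);
    }
}
```

With these assumed inputs, each pod carries 2.5 spare cores for 23.5 of its 24 hours, so the reservation is used for roughly 2% of the time it is paid for.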
High-performance Java platform approaches to reducing the capacity wasted on JIT CPU spikes are:
The other reason why people reserve lots of spare capacity, especially on latency-sensitive applications, is that “stuff happens.” From garbage collection pauses to deoptimization storms, to the JVM locking certain resources while it performs long-running tasks, you have to deal with all kinds of spiky behavior on your JVM. Thus, people often provision their containers with CPU utilization thresholds as low as 35% to reserve capacity for these spikes.
Horizontal fleet-sizing is the process of setting the number of servers running at any time to meet the current traffic. The number of servers needed is a function of each server’s carrying capacity — how many transactions each server can handle while still staying within SLAs.
The best way to reduce horizontal fleet size is to get more work out of each server. Several high-performance Java platforms have advanced JIT compilers that can perform individual transactions using less CPU than OpenJDK and can therefore complete more transactions overall without triggering CPU-based autoscaling policies.
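The relationship between carrying capacity, the CPU utilization threshold and fleet size can be sketched as a small calculation. It assumes throughput scales linearly with CPU utilization, which is a simplification, and the class, the method and the traffic figures are hypothetical.

```java
// Simplified model: usable throughput per server = full-CPU throughput * target utilization.
// Real servers rarely scale this linearly; all numbers here are assumptions.
final class FleetSizing {

    /** Number of servers needed to carry peak traffic while staying under the CPU threshold. */
    static int serversNeeded(double peakTps, double perServerTpsAtFullCpu, double targetUtilization) {
        if (peakTps < 0 || perServerTpsAtFullCpu <= 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double usableTps = perServerTpsAtFullCpu * targetUtilization;
        return (int) Math.ceil(peakTps / usableTps);
    }

    public static void main(String[] args) {
        // Same peak traffic, three scenarios.
        System.out.println("35% threshold: " + serversNeeded(10_000, 500, 0.35));
        System.out.println("70% threshold: " + serversNeeded(10_000, 500, 0.70));
        System.out.println("35% threshold, faster JIT: " + serversNeeded(10_000, 650, 0.35));
    }
}
```

In this toy model, raising the threshold from 35% to 70% cuts the fleet from 58 servers to 29, and a server that handles 650 rather than 500 transactions per second at the same threshold needs 44. Both levers shrink the fleet, which is why spiky JVM behavior and per-transaction CPU cost matter so much for horizontal sizing.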
The best way to optimize a server to save money is to just turn it off completely. The elastic nature of the cloud means you can scale servers up and down, either on a schedule or autoscaling based on load, so you only pay for what you use.
But while autoscaling sounds simple, it’s actually complicated and often requires re-architecting. Even an application written to scale up and down is subject to Java startup and warm-up concerns, making it operationally difficult to ensure good performance on newly provisioned servers. A lot of developer and DevOps time is spent figuring out how to get those servers ready to accept traffic at full speed early enough to deal with a sudden spike.
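One common workaround, shown here only as a generic sketch and not as a method this article prescribes, is to run representative requests through the application before it reports itself ready. The `WarmupGate` class, its method names and the iteration count are hypothetical; wiring `isReady()` to a readiness endpoint is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical readiness gate: the instance only reports ready after a
// synthetic warm-up workload has exercised the hot code paths.
final class WarmupGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the representative request the given number of times, then marks the instance ready. */
    void warmUp(Runnable representativeRequest, int iterations) {
        for (int i = 0; i < iterations; i++) {
            representativeRequest.run();
        }
        ready.set(true);
    }

    /** Intended to back a readiness check so traffic is only routed after warm-up. */
    boolean isReady() {
        return ready.get();
    }

    public static void main(String[] args) {
        WarmupGate gate = new WarmupGate();
        System.out.println("Ready before warm-up: " + gate.isReady());
        // Stand-in for a real request handler; 10,000 iterations is an arbitrary choice.
        gate.warmUp(() -> Math.sqrt(System.nanoTime()), 10_000);
        System.out.println("Ready after warm-up: " + gate.isReady());
    }
}
```

The catch is the one described above: the warm-up itself costs time and CPU on every new instance, and synthetic requests may not trigger the same optimizations that real traffic would.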
To summarize, here are the pros and cons I’ve described above for high-performance Java solutions:
When your business runs on Java, you have special concerns when trying to balance cost with performance and operational flexibility. Using a high-performance Java platform can eliminate some of the trade-offs and deliver lower cloud costs at the same or better performance.
With a high-performance Java platform, you can: