JVM Tuning with G1 GC
A Garbage-First Garbage Collector approach for your ForgeRock applications
In order to successfully tune your JVM you must have clearly defined performance targets. This is your definition of success, and without a definition of success you cannot succeed. The aim of performance tuning is to meet these targets, no more, no less.
This paper discusses some concepts to help you understand and tune your ForgeRock applications' JVMs to meet your goals. The correct values for your organisation depend entirely on your performance targets.
There has been significant work in the field of garbage collection in the last few years, and this is ongoing. At the time of writing there are two exciting prospective garbage collectors: the Z Garbage Collector (ZGC) and Shenandoah GC. When these become more mainstream, the recommended GC for your ForgeRock applications will likely change. More information on ZGC and Shenandoah GC can be found in the references.
As of Java 9 the Concurrent Mark Sweep (CMS) garbage collector has been deprecated. At this stage we will focus on the tried and tested G1 GC, which is available in Java 8 as well as Java 11 (the supported JDK for most ForgeRock applications). The focus of this article is Java 11; if you're using Java 8 you'll need to modify the logging section to use the legacy logging options.
In some cases CMS may be found to be more performant than G1, hence the decision of some organisations to stick with CMS on Java 8. As CMS has been deprecated in later Java versions, a switch to another GC is inevitable, so moving to G1 and understanding its concepts can be considered a step towards future-proofing your solution. Remember, you don't need the fastest solution; you only need to meet your performance targets.
G1 GC Basics
G1 GC is a generational garbage collector: the heap is split into generations on the premise that most objects die young. It is more efficient to collect objects in the young generation than to promote them to the old generation and collect them there. In this respect G1 is no different from the Serial, Parallel and CMS collectors.
The G1 GC is a low pause-time collector whose priority is to attempt to meet a maximum pause-time target. This may come at the expense of throughput; however, from experience, avoiding long application pauses is generally deemed more important than overall throughput. Remember, the goal here is to meet your performance targets.
Most generational garbage collectors split the heap into contiguous spaces. In the diagram that follows we can see the heap split into its generations: the Young Generation, consisting of Eden and Survivor spaces, and the Old Generation. Each space is contiguous, that is, it runs from position 1, 2, 3 ... n.
G1 GC differs greatly in how it allocates the heap space. G1 splits the heap into a number of smaller regions, typically 2048, so the heap size directly affects the region size. Each region can be allocated as an eden, a survivor or an old region. The number of regions allocated to eden, survivor or old is flexible and is determined by the GC at runtime. The following diagram depicts this very different allocation of the heap.
Breaking the space up into regions in this way allows the GC to avoid, for as long as possible, large GCs of the entire heap.
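To make the relationship between heap size and region size concrete, here is a minimal sketch of how a region size could be derived from the heap size. It assumes a target of 2048 regions, rounds down to a power of two and clamps the result between 1 MB and 32 MB; these bounds and the class and method names are my assumptions for illustration, not values taken from this article, so verify the actual region size in your GC logs.

```java
// Sketch only: approximates how G1 might pick a region size from the heap size.
// The 1 MiB / 32 MiB bounds and the rounding rule are assumptions for illustration.
class G1RegionSizeSketch {
    static final long MIN_REGION_BYTES = 1L << 20;   // assumed lower bound: 1 MiB
    static final long MAX_REGION_BYTES = 32L << 20;  // assumed upper bound: 32 MiB
    static final long TARGET_REGION_COUNT = 2048;    // "typically 2048" regions

    static long regionSizeBytes(long heapBytes) {
        // Aim for roughly 2048 regions, but never go below the minimum size.
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        // Round down to a power of two, then apply the upper bound.
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        for (long gib : new long[] {2, 8, 64}) {
            long region = regionSizeBytes(gib << 30);
            System.out.println(gib + " GiB heap -> " + (region >> 20) + " MiB regions");
        }
    }
}
```

Under these assumptions a 2 GiB heap gets 1 MiB regions, while an 8 GiB heap gets 4 MiB regions, which is why changing the heap size also changes what counts as a large object for G1.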
During garbage collection there are a number of events where the entire application is paused; these are often referred to as Stop-The-World (STW) events. During these pauses any requests to the JVM must wait until the pause is finished. As such, one of the goals of GC tuning is to minimise these pauses. The Full GC is usually the longest STW pause, as the entire heap needs to be traversed, whereas other STW pauses needed by the garbage collector may be shorter and within acceptable limits. During the tuning process, GC log analysis should help to identify excessive pauses and their causes, so that steps can be taken to rectify the issues.
Garbage Collection Phases
At a high level the G1 GC has three main types of collection. The young GC only cleans the young generation: it moves live objects from eden to survivor and from one survivor to another, and promotes objects whose age has reached MaxTenuringThreshold into the old generation. A Full GC (of both young and old regions) still occurs as a fallback; it is very expensive, and significant effort goes into avoiding the need for one. The G1 GC also has the concept of a Mixed GC, which gives G1 GC its name: Garbage First. In this GC the young generation is cleaned, as well as a (configurable) number of regions from the old space that contain the most garbage, i.e. garbage first. This mechanism allows the G1 GC to avoid Full GCs for as long as possible. As Full GCs are mainly responsible for the long pause times associated with garbage collection, this is how the G1 GC minimises the need for these expensive operations.
Young Garbage Collection
Normal Young GC: A few young GCs move objects between eden and survivor and eventually to old space. At a certain old generation occupancy, determined by the Initiating Heap Occupancy Percent (IHOP) threshold, a Concurrent Start young GC is started.
Concurrent Start: Start of the concurrent marking process, which runs concurrently with young GCs until finished. In this phase the live objects in the old regions are determined. The phase ends with two stop-the-world events (application pauses): Remark and Cleanup.
Remark: Finalises marking, reclaims empty regions and performs class unloading; also starts to determine which old regions can be cleaned concurrently. Stop-the-world event.
Cleanup: Determines whether a mixed GC follows. Stop-the-world event.
Mixed Garbage Collection
This phase involves one or more mixed collections, that is, collections of the new generation as well as a (configurable) number of old regions that have the most garbage. At the end of a mixed collection G1 determines whether it needs another mixed collection in order to reach its (configurable) threshold. After this the cycle starts again with another young GC phase.
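The "garbage first" idea behind mixed collections can be sketched as a simple ranking. The example below is an illustration only: the OldRegion type, its garbagePercent field and the pickCandidates method are hypothetical names, and the real collector weighs more than the raw amount of garbage when it selects regions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: ranks old regions by how much garbage they hold and takes the top few.
class MixedCollectionSketch {
    /** Hypothetical view of an old region: an id and the share of it that is garbage. */
    static final class OldRegion {
        final int id;
        final int garbagePercent;

        OldRegion(int id, int garbagePercent) {
            this.id = id;
            this.garbagePercent = garbagePercent;
        }
    }

    /** Returns the ids of up to maxRegions old regions, most garbage first. */
    static List<Integer> pickCandidates(List<OldRegion> oldRegions, int maxRegions) {
        List<OldRegion> sorted = new ArrayList<>(oldRegions);
        sorted.sort(Comparator.comparingInt((OldRegion r) -> r.garbagePercent).reversed());
        List<Integer> ids = new ArrayList<>();
        for (OldRegion region : sorted.subList(0, Math.min(maxRegions, sorted.size()))) {
            ids.add(region.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<OldRegion> old = Arrays.asList(
                new OldRegion(1, 10), new OldRegion(2, 85),
                new OldRegion(3, 40), new OldRegion(4, 95));
        // Regions 4 and 2 hold the most garbage, so they are collected first.
        System.out.println(pickCandidates(old, 2));
    }
}
```

Collecting only the most garbage-dense old regions alongside the young generation is what lets each mixed collection reclaim a lot of space within a bounded pause.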
Full Garbage Collection
As with other GCs, this is the fallback position. If the application runs out of memory while liveness information is being gathered, this can result in a stop-the-world Full GC, i.e. of both the Young and Old Generation. One of the major goals of G1 GC and other generational garbage collectors is to avoid expensive Full GCs.
G1 GC Tuning
G1 GC has significantly fewer JVM options available than CMS, and the intention is that you use fewer. When moving from CMS to G1, or between any two GCs, the majority of installations inherit the previous JVM options without consideration or understanding of their use. Do not do this.
The basic strategy to tune your JVM for G1 GC is to set heap size and pause-time goal, then let the JVM dynamically modify the required settings to attempt to meet the pause-time goal. If the performance goals are not met, then consider other options based on GC monitoring / log analysis. This is an iterative process and it is important to ensure enough time and resources are allocated to this critical task.
Basic JVM Options
Explicitly Set the GC Algorithm
It is recommended to explicitly set the required GC. To select G1 GC, add the following JVM option:
-XX:+UseG1GC
By setting this explicitly you know exactly what you are getting, and it will not change unless you decide to change it. For example, on Java 8 the default GC is Parallel GC, while on Java 11 the default is G1 GC. This means that on upgrading from Java 8 to Java 11, unless you've explicitly set the GC, it will be changed on your behalf, for better or worse.
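One way to confirm which collector a running JVM actually selected is to ask the standard management API. The following is a minimal sketch using java.lang.management; the class name ActiveCollectors is hypothetical. With G1 the reported names typically include "G1 Young Generation" and "G1 Old Generation", but treat the exact strings as something to check on your own JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch only: lists the garbage collectors registered in the current JVM.
class ActiveCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors in use: " + collectorNames());
    }
}
```

Running this before and after a Java upgrade is a quick way to spot a collector that changed without anyone asking for it.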
It is recommended to explicitly set the min and max heap size to the same value; this avoids dynamic shrinking and growing of the heap during the application's lifecycle.
-XX:InitialHeapSize: Initial Java heap size
-XX:MaxHeapSize: Maximum Java heap size
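As a hedged illustration of the "same value" rule, the sketch below checks a list of JVM arguments for matching -XX:InitialHeapSize and -XX:MaxHeapSize values. The class and method names are hypothetical, and it deliberately handles only the -XX: spellings with k, m and g suffixes, not other ways of setting the heap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: checks whether initial and maximum heap size are set to the same value.
class HeapFlagCheck {
    /** Parses sizes such as 2g, 512m or 1048576 into bytes. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        char last = s.charAt(s.length() - 1);
        long multiplier = 1;
        if (last == 'k') multiplier = 1L << 10;
        else if (last == 'm') multiplier = 1L << 20;
        else if (last == 'g') multiplier = 1L << 30;
        String digits = multiplier == 1 ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** Returns the byte value of -XX:<name>=<size>, or -1 if the flag is absent. */
    static long flagValue(List<String> jvmArgs, String name) {
        String prefix = "-XX:" + name + "=";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return parseSize(arg.substring(prefix.length()));
            }
        }
        return -1;
    }

    static boolean heapIsFixed(List<String> jvmArgs) {
        long initial = flagValue(jvmArgs, "InitialHeapSize");
        long max = flagValue(jvmArgs, "MaxHeapSize");
        return initial > 0 && initial == max;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = Arrays.asList("-XX:+UseG1GC", "-XX:InitialHeapSize=2g", "-XX:MaxHeapSize=2g");
        System.out.println("Heap fixed: " + heapIsFixed(jvmArgs));
    }
}
```

In a real check you could feed it ManagementFactory.getRuntimeMXBean().getInputArguments() instead of the sample list.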
Pause Goal and Young Generation Sizing
The G1 GC has a pause-time target that it tries to meet, i.e. a soft target.
During young collections the G1 GC adjusts the size of the young generation (eden and survivor regions) to meet the pause-time target.
For this reason it's generally recommended to set the pause-time target and let the GC resize the generations as needed. This is an important concept: do not set the new generation size unless required.
To set the pause-time target, set the following JVM option:
-XX:MaxGCPauseMillis
For example, you could start by setting this value between 200 and 500, e.g. -XX:MaxGCPauseMillis=500, and test to see if your performance targets are met. Note: the default value is 200.
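To see which pause-time target a running JVM is actually using, one option is to read the flag through the HotSpot diagnostic bean. This is a sketch that assumes a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available; the PauseTargetCheck class and its vmOption helper are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch only: reads the effective value of a HotSpot flag from inside the JVM.
class PauseTargetCheck {
    /** Returns the effective value of a HotSpot flag, or null if it cannot be read. */
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or a JVM that does not expose this bean.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("MaxGCPauseMillis = " + vmOption("MaxGCPauseMillis"));
    }
}
```

Printing the effective value at startup makes it easy to confirm that the setting you tested is the one running in production.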
Garbage Collection Logging
Tuning is an iterative process based on data collected throughout the tuning phases, so it's recommended to enable GC logging, even in production environments. You will need a log management strategy to prevent the logs from consuming too many system resources (disk space). The default logging level is info and can be adjusted as required.
Unified JVM logging has replaced the old logging options as of Java 9. The logging options used with Java 8 will not work with Java 11. See JEP 158 for details.
It is recommended to set the following JVM options:
-Xlog: this will be set with the options specified below. If further debugging is needed you can increase the logging level per tag.
gc*: print all GC events, similar to the previous PrintGCDetails
safepoint: print safepoint information, as previously enabled through separate options on Java 8
age*: print details similar to the previous PrintTenuringDistribution at debug level; set to trace for full logging of what that option printed
ergo*: use a level of debug for most of the information, or a level of trace for all of what the equivalent Java 8 option logged
time — Current time and date in ISO-8601 format
level — The level associated with the log message
tags — The tag-set associated with the log message
uptime — Time since the start of the JVM in seconds and milliseconds
file=filename: the log file name, optionally including %p and/or %t to include the JVM's PID and startup timestamp.
filecount=count — set number of files before rollover
filesize=size — set filesize
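Example of the above:
-Xlog:gc*,safepoint,age*,ergo*:file=/opt/gclogs/gc-%p-%t.log:tags,uptime,time,level:filecount=10,filesize=50m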
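Once logging is enabled, the pause times can be extracted from the log and compared with your target. The sketch below parses one pause line with a regular expression. The sample line is illustrative of what unified logging produces with the time, uptime, level and tags decorations, not output captured from a real system, and the GcPauseParser name is hypothetical; adjust the pattern to the lines your JDK actually writes.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: pulls the pause duration out of a unified-logging GC pause line.
class GcPauseParser {
    // Matches "Pause <description> <before>M-><after>M(<total>M) <millis>ms" at the end of a line.
    private static final Pattern PAUSE =
            Pattern.compile("Pause (.+?) (\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+\\.\\d+)ms$");

    static OptionalDouble pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group(5))) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Illustrative sample line, assumed format.
        String line = "[2021-03-01T10:15:30.123+0000][1.234s][info][gc] "
                + "GC(12) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 12.345ms";
        double targetMillis = 500.0; // the MaxGCPauseMillis value used in this article's example
        pauseMillis(line).ifPresent(ms ->
                System.out.println(ms + " ms, within target: " + (ms <= targetMillis)));
    }
}
```

For anything beyond a quick check, a dedicated GC log analysis tool will give a fuller picture than a single regular expression.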
Other JVM Options
The following JVM options are worth considering and have been found to be effective in tuning ForgeRock application JVMs.
-XX:+DisableExplicitGC: Recommend setting this value, which disables processing of calls to System.gc().
-XX:+UseStringDeduplication — String deduplication reduces the memory footprint of String objects on the Java heap. This is disabled by default.
-XX:MaxMetaspaceSize=size: Sets the maximum amount of native memory that can be allocated for class metadata. Recommend setting this value to 256MB and monitoring for any issues.
-XX:MaxTenuringThreshold=threshold: Sets the maximum number of iterations for which live objects are kept in the new generation. This defaults to 15. If objects do not need to be kept in the new generation for a long time because they will end up in the old generation anyway, you can lower this value, e.g. for Directory Server the recommended value is 1. What this says is that an object will likely either live for one iteration or live for a long time, so move it to old space. This clears out eden space for new objects. This recommendation does not apply to all applications, and setting the value too low will leave garbage sitting in old space that could have been cleaned efficiently in the new space. Monitor your application to determine the best setting for this value.
-XX:+ParallelRefProcEnabled — Recommend setting this value to enable parallel reference processing. By default, this option is disabled.
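The trade-off described for MaxTenuringThreshold can be illustrated with a deliberately simplified model. The sketch assumes an object is promoted once it has survived more than the threshold number of young collections; it ignores survivor space sizing and the dynamic threshold the JVM computes at runtime, and the TenuringSketch name and sample lifetimes are made up for the example.

```java
// Sketch only: a simplified promotion model, not the JVM's actual ageing logic.
class TenuringSketch {
    /**
     * Counts objects promoted to old space, assuming an object is promoted once it
     * has survived more than `threshold` young collections.
     */
    static int promoted(int[] survivedCollections, int threshold) {
        int count = 0;
        for (int survived : survivedCollections) {
            if (survived > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Number of young collections each sample object stays alive for.
        int[] lifetimes = {0, 0, 0, 1, 2, 3, 50, 50};
        System.out.println("threshold 1:  " + promoted(lifetimes, 1) + " promoted");
        System.out.println("threshold 15: " + promoted(lifetimes, 15) + " promoted");
    }
}
```

With a threshold of 1 the two medium-lived objects are promoted along with the long-lived ones and later become garbage in old space; with 15 only the long-lived objects are promoted, at the cost of copying survivors more often.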
The following represents an example of the above settings as JVM options for a DS instance.
-XX:+UseG1GC -XX:InitialHeapSize=2g -XX:MaxHeapSize=2g -XX:MaxGCPauseMillis=500 -XX:+DisableExplicitGC -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=256m -XX:MaxTenuringThreshold=1 -Xlog:gc*,safepoint,age*,ergo*:file=/opt/gclogs/gc-%p-%t.log:tags,uptime,time,level:filecount=10,filesize=50m
Important: DO NOT cut and paste this into your application. Make informed decisions about the values you set, based on your targets.
Further Tuning advice
So you’ve set the recommended settings and you’re not meeting performance targets, what do you do now?
Here are a few options to consider. There are no specific recommended values as they will be based on your analysis of the GC behaviour.
-XX:ConcGCThreads: This value sets the number of parallel marking threads. By default, this is set to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads). For example, if ParallelGCThreads is 16, ConcGCThreads defaults to 4. Note that ParallelGCThreads matches the number of logical processors only up to 8; on larger systems it defaults to fewer threads than there are processors. You can increase ConcGCThreads to add parallel marking threads so that concurrent marking completes sooner.
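The defaults for these two thread counts can be approximated as follows. The formulas reflect my understanding of the HotSpot ergonomics for Java 11 and should be treated as assumptions; the GcThreadDefaults class is a hypothetical name, and the authoritative values for your system are the ones the JVM reports, for example via -XX:+PrintFlagsFinal.

```java
// Sketch only: approximates the default GC thread counts; verify against your JVM.
class GcThreadDefaults {
    /** Assumed default: all processors up to 8, then 5/8 of each additional processor. */
    static int defaultParallelGcThreads(int logicalProcessors) {
        if (logicalProcessors <= 8) {
            return logicalProcessors;
        }
        return 8 + (logicalProcessors - 8) * 5 / 8;
    }

    /** Assumed default: roughly a quarter of ParallelGCThreads, and at least 1. */
    static int defaultConcGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = defaultParallelGcThreads(cpus);
        System.out.println(cpus + " processors -> ParallelGCThreads ~ " + parallel
                + ", ConcGCThreads ~ " + defaultConcGcThreads(parallel));
    }
}
```

Knowing the default makes it easier to judge whether a proposed ConcGCThreads value is a modest increase or a large one.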
Mixed GCs are initiated when certain conditions are met, after successful completion of the concurrent marking phase.
-XX:InitiatingHeapOccupancyPercent: This value determines when the initial marking process starts. The value defaults to 45%, but G1 GC attempts to find an optimal value for IHOP and only uses this value if there is not enough information to optimise, or if the adaptive IHOP is overridden. You can set the value higher to start concurrent marking later, or lower to start marking earlier.
For example, if you are getting Full GCs due to allocation failure, or you see Evacuation Pause/Evacuation Failure, this generally means objects can't be allocated because there is not enough memory or because garbage can't be reclaimed fast enough. In that case you could look to start marking earlier by lowering the IHOP value. For your value to be used consistently, you would also need to override the adaptive IHOP behaviour.
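The effect of lowering IHOP can be shown with a small calculation. This sketch models the static check only, assuming marking starts once old generation occupancy reaches the given percentage of the heap; the IhopSketch name is hypothetical. As far as I am aware the adaptive behaviour is switched off with -XX:-G1UseAdaptiveIHOP, but confirm that flag against the documentation for your JDK before relying on it.

```java
// Sketch only: static IHOP check, ignoring the adaptive calculation G1 normally applies.
class IhopSketch {
    /** True once old generation occupancy reaches ihopPercent of the heap. */
    static boolean shouldStartMarking(long oldGenBytes, long heapBytes, int ihopPercent) {
        return oldGenBytes * 100 >= heapBytes * (long) ihopPercent;
    }

    public static void main(String[] args) {
        long heap = 2048L << 20; // 2 GiB heap
        long old = 700L << 20;   // 700 MiB currently in old regions
        System.out.println("IHOP 45: " + shouldStartMarking(old, heap, 45));
        System.out.println("IHOP 30: " + shouldStartMarking(old, heap, 30));
    }
}
```

With 700 MiB of a 2 GiB heap in old regions, marking has not started at the default of 45 but would already be under way at 30, giving G1 more time to reclaim space before the heap fills.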
-XX:G1HeapWastePercent: Options such as this one can be considered when you want to change the decisions made for mixed garbage collections, i.e. how the regions to collect in a mixed collection are determined, the goal being to decrease the time of mixed collections.
-XX:G1HeapRegionSize: In G1 GC we've already described how objects are stored in regions. If an object's size is equal to or greater than 50% of the region size, it is considered a humongous object. Humongous objects are allocated directly in the old generation, are handled differently and can cause fragmentation. A Full GC could be initiated to find contiguous regions in which to store humongous objects. You could consider changing the region size by increasing the heap size (and therefore the region size), or manually increase the region size by setting -XX:G1HeapRegionSize.
-XX:MaxNewSize: It is possible that the young generation has been dynamically tuned by G1 GC based on previous application behaviour, and that the application behaviour then changes. This could result in incorrect young generation optimisation. While it's generally recommended to avoid setting young generation sizes, under some conditions doing so might give a more accurate distribution of new/old heap than the G1 algorithm.
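The humongous rule lends itself to a quick calculation. The sketch below applies the 50% rule exactly as stated in this article and estimates how many contiguous regions an object would occupy; the HumongousSketch name and the sample sizes are illustrative.

```java
// Sketch only: applies the "50% of a region" rule described in this article.
class HumongousSketch {
    /** An object is treated as humongous when it is at least half the region size. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Number of contiguous regions needed to hold the object (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long object = 600L << 10; // a 600 KiB object
        for (long regionMib : new long[] {1, 4}) {
            long region = regionMib << 20;
            System.out.println(regionMib + " MiB regions: humongous=" + isHumongous(object, region)
                    + ", regions needed=" + regionsNeeded(object, region));
        }
    }
}
```

The same 600 KiB object is humongous with 1 MiB regions but an ordinary allocation with 4 MiB regions, which is why a larger region size can reduce humongous allocations.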
Monitoring your JVM and Garbage Collection
I haven't covered monitoring in this article. If you wish to learn how to monitor your Java application's JVM and garbage collection, please read my other article, How to Monitor your Java Application's JVM.
This paper provided you with a high-level view of the Garbage First Garbage Collector, G1 GC, as well as some useful and recommended JVM options to tune your ForgeRock application JVMs. When tuning your JVM, consult the product documentation for application-specific recommendations. By no means have all the possible options been covered; a list of references has been provided for further research.
The following concepts are very important and cannot be stressed highly enough:
- Without a definition of success you cannot succeed, so know your performance targets.
- Start with the basic options, let the JVM modify as required to meet its target. If your goals cannot be met, then, and only then, start using more detailed options.
- Allocate enough time and resources for performance testing.