Summary
This article explains how to configure G1GC settings for the Dremio JVM.
Reported Issue
N/A
Overview
The default JVM settings in Dremio often need adjustment in order to obtain better performance of your Dremio deployment. This article makes some base recommendations to work from.
Relevant Versions Tools and Integrations
All Versions of Dremio when a Java 8 JDK is in use.
For deployments using Java 11 see here.
Steps to Resolve
G1GC settings
The G1GC collector is the default Java garbage collector for Dremio. While you normally do not have to configure this, we have seen some customer deployments where it is not explicitly set, so it's worth checking the JVM settings to make sure the following flag is present:
-XX:+UseG1GC
The recommendation for this collector is to fix the heap size at start time by setting the initial heap size equal to the maximum heap size. For example, for a heap of 31GB you would set:
-Xms31g -Xmx31g
In addition, the following flag makes the JVM touch every page of the heap during initialization, so the memory is committed at startup rather than on first use at runtime:
-XX:+AlwaysPreTouch
There are some additional, miscellaneous settings we have implemented to help customers deal with error logging, heap dumps, GC pause lengths and heap occupancy limits:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dremio/data -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25
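To make the occupancy setting concrete: on Java 8, -XX:InitiatingHeapOccupancyPercent is expressed as a percentage of the whole heap, so the point at which G1 starts a concurrent marking cycle can be estimated with simple arithmetic. The sketch below assumes the 31GB heap used as an example earlier; `marking_start_bytes` is a hypothetical name, and the result is an approximation of when marking begins, not an exact trigger.

```python
GIB = 1024 ** 3


def marking_start_bytes(heap_bytes, ihop_percent):
    """Approximate heap occupancy at which concurrent marking starts."""
    return heap_bytes * ihop_percent // 100


# Assumed example: 31 GiB heap with InitiatingHeapOccupancyPercent=25
threshold = marking_start_bytes(31 * GIB, 25)
print(f"{threshold / GIB:.2f} GiB")  # 7.75 GiB
```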
Humongous Objects
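G1 treats any object larger than about half a heap region as humongous and allocates it in dedicated regions outside the young generation. Setting a larger region size raises that threshold, so more large objects can be allocated normally in YoungGen space. 32MB is the maximum on Java 8.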
-XX:G1HeapRegionSize=32M
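As a rough illustration of why the region size matters: G1 classifies an object as humongous when it is larger than about half a region. The sketch below uses that rule of thumb to show how the threshold moves with the region size; the function names are hypothetical and the exact boundary behaviour inside the JVM may differ slightly.

```python
MIB = 1024 ** 2


def humongous_threshold_bytes(region_size_bytes):
    """Rule of thumb: objects larger than half a region are humongous."""
    return region_size_bytes // 2


def is_humongous(object_bytes, region_size_bytes):
    return object_bytes > humongous_threshold_bytes(region_size_bytes)


for region_mib in (8, 16, 32):
    threshold_mib = humongous_threshold_bytes(region_mib * MIB) // MIB
    print(f"region {region_mib} MiB -> humongous above {threshold_mib} MiB")

# A 12 MiB object is humongous with 16 MiB regions but not with 32 MiB regions.
print(is_humongous(12 * MIB, 16 * MIB))  # True
print(is_humongous(12 * MIB, 32 * MIB))  # False
```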
GC logging
Depending on your deployment type, you may or may not generate a GC log by default. It is a good idea to make sure you are outputting GC logs, because if you open a support case our team will often request them. Here are the GC log settings we often recommend:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy
Sometimes we may also request a JFR (Java Flight Recorder) recording, or ask you to add more verbose settings such as:
-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC
-XX:+PrintReferenceGC
These can add to GC processing time and disk space usage, so it's best to check with Dremio Support to determine whether they are recommended in your case.
The following options also give control over GC log file location and rotation, if required:
-Xloggc:/opt/dremio/data/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k
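When choosing rotation values, it helps to estimate how much disk space the rotated GC logs can occupy. The following is a small sketch of that arithmetic for the values shown above, assuming the `k` suffix means KiB; `gc_log_budget_kib` is a hypothetical name, and the figure is an approximate upper bound because a file can slightly exceed the limit before it rotates.

```python
def gc_log_budget_kib(number_of_files, file_size_kib):
    """Approximate disk space used once all rotated GC log files are full."""
    return number_of_files * file_size_kib


# Values from -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k
total_kib = gc_log_budget_kib(5, 4000)
print(total_kib, "KiB")                   # 20000 KiB
print(round(total_kib / 1024, 1), "MiB")  # 19.5 MiB
```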
Heap sizes
As mentioned earlier, by default we use the G1GC collector. This collector is designed to operate with very large heaps. Typically our default settings start with 16GB for the coordinator and 8GB for the executor. In practice, we often see the coordinator needing an increase, and you may find that the executor also benefits from more heap memory.
Heap sizing varies across deployments, since no two have the same mix of sources, query workloads and client user base. Often a cluster will go through some initial testing before satisfactory settings are reached. As the cluster evolves and workloads change, you may need to revisit the heap settings.
Common Challenges
N/A
Additional Resources
You can find articles that discuss G1GC tuning and JVM heap sizing in the links below. If you have any questions on your Dremio deployment regarding this topic, please reach out to support@dremio.com
https://blog.cloudera.com/cdh6-3-hbase-g1-gc-tuning-with-jdk11/
https://www.dynatrace.com/news/blog/understanding-g1-garbage-collector-java-9/
https://www.oracle.com/java/technologies/javase/vmoptions-jsp.html
Compressed pointers and JVM heap size - https://www.baeldung.com/jvm-compressed-oops
Sizing min/max heap - https://blog.gceasy.io/2022/03/22/benefits-of-setting-initial-and-maximum-memory-size-to-the-same-value/