Overview
Dremio's default JVM settings often need adjusting to get the best performance from your deployment. This article gives baseline recommendations to work from.
Applies To
All Versions of Dremio
Details
Out of the box, Dremio configures the JVM heap and direct memory using the environment variables:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB
These allow specific values to be set for each, and are mutually exclusive with
DREMIO_MAX_MEMORY_SIZE_MB
which allows the admin to set an overall memory size, with Dremio automatically determining the heap size and giving the remainder to direct memory. While these settings are fine for configuring a cluster and getting set up, they are often not optimal for a given deployment.
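As a rough illustration of the two styles, the sketch below sets the explicit heap and direct memory variables and then checks that the overall-size variable has not been set alongside them. The sizes are illustrative only, and check_memory_settings is a hypothetical helper written for this example, not part of Dremio. It assumes the variables are defined in a shell environment file such as dremio-env.

```shell
# Sketch of memory settings (values are illustrative, not recommendations).
# Option A: set heap and direct memory explicitly.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=16384
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=49152

# Option B (use instead of A): one overall size, split automatically by Dremio.
# DREMIO_MAX_MEMORY_SIZE_MB=65536

# Hypothetical helper: report whether the two styles are being mixed.
check_memory_settings() {
  if [ -n "${DREMIO_MAX_MEMORY_SIZE_MB:-}" ] &&
     { [ -n "${DREMIO_MAX_HEAP_MEMORY_SIZE_MB:-}" ] ||
       [ -n "${DREMIO_MAX_DIRECT_MEMORY_SIZE_MB:-}" ]; }; then
    echo "conflict"
  else
    echo "ok"
  fi
}

check_memory_settings
```

Running the check after editing the file is a quick way to catch a leftover variable from the other style.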
G1GC settings
G1GC is the default Java garbage collector for Dremio. You normally do not have to configure this, but we have seen some customer deployments where it is not explicitly set, so it's worth checking the JVM settings to make sure the following is present:
-XX:+UseG1GC
The recommendation for this collector is to fix the heap size at start time to the maximum heap size. For example, for a heap of 31GB you would set
-Xms31g -Xmx31g
In addition, the following flag makes the JVM touch every page of the heap during initialization, so the memory is committed at startup rather than on demand at runtime:
-XX:+AlwaysPreTouch
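The three settings above can be combined as in the following sketch. g1_heap_opts is a hypothetical helper for this example, and the use of DREMIO_JAVA_SERVER_EXTRA_OPTS as the place to pass extra JVM options is an assumption; confirm the right variable or configuration file for your deployment type.

```shell
# Hypothetical helper: build fixed-size G1 heap options from a size in GB.
g1_heap_opts() {
  heap_gb="$1"
  # Same value for -Xms and -Xmx so the heap is fixed at start time.
  echo "-XX:+UseG1GC -Xms${heap_gb}g -Xmx${heap_gb}g -XX:+AlwaysPreTouch"
}

# Assumption: extra JVM options are passed via DREMIO_JAVA_SERVER_EXTRA_OPTS.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} $(g1_heap_opts 31)"
echo "$DREMIO_JAVA_SERVER_EXTRA_OPTS"
```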
We have also applied some miscellaneous settings to help customers with error logging, heap dumps, GC pause lengths and heap occupancy limits:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dremio/data -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25
Humongous Objects
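G1 treats any object that takes up about half a region or more as humongous and allocates it outside the young generation. Setting a larger region size raises that threshold, which leaves room for larger objects in YoungGen space. 32MB is the maximum.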
-XX:G1HeapRegionSize=32M
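To make the effect of the region size concrete, the sketch below computes the approximate object size at which G1 starts treating an allocation as humongous. humongous_threshold_mb is a hypothetical helper for this example, and the half-region rule is the general G1 behaviour rather than anything specific to Dremio.

```shell
# Hypothetical helper: approximate humongous threshold for a G1 region size.
# G1 treats objects of roughly half a region or more as humongous.
humongous_threshold_mb() {
  region_mb="$1"
  echo $((region_mb / 2))
}

echo "Region 8MB:  humongous at about $(humongous_threshold_mb 8)MB"
echo "Region 32MB: humongous at about $(humongous_threshold_mb 32)MB"
```

With the 32MB region size shown above, objects smaller than about 16MB are allocated as regular objects.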
GC logging
Depending on your deployment type, a GC log may or may not be generated by default. It is a good idea to make sure you are outputting GC logs, because our team will often request them if you open a support case. Here are the GC log settings we often recommend:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy
Sometimes we may also request a Java Flight Recorder (JFR) recording, or ask you to add more verbose settings such as:
-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC
-XX:+PrintReferenceGC
These can add to GC processing time and disk usage, so it's best to check with Dremio Support to determine whether they are recommended in your case.
The following options also give control over GC log file location and rotation, if required:
-Xloggc:/opt/dremio/data/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k
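The -XX:+PrintGC* and -Xloggc flags shown in this section are Java 8 style options. From Java 9 onward the JVM uses unified logging (-Xlog) instead, and several of the older flags are no longer accepted. The sketch below picks a flag set by Java major version. gc_log_opts is a hypothetical helper, and the unified logging line is an approximate equivalent offered as an assumption, not a Dremio recommendation, so check with Dremio Support before using it.

```shell
# Hypothetical helper: choose GC logging flags by Java major version.
gc_log_opts() {
  java_major="$1"
  log_file="/opt/dremio/data/gc.log"
  if [ "$java_major" -le 8 ]; then
    # Java 8 style flags, as listed in this article.
    echo "-Xloggc:${log_file} -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k"
  else
    # Java 9+ unified logging; 4M is roughly the 4000k used above.
    echo "-Xlog:gc*:file=${log_file}:time,uptime,level,tags:filecount=5,filesize=4M"
  fi
}

gc_log_opts 8
gc_log_opts 11
```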
Heap sizes
As mentioned earlier, G1GC is the default collector. It is designed to work well with very large heaps. Our default settings typically start with 16GB of heap for the coordinator and 8GB for the executor. In practice, the coordinator often needs an increase, and you may find that the executor also benefits from more heap memory.
Heap sizing will vary across deployments, since no two have the same sources, query workloads and client user base. A cluster will often go through some initial testing before satisfactory settings are reached. As the cluster evolves and workloads change, you may also need to revisit the heap settings.
Further reading
You can find articles that discuss G1GC tuning and JVM heap sizing in the links below. If you have any questions on your Dremio deployment regarding this topic, please reach out to support@dremio.com
Java HotSpot VM options - https://www.oracle.com/java/technologies/javase/vmoptions-jsp.html
Compressed pointers and JVM heap size - https://www.baeldung.com/jvm-compressed-oops
Sizing min/max heap - https://blog.gceasy.io/2022/03/22/benefits-of-setting-initial-and-maximum-memory-size-to-the-same-value/