The default JVM settings in Dremio often need adjusting to get better performance from your deployment. This article gives some baseline recommendations to work from.
All Versions of Dremio
Out of the box, Dremio configures the JVM heap and direct memory through environment variables. One pair of variables sets specific values for heap and direct memory. These are mutually exclusive with a single overall-memory variable, which lets the admin set a total memory size, with Dremio automatically determining the heap size and giving the remainder to direct memory. While these settings are fine for configuring a cluster and getting set up, they are often not optimal for a given deployment.
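The variable listing is not reproduced in this copy of the article. As a sketch only, the two approaches might look like this in a `dremio-env` file. The variable names are assumed from a stock `dremio-env`, and the sizes are illustrative rather than recommendations, so check both against your own installation.

```shell
# dremio-env (assumed file and variable names; verify against your install)

# Option A: set heap and direct memory explicitly.
# Values are in MB and purely illustrative.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=16384
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=32768

# Option B: set one overall size and let Dremio split it.
# Leave this commented out when Option A is in use.
# DREMIO_MAX_MEMORY_SIZE_MB=49152
```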
G1GC is the default Java garbage collector for Dremio. You normally do not have to configure this, but we have seen some customer deployments where it is not explicitly set, so it's worth checking the JVM settings to make sure the G1 collector is explicitly enabled.
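For illustration, the standard HotSpot flag that selects G1 is `-XX:+UseG1GC`. The sketch below appends it to the server JVM options. Using `DREMIO_JAVA_SERVER_EXTRA_OPTS` in `dremio-env` as the place for extra flags is an assumption about your deployment type.

```shell
# Append the G1 selection flag to any options already set.
# DREMIO_JAVA_SERVER_EXTRA_OPTS is assumed to be where extra JVM flags go.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} -XX:+UseG1GC"
```

On a running node, `jcmd <pid> VM.flags` lists the flags the JVM actually started with, which is one way to confirm the collector in use.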
The recommendation for this collector is to fix the heap at start time by setting the initial heap size equal to the maximum heap size. For example, for a heap of 31GB the user would set both the initial and the maximum heap size to 31GB.
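As a sketch of the 31GB example, the standard HotSpot flags for initial and maximum heap size are `-Xms` and `-Xmx`. Passing them through `DREMIO_JAVA_SERVER_EXTRA_OPTS` is an assumption here. How they interact with the memory environment variables described earlier is not covered, so confirm the effective values on your own nodes.

```shell
# Fix the heap at 31GB by making the initial size equal to the maximum size.
# The variable name is an assumption; the flags are standard HotSpot options.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} -Xms31g -Xmx31g"
```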
In addition, the JVM can be told to pre-touch the heap, so that heap memory is committed during JVM initialization rather than at runtime.
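The HotSpot flag with this behavior is `-XX:+AlwaysPreTouch`. A minimal sketch, again assuming extra flags are passed through `DREMIO_JAVA_SERVER_EXTRA_OPTS`:

```shell
# Touch every heap page at JVM startup instead of on first use.
# Startup takes longer in exchange for avoiding that cost at runtime.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} -XX:+AlwaysPreTouch"
```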
We have also used some miscellaneous settings to help customers deal with error logging, heap dumps, GC pause lengths and heap occupancy limits:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dremio/data -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25
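Setting a G1 region size allows room for larger objects in the young generation, because G1 treats any object of half a region or more as humongous and allocates it outside the young generation. 32MB is the maximum.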
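The region-size setting itself is not shown in this copy of the article. The HotSpot flag that controls it is `-XX:G1HeapRegionSize`. The sketch below sets it to the 32MB maximum mentioned above. Both that value and the `DREMIO_JAVA_SERVER_EXTRA_OPTS` variable are assumptions to check against your own configuration.

```shell
# Use 32MB G1 regions, so only objects of 16MB or more count as humongous.
# The value and the variable name are assumptions, not taken from the article.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} -XX:G1HeapRegionSize=32m"
```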
Depending on your deployment type, a GC log may or may not be generated by default. It is a good idea to make sure you are outputting GC logs, because our team will often request them if you open a support case. Here are the GC log settings we often recommend:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy
Sometimes we may also request a JFR (Java Flight Recorder) recording, or ask you to add more verbose GC logging settings. These can add to GC processing time and disk usage, so it's best to check with Dremio Support to determine whether they are recommended in your case.
The following options also give control over GC log file location and rotation, if required:
-Xloggc:/opt/dremio/data/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k
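The GC logging flags in this article are the JDK 8 forms. From JDK 9 onward, HotSpot replaced them with unified logging (`-Xlog`). The line below is a rough equivalent for location and rotation, offered as a sketch and not taken from the original article. The path and sizes mirror the JDK 8 example, and the `DREMIO_JAVA_SERVER_EXTRA_OPTS` variable is assumed.

```shell
# JDK 9+ unified logging: all gc tags to a rotating file (5 files of about 4MB each).
# The decorators add wall-clock time, uptime, level and tags to each line.
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS:-} -Xlog:gc*:file=/opt/dremio/data/gc.log:time,uptime,level,tags:filecount=5,filesize=4000K"
```

Which form applies depends on the Java version your Dremio release runs on, so check that before changing existing logging flags.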
As mentioned earlier, Dremio uses the G1GC collector by default. This collector is designed to operate with very large heaps. Our default settings typically start with 16GB for the coordinator and 8GB for the executor. In practice, we often see the coordinator needing an increase, and you may find that the executor also benefits from more heap memory.
Heap sizing will vary across deployments, since no two have the same mix of sources, query workloads and client user base. A cluster will often go through some initial testing before satisfactory settings are reached. As the cluster evolves and workloads change, you may also need to revisit the heap settings.
You can find articles that discuss G1GC tuning and JVM heap sizing in the links below. If you have any questions on your Dremio deployment regarding this topic, please reach out to firstname.lastname@example.org
Compressed pointers and JVM heap size: https://www.baeldung.com/jvm-compressed-oops