Overview
This article provides guidance on applying JVM tuning and pod sizing limits when running Dremio in a Kubernetes environment. While we discuss specific parameters in various Helm chart files, we defer to the documentation on GitHub for reference information about the charts and their usage. You can find links to these pages under “Further Reading” at the end of this article.
Applies To
This article applies to all versions of Dremio that use the v2 Helm charts. The article also applies to deprecated v1 Helm charts, but some adjustments might need to be made where appropriate.
Tuning and Sizing Details
Not all of the following sections may apply to you, depending on your tuning and sizing needs (e.g., you may only need to adjust a single JVM parameter). It is recommended, though, that you review the entire article to gain a more thorough understanding of these capabilities.
The Helm charts are located on GitHub. It is strongly advised that you ensure you are on the latest version. All examples in the following sections are based on Java 8 variants.
JVM settings
By default, Dremio will attempt to adjust the JVM according to some default rules. This applies to all deployment types. The default settings, while often sufficient for testing, may not fit your needs when moving into production.
Dremio calculates the JVM heap (-Xmx) and direct memory (-XX:MaxDirectMemorySize) based on the memory setting in values.yaml in the root of the GitHub repository. For example, you will find settings like the following for both the coordinator and the executors:
# Dremio Coordinator
coordinator:
  # CPU & Memory
  # Memory allocated to each coordinator, expressed in MB.
  # CPU allocated to each coordinator, expressed in CPU cores.
  cpu: 14
  memory: 107374
We can see here that there are 14 CPU cores and 100GB of memory configured for this coordinator. Dremio would typically assign 16GB to the JVM heap and the remaining 84GB to direct memory (for the coordinator). For the executor, the JVM heap default is 8GB. It is important to note that if you wish to configure the JVM max heap size, you must always set the direct memory too.
Otherwise Dremio will still apply the default direct memory, which is calculated from the default heap size. This can result in the pod attempting to over-allocate memory.
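The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the split as described in this article (16GB default heap for a coordinator, 8GB for an executor, remainder to direct memory); the actual chart templates may apply further thresholds or reserves. The function names and the headroom_gb parameter are hypothetical and not part of the Helm charts.

```python
# Default heap sizes as stated in this article (assumption: large pods only;
# the real chart templates may use additional thresholds).
DEFAULT_HEAP_GB = {"coordinator": 16, "executor": 8}


def default_split_gb(pod_memory_gb, role):
    """Return (heap, direct) in GB for the default split described above."""
    heap_gb = DEFAULT_HEAP_GB[role]
    return heap_gb, pod_memory_gb - heap_gb


def paired_memory_flags(pod_memory_gb, heap_gb, headroom_gb=0):
    """Build heap and direct-memory flags that together fit the pod memory.

    headroom_gb is a hypothetical knob for leaving some memory unassigned.
    """
    direct_gb = pod_memory_gb - heap_gb - headroom_gb
    if direct_gb <= 0:
        raise ValueError("heap plus headroom leaves no room for direct memory")
    return [
        f"-Xms{heap_gb}g",
        f"-Xmx{heap_gb}g",
        f"-XX:MaxDirectMemorySize={direct_gb}g",
    ]


print(default_split_gb(100, "coordinator"))  # (16, 84)
print(paired_memory_flags(100, 31))
```

Setting the heap and direct memory as a pair, as in paired_memory_flags, keeps their sum within the memory configured for the pod.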
JVM settings can be applied under the extraStartParams section, for example:
extraStartParams: >-
  -XX:+UseG1GC
  -XX:+AlwaysPreTouch
  -Xms31g
  -Xmx31g
  -XX:MaxDirectMemorySize=10g
While you are configuring these settings, you may as well apply all the settings you need, for example the GC logging settings:
-Xloggc:/opt/dremio/data/gc.log
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=100m
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintAdaptiveSizePolicy
-XX:+UseGCLogFileRotation
Other JVM settings are discussed in another tech note (see the links under “Further Reading”).
Note: there is a common section for extraStartParams towards the end of the values.yaml. If you duplicate settings, you may run into errors when starting the pods.
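One way to catch such duplicates before deploying is to compare the flag lists yourself. The following Python sketch is an assumption-laden helper, not part of the Helm charts: flag_key and duplicated_settings are hypothetical names, and the normalization covers only the flag shapes used in this article (-XX: options, -D properties, and -X options such as -Xmx or -Xloggc).

```python
import re


def flag_key(flag):
    """Reduce a JVM flag to the setting it controls, ignoring its value."""
    if flag.startswith("-XX:"):
        # -XX:+UseG1GC and -XX:MaxDirectMemorySize=10g -> option name only
        body = flag[4:].lstrip("+-")
        return "-XX:" + body.split("=", 1)[0]
    if flag.startswith("-D"):
        return flag.split("=", 1)[0]
    sized = re.match(r"-X(ms|mx|mn|ss)", flag)
    if sized:
        return sized.group(0)
    if flag.startswith("-X"):
        # e.g. -Xloggc:/path -> -Xloggc
        return flag.split(":", 1)[0]
    return flag


def duplicated_settings(*param_blocks):
    """Return the settings that appear more than once across all blocks."""
    seen = set()
    duplicates = set()
    for block in param_blocks:
        for flag in block.split():
            key = flag_key(flag)
            if key in seen:
                duplicates.add(key)
            seen.add(key)
    return sorted(duplicates)


common = "-XX:+UseG1GC -Xmx8g"
coordinator = "-Xms16g -Xmx16g -XX:+UseG1GC -XX:MaxDirectMemorySize=10g"
print(duplicated_settings(common, coordinator))
```

Here the two strings stand in for the common and the coordinator extraStartParams values; any setting reported should be kept in only one of the two sections.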
Persisting logs
The logs are typically not written to a persistent volume, so they are ephemeral. Often the first thing users do is redirect them, unless they are using a log aggregation endpoint. Using the same extraStartParams section as above, the most common form is:
-Ddremio.log.path=/opt/dremio/data/log
Heap dumps and JVM error files can be redirected in the same way:
-XX:HeapDumpPath=/opt/dremio/data
-XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
K8s limits
It is important to understand how setting the Kubernetes requests and limits helps to ensure Dremio runs smoothly. By default, the Helm charts take the memory size and divide it between heap and direct memory. This same value also appears in your pod under the "Requests" section. Example from a describe output:
Containers:
  dremio-master-coordinator:
    ...
    Requests:
      cpu: 14
      memory: 102400
We can see from this how easily we could oversubscribe a pod if, for example, we were to set the heap memory to 31GB but not change the default direct memory. Using the above numbers and applying only the -Xmx setting for 31GB, we would effectively be asking the worker node for 84GB direct + 31GB heap = 115GB, which is more than the memory requested for the pod.
If the worker node’s memory is close to this limit, you may observe container restarts that appear without any evidence in the Dremio logs (i.e. no ZooKeeper disconnects or JVM heap OOM errors).
In such a case you can run dmesg -T in the Dremio pod, which will usually show any OS oom-killer events.
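The oversubscription check described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the default heap and direct values are passed in by the caller rather than read from the charts, and only -Xmx and -XX:MaxDirectMemorySize are considered (other JVM memory areas are ignored).

```python
import re

# JVM size suffixes are binary multiples (k, m, g, t), case-insensitive.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
GIB = 1024**3


def parse_jvm_size(text):
    """Convert a JVM size such as '31g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgGtT]?)", text)
    if not match:
        raise ValueError(f"not a JVM size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def requested_jvm_bytes(start_params, default_heap, default_direct):
    """Heap plus direct memory, falling back to the defaults when unset."""
    heap, direct = default_heap, default_direct
    for flag in start_params.split():
        if flag.startswith("-Xmx"):
            heap = parse_jvm_size(flag[4:])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            direct = parse_jvm_size(flag.split("=", 1)[1])
    return heap + direct


pod_request = 100 * GIB
# Heap overridden to 31GB, direct memory left at the 84GB default.
total = requested_jvm_bytes("-Xms31g -Xmx31g", 16 * GIB, 84 * GIB)
print(total / GIB, total > pod_request)  # 115.0 True
```

Adding -XX:MaxDirectMemorySize=69g to the same parameter string brings the total back to 100GB, in line with the pod request.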
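We have also observed customers seeing this when Kubernetes Limits are not used. Currently, the Helm charts do not include a Limits config by default. If you wish to add one, you can apply additions like the following to the dremio-master.yaml, dremio-coordinator.yaml and dremio-executor.yaml files in the templates directory. The snippet shown is for the executor template; in the master and coordinator files, the limits entries should mirror the expressions already used for the requests entries there.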
resources:
  requests:
    cpu: {{ template "dremio.executor.cpu" (list $ $engineName) }}
    memory: {{ template "dremio.executor.memory" (list $ $engineName) }}
  limits:
    cpu: {{ template "dremio.executor.cpu" (list $ $engineName) }}
    memory: {{ template "dremio.executor.memory" (list $ $engineName) }}
Complete example
For convenience, here is a complete example for a coordinator. Executors are usually identical aside from the heap and direct memory settings.
extraStartParams: >-
  -Ddremio.log.path=/opt/dremio/data/log
  -XX:+AlwaysPreTouch
  -Xms16g
  -Xmx16g
  -XX:MaxDirectMemorySize=10g
  -Xloggc:/opt/dremio/data/gc.log
  -XX:NumberOfGCLogFiles=20
  -XX:GCLogFileSize=100m
  -XX:+PrintGCDetails
  -XX:+PrintGCTimeStamps
  -XX:+PrintGCDateStamps
  -XX:+PrintAdaptiveSizePolicy
  -XX:+UseGCLogFileRotation
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/opt/dremio/data
  -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
  -XX:G1HeapRegionSize=32M
  -XX:MaxGCPauseMillis=500
  -XX:InitiatingHeapOccupancyPercent=25
  -XX:+UseG1GC
Once these settings are applied, you must redeploy the Helm charts with a helm upgrade command.
Verifying settings
The astute engineer might notice that when a setting such as direct memory is applied as in the above example, it appears twice on the JVM command line in the ps output on the Dremio pod. This cannot presently be avoided with the current Helm charts, but bear in mind that the JVM takes the last of the duplicated settings. If you wish to verify the effective values, the best way is to query the running JVM using the jcmd command.
The following example can be run against your Dremio container to verify the settings (the PID of Dremio is often 1, but if you are running sidecar containers, it may not be). Note how the values are consistently expressed in bytes by default. Some of the settings shown are ones you have likely not set yourself, as they are JVM defaults.
(The example is from a test system with very small heap and direct settings)
% kubectl exec dremio-master-0 -c dremio-master-coordinator -- bash -c 'jcmd 1 VM.flags | tr " " "\n"'
1:
-XX:+AlwaysPreTouch
-XX:CICompilerCount=2
-XX:ConcGCThreads=1
-XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
-XX:G1HeapRegionSize=33554432
-XX:GCLogFileSize=4096000
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/dremio/data
-XX:InitialHeapSize=1073741824
-XX:InitiatingHeapOccupancyPercent=25
-XX:MarkStackSize=4194304
-XX:MaxDirectMemorySize=1073741824
-XX:MaxGCPauseMillis=500
-XX:MaxHeapSize=1073741824
-XX:MaxNewSize=637534208
-XX:MinHeapDeltaBytes=33554432
-XX:NumberOfGCLogFiles=5
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:+PrintGC
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintReferenceGC
-XX:+UseCompressedClassPointers
-XX:+UseCompressedOops
-XX:+UseG1GC
-XX:+UseGCLogFileRotation
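Because the values are reported in bytes, it can help to convert them before comparing them with what you configured. The following Python sketch parses output in the format shown above; parse_vm_flags is a hypothetical helper, and the sample text is a shortened copy of the output from this article.

```python
def parse_vm_flags(text):
    """Parse 'jcmd <pid> VM.flags' output into a dict of flag name -> value.

    Boolean flags (-XX:+Name / -XX:-Name) map to True / False,
    numeric values to int, anything else stays a string.
    """
    flags = {}
    for token in text.split():
        if not token.startswith("-XX:"):
            continue  # skips the leading "1:" PID line
        body = token[4:]
        if not body:
            continue
        if body[0] in "+-":
            flags[body[1:]] = body[0] == "+"
        else:
            name, _, value = body.partition("=")
            flags[name] = int(value) if value.isdigit() else value
    return flags


sample = """1:
-XX:+AlwaysPreTouch
-XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
-XX:G1HeapRegionSize=33554432
-XX:InitialHeapSize=1073741824
-XX:MaxDirectMemorySize=1073741824
-XX:MaxHeapSize=1073741824
"""

flags = parse_vm_flags(sample)
print(flags["MaxHeapSize"] / 1024**3)         # heap in GiB
print(flags["MaxDirectMemorySize"] / 1024**3)  # direct memory in GiB
print(flags["G1HeapRegionSize"] // 1024**2)    # region size in MiB
```

For the test system above this gives a 1 GiB heap, 1 GiB of direct memory and a 32 MiB G1 region size, matching the -XX:G1HeapRegionSize=32M setting from the complete example.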
Further Reading
G1GC settings for the Dremio JVM - https://support.dremio.com/hc/en-us/articles/7649417414555/
The JCMD command - https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html
Dremio selecting the G1 collector - https://support.dremio.com/hc/en-us/articles/7670293504539
Compressed pointers and JVM heap size - https://www.baeldung.com/jvm-compressed-oops
Kubernetes requests and limits - https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits
Dremio v2 charts - https://github.com/dremio/dremio-cloud-tools/tree/master/charts/dremio_v2
Dremio v2 charts docs - https://github.com/dremio/dremio-cloud-tools/tree/master/charts/dremio_v2/docs