Summary
Guidance for sizing Dremio in the Kubernetes environment and configuring the right JVM parameters.
Reported Issue
The Dremio service goes down frequently, or queries/jobs fail with insufficient-memory errors or report missing JVM arguments.
Overview
This article provides guidance on applying JVM tuning and pod sizing limits when running Dremio in a Kubernetes environment. While specific parameters from various Helm chart files are discussed here, refer to the documentation on GitHub for reference information about the charts and their usage. Links to these pages can be found under the "Additional Resources" section.
Relevant Versions, Tools, and Integrations
This article applies to all versions of Dremio that use the v2 Helm charts. It also applies to the deprecated v1 Helm charts, although some adjustments may be needed where the chart layouts differ.
Steps to Resolve
Refer to the guidance below, apply the settings to your Dremio instance, and monitor the behavior.
JVM settings
By default, Dremio will attempt to adjust the JVM according to some default rules. This applies to all deployment types. The default settings, while often sufficient for testing, may not fit your needs when moving into production.
Dremio calculates the JVM heap (-Xmx) and direct (-XX:MaxDirectMemorySize) memory based on the memory setting in values.yaml, found in the root of the chart directory in the GitHub repository. JVM settings can be applied under the extraStartParams section, for example:
extraStartParams: >-
  -XX:+UseG1GC
  -XX:+AlwaysPreTouch
  -Xms31g
  -Xmx31g
  -XX:MaxDirectMemorySize=10g
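For context, the memory value that Dremio derives its heap and direct sizes from is set per service in values.yaml. The fragment below is a sketch following the v2 chart layout; the exact keys and values should be verified against the chart version you deploy, and the sizes shown are examples only:

```yaml
# Illustrative values.yaml fragment (v2 chart layout; verify keys against your chart version).
coordinator:
  cpu: 8
  memory: 16384        # total memory in MB; the chart splits this between heap and direct
executor:
  cpu: 16
  memory: 32768        # total memory in MB for each executor pod
  extraStartParams: >- # overrides/additions to the computed JVM flags
    -XX:+UseG1GC
    -XX:+AlwaysPreTouch
```

Note that extraStartParams supplements the JVM flags the chart computes from the memory value, so explicit -Xmx or -XX:MaxDirectMemorySize entries here take the place of the calculated defaults.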
Persisting logs
The logs are typically not written to a persistent volume, so they are ephemeral. The most common way to redirect logs to persistent storage is:
-Ddremio.log.path=/opt/dremio/data/log
Also, heap dumps and JVM error files can be persisted:
-XX:HeapDumpPath=/opt/dremio/data
-XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
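Putting these together, the flags can be combined in a single extraStartParams entry. This is a sketch that assumes /opt/dremio/data is backed by a persistent volume in your deployment (the default data mount in the v2 charts); it also adds -XX:+HeapDumpOnOutOfMemoryError, a standard JVM flag not shown above, so that a heap dump is actually produced when the JVM runs out of heap:

```yaml
# Illustrative combination of log and dump persistence flags.
# Assumes /opt/dremio/data is a persistent volume mount; verify for your deployment.
extraStartParams: >-
  -Ddremio.log.path=/opt/dremio/data/log
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/opt/dremio/data
  -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
```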
K8s limits
It is important to understand how setting Kubernetes requests and limits helps ensure Dremio runs smoothly. By default, the Helm charts take the memory size and split it between heap and direct memory. This same value also appears in your pod spec under the "Requests" section.
If you wish to include a Limits configuration, you can add it in the templates directory to the dremio-master.yaml, dremio-coordinator.yaml, and dremio-executor.yaml files.
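As a sketch, a limits block added to the container spec in one of those templates might look like the following. The resource values are illustrative only and should be sized for your workload; setting limits equal to requests gives the pod the Guaranteed QoS class, which avoids eviction under node memory pressure:

```yaml
# Illustrative container-level resources block for templates/dremio-executor.yaml.
# Values are examples; size them for your workload.
resources:
  requests:
    cpu: "16"
    memory: 64Gi
  limits:
    cpu: "16"
    memory: 64Gi
```

Keep the memory limit comfortably above the sum of heap and direct memory, since the JVM also needs headroom for metaspace, thread stacks, and native allocations; a limit set exactly to heap plus direct will lead to OOMKilled pods.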
Common Challenges
Insufficient heap memory errors occur when queries require more resources than are available. In some cases, you may need to adjust these parameters to accommodate specific jobs that cannot complete within the configured memory limits.
Additional Resources
G1GC settings for the Dremio JVM - https://support.dremio.com/hc/en-us/articles/7649417414555/
The JCMD command - https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html
Dremio selecting the G1 collector - https://support.dremio.com/hc/en-us/articles/7670293504539
Compressed pointers and JVM heap size - https://www.baeldung.com/jvm-compressed-oops
Kubernetes requests and limits - https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits
Dremio v2 charts - https://github.com/dremio/dremio-cloud-tools/tree/master/charts/dremio_v2
Dremio v2 charts docs - https://github.com/dremio/dremio-cloud-tools/tree/master/charts/dremio_v2/docs