Summary
After upgrading to Dremio 25.2.x or later, the coordinator JVM may become unstable because of the heap consumed by reflection matching.
Reported Issue
The Dremio coordinator may become unresponsive: the UI fails to load jobs and views, and upstream client requests fail or time out. This may be accompanied by a sudden increase in JVM heap usage. The JVM was also observed initiating Full GCs, which sometimes (but not always) caused the coordinator to restart when its connection to ZooKeeper failed. In that case the coordinator logs a message such as:
2024-11-12 09:42:33,588 [zk-curator-2] ERROR ROOT - Dremio is exiting. Node lost its master status.
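To check how often this has happened, the coordinator log can be scanned for the message above. The following is a minimal sketch, assuming log lines start with a timestamp in the format shown in the sample line; the function name find_master_loss_events is hypothetical and is not part of Dremio.

```python
import re
from datetime import datetime

# Pattern derived from the sample log line above (an assumption);
# other Dremio versions or log layouts may format the line differently.
LOST_MASTER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*Node lost its master status"
)

def find_master_loss_events(lines):
    """Return the timestamps of lines that report the loss of master status."""
    events = []
    for line in lines:
        match = LOST_MASTER.match(line)
        if match:
            events.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return events

sample = [
    "2024-11-12 09:42:33,588 [zk-curator-2] ERROR ROOT - Dremio is exiting. "
    "Node lost its master status.",
]
for timestamp in find_master_loss_events(sample):
    print(timestamp.isoformat())
```

With a real log, pass an open file handle instead of the sample list. The timestamps can then be compared against Full GC times and heap usage graphs.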
Relevant Versions
Dremio 25.2.x and higher
Troubleshooting Steps
In cases like this, JVM heap usage increases suddenly before the problem appears, and there are often one or more Full GC events around the time of the ZooKeeper disconnects. Adding the JVM parameter
-XX:+HeapDumpBeforeFullGC
makes the JVM write a heap dump to disk before each Full GC, which aids diagnosis. Understanding which objects are consuming the heap is key to troubleshooting here.
Cause
The heap dump showed the problem clearly: in the dominator tree, most of the heap was occupied by reflection matching objects.
Dremio may use algebraic matching to find matching reflections. This can be very expensive for queries with many joins, so a threshold was introduced: algebraic matching is not used when a query has more than a set number of joins. Depending on the query, the default threshold may not be low enough, as was the case here.
Steps to Resolve
Change the following support key to a lower number (the default is 16):
reflections.planning.algebraic_match_limit
Ideally, if a problem query has been identified, set the key to a number lower than the number of joins in that query.
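As a rough aid for picking a value, the joins in the problem query can be counted from its SQL text. This is only a sketch under stated assumptions: the helper names count_joins and suggested_limit are hypothetical, the count is a simple keyword match, and it does not see joins hidden inside views, so the planner may deal with more joins than the text shows.

```python
import re

def count_joins(sql: str) -> int:
    """Roughly count JOIN keywords in a SQL string.

    Keyword match only: joins inside views are not counted, and the
    word JOIN inside comments or string literals is counted by mistake.
    """
    return len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))

def suggested_limit(sql: str, default_limit: int = 16) -> int:
    """Suggest a value one below the join count, never above the default."""
    joins = count_joins(sql)
    return min(default_limit, max(joins - 1, 0))

query = (
    "SELECT * FROM a JOIN b ON a.id = b.id "
    "LEFT JOIN c ON b.id = c.id inner join d ON c.id = d.id"
)
print(count_joins(query), suggested_limit(query))  # prints: 3 2
```

Treat the result as a starting point: a lower threshold turns off algebraic matching for more queries, so check the effect on other workloads before keeping the new value.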
Next Steps
Check the value of this support key before upgrading or introducing new workloads.
Additional Resources
General release notes: https://docs.dremio.com/current/release-notes/version-250-release/#2520-october-2024