Overview
This article describes how to collect diagnostic data for Dremio Support when a root cause analysis requires examining Java threads or heap usage. Java Flight Recorder (JFR) is a powerful sampling-based profiling tool that ships with recent OpenJDK builds and has been backported as far as JDK 8.
Applies To
All versions of Dremio and deployment types.
Details
Rather than taking a heap dump, which gives only a point-in-time view, you can run JFR in the background and get a sampling-based analysis of JVM performance from several angles.
Make sure you are running a recent version of OpenJDK: JFR is included in OpenJDK 8u272 and later, and in OpenJDK 11 and later.
Collecting and Saving the JFR Recording on the Server
Run the following on the server you want to analyze. Note that the command sudo -u dremio jcmd $DREMIO_PID VM.unlock_commercial_features may fail on some versions of the JDK. If it does, skip it and continue with the next command.
# Set the environment variable DREMIO_PID to the process ID of the running Dremio process.
# If more than one process matches "dremio", set DREMIO_PID by hand instead.
export DREMIO_PID=$(ps ax | grep dremio | grep -v grep | awk '{print $1}')
# If the following command returns an error, it is safe to ignore; continue with the next command
sudo -u dremio jcmd $DREMIO_PID VM.unlock_commercial_features
sudo -u dremio jcmd $DREMIO_PID JFR.start name="DR_JFR" settings=profile maxage=3600s filename=/opt/dremio/data/coordinator.jfr dumponexit=true
# Once you have observed the problem, run the following to stop the recording
sudo -u dremio jcmd $DREMIO_PID JFR.stop name="DR_JFR"
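The ps pipeline used to set DREMIO_PID can match more than one process (for example, a shell script that wraps the Dremio JVM), in which case the jcmd commands will fail. A minimal sketch of a sanity check follows; the function name is_single_pid is hypothetical and not part of Dremio or the JDK.

```shell
# Hypothetical helper: succeeds only when the argument is exactly one numeric PID.
# An empty value, or several PIDs separated by spaces or newlines, is rejected.
is_single_pid() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Warn before running jcmd if DREMIO_PID does not look like a single PID.
if ! is_single_pid "${DREMIO_PID:-}"; then
  echo "DREMIO_PID is empty or matches several processes; set it by hand." >&2
fi
```

If the check fails, running jcmd -l as the dremio user lists the JVMs it can attach to, which may help identify the right process ID.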
The maxage parameter sets how long recorded data is retained: with maxage=3600s, data older than one hour is discarded. To limit the overhead of tracing, JFR buffers recorded data in memory, and dumponexit=true makes it write the recording to disk whenever the JVM exits.
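Because the data is held in memory while the recording runs, it can be useful to confirm that the recording is active and to write a copy to disk on demand. The sketch below wraps the standard jcmd diagnostic commands JFR.check and JFR.dump. The function names jfr_check and jfr_dump, and the snapshot path in the usage comment, are illustrative assumptions.

```shell
# Hypothetical wrappers around jcmd; the recording name DR_JFR matches JFR.start above.

# Show the status of the DR_JFR recording for the given process ID.
jfr_check() {
  sudo -u dremio jcmd "$1" JFR.check name=DR_JFR
}

# Write the data recorded so far to the given file without stopping the recording.
jfr_dump() {
  sudo -u dremio jcmd "$1" JFR.dump name=DR_JFR filename="$2"
}

# Example usage (the snapshot path is an assumption):
#   jfr_check "$DREMIO_PID"
#   jfr_dump "$DREMIO_PID" /opt/dremio/data/coordinator-snapshot.jfr
```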
Then compress the resulting JFR file and upload it to the support ticket.
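One way to compress the file is with gzip, as sketched below. The function name compress_jfr is hypothetical, and the sketch assumes it is run by a user who can read the recording and write to its directory.

```shell
# Hypothetical helper: gzip a JFR file next to the original, keeping the original.
compress_jfr() {
  src="$1"
  if [ ! -f "$src" ]; then
    echo "JFR file not found: $src" >&2
    return 1
  fi
  # -c writes to stdout, so the source file is left untouched.
  gzip -c "$src" > "$src.gz"
}

# Example usage, with the path given to JFR.start above:
#   compress_jfr /opt/dremio/data/coordinator.jfr
```

The compressed copy is written next to the original with a .gz suffix.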
Further info