Overview
This article covers how to use the Dremio Diagnostic Collector (DDC) to collect diagnostic information from your cluster.
Applies To
All versions of Dremio and all deployment types.
Details
The DDC tool can be found here: https://github.com/rsvihladremio/dremio-diagnostic-collector
The instructions at that link give installation and usage details.
Find your deployment type below and follow the examples.
Kubernetes
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/rsvihladremio/dremio-diagnostic-collector/releases/latest
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster
./bin/ddc -k -n default -e app=dremio-executor -c app=dremio-coordinator -o /tmp/cluster-diags.zip
PV Example
Logs are often stored on a persistent volume (PV), and you might want to collect only the last 2 days of logs:
./bin/ddc -k -n default -e app=dremio-executor -c app=dremio-coordinator -l /opt/dremio/data/log -C /opt/dremio/conf -a 2 -o /tmp/cluster-diags.zip
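Before running a collection, it can help to confirm which pods the -e and -c values actually match. The following is a minimal sketch, assuming those values are ordinary Kubernetes label selectors and that kubectl is configured for the cluster. The function name check_selectors and the KUBECTL variable are hypothetical and are not part of DDC.

```shell
#!/bin/sh
# Pre-flight check (a sketch, not part of DDC): list the pods that each
# label selector matches before passing the same selectors to DDC.
# KUBECTL can be overridden, for example to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

check_selectors() {
  namespace="$1"
  shift
  for selector in "$@"; do
    echo "Pods matching ${selector} in namespace ${namespace}:"
    "$KUBECTL" get pods -n "$namespace" -l "$selector" || return 1
  done
}

# Usage:
#   check_selectors default app=dremio-executor app=dremio-coordinator
```

If a selector returns no pods, the matching DDC flag would likely need a different label value for that deployment.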
AWSE
On AWSE deployments, all nodes log to an EFS mount, so you can collect logs from a single node. However, configs remain on their respective nodes.
Collection from AWSE nodes uses ssh to connect and copy files.
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/rsvihladremio/dremio-diagnostic-collector/releases/latest
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster
./bin/ddc -c <coordinator IP> -e <executor IPs> --ssh-user <user> --ssh-key <key file> -o /tmp/cluster-diags.zip
Example
Engines may have been restarted, but logs from the last 3 days are often still needed even if the instances have been terminated:
./bin/ddc -c 192.168.1.1 -e ip-172-31-11-13.eu-north-1.compute.internal,ip-172-31-15-55.eu-north-1.compute.internal --ssh-user ec2-user --ssh-key ~/.ssh/dremio -l /var/dremio_efs/log -C /etc/dremio/ -a 3 -o /tmp/cluster-diags.zip
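With many executors, typing the comma-separated -e list by hand is error prone. As a minimal sketch, the list can be built from a file containing one host per line. The function name join_hosts and the file name executors.txt are hypothetical.

```shell
#!/bin/sh
# Join a one-host-per-line file into the comma-separated form used by -e.
# Blank lines are skipped.
join_hosts() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage (executors.txt is a hypothetical file name):
#   ./bin/ddc -c 192.168.1.1 -e "$(join_hosts executors.txt)" \
#     --ssh-user ec2-user --ssh-key ~/.ssh/dremio \
#     -l /var/dremio_efs/log -C /etc/dremio/ -a 3 -o /tmp/cluster-diags.zip
```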
On-Prem / Package
This is identical to AWSE aside from the directory locations, and it uses ssh-based collection.
Yarn
On Yarn deployments, Dremio is generally installed only on the coordinator node. The executors run as containers on the Yarn cluster, with Dremio requesting resources directly from Yarn.
Collection from Yarn nodes uses ssh to connect and copy files. Often there is no ssh access to the Yarn nodes, so it won't be possible to collect the executor logs with DDC. If you can access the executor nodes, you might need to run a separate collection for the executors, since the config and log directories will likely be different.
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/rsvihladremio/dremio-diagnostic-collector/releases/latest
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster
./bin/ddc -c <coordinator IP> -e <executor IPs> --ssh-user <user> --ssh-key <key file> -o /tmp/cluster-diags.zip
Example
Containers are ephemeral in Yarn, so we might need logs from the last 3 days after the application containers no longer exist. Note that we use two collections: one for the coordinator and one for the executors.
For the Coordinator (using a false IP for the executor):
./bin/ddc -c 192.168.1.1 -e 192.168.100.100 --ssh-user dremio --ssh-key ~/.ssh/dremio -l /var/dremio/log -C /etc/dremio/ -a 3 -o /tmp/coordinator-diags.zip
For the Executors (using a false IP for the coordinator):
(Note: we won't have config files on the executors, so we can pass any directory. The output file is named differently so that it does not overwrite the coordinator archive.)
./bin/ddc -c 192.168.100.100 -e 192.168.1.10,192.168.1.11 --ssh-user dremio --ssh-key ~/.ssh/dremio -l /data/yarn/ -C /etc/dremio/ -a 3 -o /tmp/executor-diags.zip
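The two Yarn collections can be wrapped in one small script so they always run together with the same age setting. This is a sketch only, reusing the flags, paths, and placeholder IP from the examples above. The function name collect_yarn, the DDC variable, and the choice to write the executor archive to /tmp/executor-diags.zip (so it does not overwrite the coordinator archive) are assumptions made for illustration.

```shell
#!/bin/sh
# Run the coordinator and executor collections back to back (a sketch).
# DDC can be overridden, for example to preview the commands without
# connecting to any node.
DDC="${DDC:-./bin/ddc}"
PLACEHOLDER_IP="192.168.100.100"   # the "false IP" used in the examples

collect_yarn() {
  coordinator="$1"   # coordinator IP
  executors="$2"     # comma-separated executor IPs
  days="$3"          # value passed to -a

  # Coordinator collection: real coordinator, placeholder executor.
  "$DDC" -c "$coordinator" -e "$PLACEHOLDER_IP" \
    --ssh-user dremio --ssh-key "$HOME/.ssh/dremio" \
    -l /var/dremio/log -C /etc/dremio/ -a "$days" \
    -o /tmp/coordinator-diags.zip || return 1

  # Executor collection: placeholder coordinator, real executors.
  "$DDC" -c "$PLACEHOLDER_IP" -e "$executors" \
    --ssh-user dremio --ssh-key "$HOME/.ssh/dremio" \
    -l /data/yarn/ -C /etc/dremio/ -a "$days" \
    -o /tmp/executor-diags.zip
}

# Usage:
#   collect_yarn 192.168.1.1 192.168.1.10,192.168.1.11 3
```

Stopping after a failed coordinator collection is a design choice in this sketch; dropping the `|| return 1` would let the executor collection run regardless.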
Further Reading
DDC GitHub page - https://github.com/rsvihladremio/dremio-diagnostic-collector
Collecting log files for support - https://support.dremio.com/hc/en-us/articles/7296581582235
Running a JFR - https://support.dremio.com/hc/en-us/articles/14285366833563
Collecting heap and thread dumps manually - https://support.dremio.com/hc/en-us/articles/4418339068315
Collecting thread dumps on Kubernetes manually - https://support.dremio.com/hc/en-us/articles/13755699539099