Overview
This article covers how to use the Dremio Diagnostic Collector (DDC) to collect diagnostic information from your cluster.
Applies To
All versions of Dremio and all deployment types.
Details
The DDC tool can be found here: https://github.com/dremio/dremio-diagnostic-collector
The instructions at that link give installation and usage details.
Find your deployment type below and follow the examples.
Kubernetes
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/dremio/dremio-diagnostic-collector/releases
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster
./bin/ddc -k --namespace default -e app=dremio-executor -c app=dremio-coordinator
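The asset to download in step 1 depends on the OS and CPU of the machine you run DDC from. A small sketch that derives the likely asset name — only ddc-linux-amd64.zip is confirmed above; the other name combinations are assumptions based on the same naming convention:

```shell
# Guess the ddc release asset name for this machine.
# Only ddc-linux-amd64.zip appears in this article; other
# OS/arch combinations are assumed to follow the same pattern.
OS=$(uname -s | tr '[:upper:]' '[:lower:]')   # e.g. linux, darwin
ARCH=$(uname -m)
case "$ARCH" in
  x86_64) ARCH=amd64 ;;
  aarch64) ARCH=arm64 ;;
esac
echo "ddc-$OS-$ARCH.zip"
```

Check the releases page for the exact asset names before downloading.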
PV Example
Often logs will be stored on a persistent volume (PV).
1. Edit ddc.yaml and set the dremio-log-dir to where your logs are located
dremio-log-dir: "/opt/dremio/data/log" # where the dremio log is located
...omitting the rest of ddc.yaml for ease of reading
2. Then run DDC
./bin/ddc -k --namespace default -e app=dremio-executor -c app=dremio-coordinator
Note: if you have different executor types in Dremio, each executor may carry a different label.
To work around this you can either:
Use the following syntax example to pass multiple labels into the executor parameter
./bin/ddc -k --namespace default -e "app in (dremio-executor-queries,dremio-executor-reflections)" -c app=dremio-coordinator
OR
Add labels for DDC, for example with the following command:
kubectl label pod dremio-executor-0 ddc=dremio-executor
These may be added permanently in the templates directory files, for example:
file: templates/dremio-executor.yaml
spec:
  serviceName: "dremio-cluster-pod"
  podManagementPolicy: "Parallel"
  replicas: 1
  selector:
    matchLabels:
      app: dremio-executor
  template:
    metadata:
      labels:
        app: dremio-executor
        role: dremio-cluster-pod
        ddc: dremio-executor # <-- new label here
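If many executor pods need the ad-hoc label, the kubectl commands can be scripted. A sketch that only prints the commands rather than running them — the pod names here are hypothetical examples; adjust them to your StatefulSet, then pipe the output to sh to apply:

```shell
# Print one kubectl label command per executor pod.
# Pod names below are hypothetical; replace with your own.
for pod in dremio-executor-0 dremio-executor-1 dremio-executor-2; do
  echo "kubectl label pod $pod ddc=dremio-executor"
done
```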
AWSE
On AWSE deployments, all nodes log to an EFS mount, so you can collect logs from a single node. However, the config files will still be on their respective nodes.
Collection from AWSE nodes uses SSH to connect and copy files.
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/dremio/dremio-diagnostic-collector/releases
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster (you must include an ssh-user)
ddc -c <coordinator IP> -e <executor IPs> --ssh-user <user> --ssh-key <key file>
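The -e flag takes a comma-separated list of executor IPs, which is tedious to type for large clusters. A sketch that builds the list from a plain file of IPs — the file path and IPs are made-up examples, and the final ddc command is only echoed, not run:

```shell
# Hypothetical file with one executor IP per line
printf '192.168.1.10\n192.168.1.11\n' > /tmp/executors.txt

# Join the lines into the comma-separated list ddc expects
EXECUTORS=$(tr '\n' ',' < /tmp/executors.txt | sed 's/,$//')

# Echo the final command rather than running it
echo "ddc -c 192.168.1.1 -e $EXECUTORS --ssh-user ec2-user --ssh-key ~/.ssh/dremio"
```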
Example
Often engines have been restarted, and logs are still needed even after the instances have been terminated.
1. Edit the log and conf directories in ddc.yaml
dremio-log-dir: "/var/dremio_efs/log" # where the dremio log is located
dremio-conf-dir: "/etc/dremio/" # where the dremio conf files are located
2. Run ddc
ddc -c 192.168.1.1 -e ip-172-31-11-13.eu-north-1.compute.internal,ip-172-31-15-55.eu-north-1.compute.internal --ssh-user ec2-user --ssh-key ~/.ssh/dremio
On-Prem / Package
This is identical to AWSE aside from the directory locations, and uses SSH-based collection.
Yarn
On Yarn deployments, Dremio is generally installed only on the coordinator node. The executors run as containers on the Yarn cluster, with Dremio requesting resources directly from Yarn.
Collection from Yarn nodes uses SSH to connect and copy files. Often there is no SSH access to the Yarn nodes, so it won't be possible to collect the executor logs with DDC. If you can access the executor nodes, you may need to run a separate collection for the executors, since their config and log directories will likely differ from the coordinator's.
1 - Download the correct binary for the platform that you intend to run DDC from:
https://github.com/dremio/dremio-diagnostic-collector/releases
2 - Unzip the file
unzip ddc-linux-amd64.zip
3 - Run the binary (you should see a help menu)
./bin/ddc
4 - Run a collection on your cluster
ddc -c <coordinator IP> -e <executor IPs> --ssh-user <user> --ssh-key <key file>
Example
Containers are ephemeral in Yarn, so logs from the last few days may be needed after the application containers no longer exist. Note that we use two collections: one for the coordinator, one for the executors.
For the Coordinator (using a false IP for the executor):
1. Edit the log and conf directories in ddc.yaml
dremio-log-dir: "/var/dremio/log" # change to wherever the dremio log is located
dremio-conf-dir: "/etc/dremio" # change to wherever the dremio conf files are located
dremio-rocksdb-dir: "/var/lib/dremio/db" # change to wherever the rocksdb dir is located
2. Run ddc
ddc -c 192.168.1.1 -e 192.168.100.100 --ssh-user dremio --ssh-key ~/.ssh/dremio
For the Executors (using a false IP for the coordinator):
1. Edit the log and conf directories in ddc.yaml (Note: there are no config files on the executors, so any directory can be passed for dremio-conf-dir)
dremio-log-dir: "/data/yarn" # where the dremio log is located
dremio-conf-dir: "/etc/dremio/" # where the dremio conf files are located
2. Run ddc
ddc -c 192.168.100.100 -e 192.168.1.10,192.168.1.11 --ssh-user dremio --ssh-key ~/.ssh/dremio
Further Reading
DDC github page: https://github.com/dremio/dremio-diagnostic-collector
Collecting log files for support - https://support.dremio.com/hc/en-us/articles/7296581582235
Running a JFR - https://support.dremio.com/hc/en-us/articles/14285366833563
Collecting heap and thread dumps manually - https://support.dremio.com/hc/en-us/articles/4418339068315
Collecting thread dumps on Kubernetes manually - https://support.dremio.com/hc/en-us/articles/13755699539099