Overview
This article covers how to use the Dremio Diagnostic Collector (DDC) to collect diagnostic information from your cluster.
Applies To
Install
The DDC tool can be downloaded from https://github.com/dremio/dremio-diagnostic-collector. It also ships with Dremio versions 24.2.0 onwards, located in the directory
/opt/dremio/tools/ddc
Prerequisites
Before running DDC, you need the following:
- Admin access to the Dremio cluster infrastructure (kubectl access for K8s, SSH access for other types)
- Admin privileges in Dremio Software to read system tables, job profiles, and WLM
- A DDC binary for your client machine (downloaded as a zip file from the "Releases" page on GitHub)
- Sufficient local permissions to run the downloaded binary (run ./ddc -h to verify)
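As a sketch of verifying the last prerequisite, the commands below use a stand-in ddc file rather than the real binary (which comes from the GitHub "Releases" page), so the snippet is self-contained:

```shell
# Stand-in for the downloaded DDC binary; the real one comes from the
# GitHub "Releases" zip. This placeholder just prints a help line.
printf '#!/bin/sh\necho "ddc help"\n' > ./ddc

# Make the binary executable, then verify it runs.
chmod +x ./ddc
./ddc -h
```

With the real binary, ./ddc -h prints DDC's usage text; if you get "Permission denied" instead, re-run the chmod step.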
Quick Start - DDC versions 3.0 and up
The following shows basic usage.
Kubernetes
For Kubernetes, you only need to supply the namespace where Dremio is installed:
ddc -n <where dremio is installed> --collect standard
Example: Dremio is in the dremio-install namespace:
ddc -n dremio-install --collect standard
This will generate a tarball named diag.tgz in your current working directory.
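Before sending the tarball to support, you can list its contents to confirm the collection succeeded. A minimal sketch, using a stand-in diag.tgz so it is runnable anywhere (the file names inside a real archive will differ):

```shell
# Create a stand-in diag.tgz so the listing below has something to show;
# a real diag.tgz is produced by the ddc command above.
mkdir -p node-1/logs
echo "sample" > node-1/logs/server.log
tar -czf diag.tgz node-1

# List the archive contents without extracting.
tar -tzf diag.tgz
```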
Nodes with SSH access
To collect over SSH, you must supply the following:
- the location of a valid SSH key that can access the nodes
- the list of coordinators separated by comma, example: -c 192.168.1.3,192.168.1.2
- the list of executors separated by comma, example: -e 192.168.1.10,192.168.1.11
- the user for the login, example: -u ssh_user_login
- the user that dremio runs as on the node, example: -b dremio
In the following example, we have a coordinator at 192.168.1.20, no executors, a login user of user123, and the Dremio process running as the dremio user:
ddc -c 192.168.1.20 -u user123 -b dremio --collect standard
This will generate a tarball named diag.tgz in your current working directory.
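Putting the flags together for a cluster with multiple coordinators and executors, the full command line looks like the following. The IPs reuse the placeholder examples above, and the command is echoed rather than executed so the flag layout is visible without a live cluster:

```shell
# Placeholder node lists; replace with your own coordinator/executor IPs.
COORDINATORS=192.168.1.3,192.168.1.2
EXECUTORS=192.168.1.10,192.168.1.11

# Echo the assembled command so the flag layout is visible.
echo ddc -c "$COORDINATORS" -e "$EXECUTORS" -u ssh_user_login -b dremio --collect standard
```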
DDC 0.7.0-2.4.0 or All Other Deployment Types
For all other installs, DDC does not support remote access or file transfer, so ddc local-collect is used instead. Assuming Dremio is running as "dremio", the following command works on most systems:
sudo -u dremio ddc local-collect
This produces a diagnostic tarball, by default in the /tmp/ddc directory, named after the node where the command is run. The output indicates the location as follows:
file /tmp/ddc/ddc-test-dremio-executor-1.tar.gz - 6 seconds for collection - size 50940 bytes
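Because the tarball name includes the node name, the exact path varies per node. When scripting a collection across many nodes, the path can be pulled out of that output line; a small sketch using the sample line above as input:

```shell
# Sample output line from ddc local-collect (copied from above).
LINE='file /tmp/ddc/ddc-test-dremio-executor-1.tar.gz - 6 seconds for collection - size 50940 bytes'

# The tarball path is the second whitespace-separated field.
echo "$LINE" | awk '{print $2}'
```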
FAQ
See the FAQ for common questions:
https://github.com/dremio/dremio-diagnostic-collector/blob/main/FAQ.md
How DDC Works
DDC works by running locally on a node (local collect) and collecting information such as logs, config files, OS-level info, and some JVM diagnostics from the Dremio process. It can also connect to the Dremio API (on coordinator nodes) and collect WLM config, job profiles, and system table data if the flag --collect health-check is used.
When running a remote collect with DDC you can use one of two modes:
- k8s mode - uses kubectl to connect to pods in the cluster
- ssh mode - uses ssh to connect to nodes in the cluster
The process with remote collect is:
- DDC is copied to each node / pod along with the ddc.yaml file
- DDC is run on each node / pod
- The ddc files are cleaned up and the tarball is transferred back to the local machine
- The tarballs are extracted and combined into one archive
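The last step, extracting the per-node tarballs and combining them into one archive, can be sketched as follows. DDC does this for you automatically; the snippet below uses stand-in tarballs purely to illustrate the merge:

```shell
# Create two stand-in per-node tarballs, as if transferred back from nodes.
mkdir -p work && cd work
for node in node-a node-b; do
  mkdir -p "$node"
  echo "log data" > "$node/server.log"
  tar -czf "$node.tar.gz" "$node"
  rm -r "$node"
done

# Extract each tarball, then combine the results into one archive.
for t in *.tar.gz; do tar -xzf "$t"; done
tar -czf diag.tgz node-a node-b

# List the combined archive: both nodes' files are present.
tar -tzf diag.tgz
```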
Further Reading
DDC GitHub page: https://github.com/dremio/dremio-diagnostic-collector
Collecting log files for support - https://support.dremio.com/hc/en-us/articles/7296581582235
Running a JFR - https://support.dremio.com/hc/en-us/articles/14285366833563