This article covers how to collect information for Dremio Support to support a root cause analysis (RCA) following an unexpected process restart.
Applies to all versions of Dremio and all deployment types.
Following an unexpected restart, it is always good practice to collect as much information as possible, as soon as possible. Logs may rotate and other relevant data may expire under cleanup and expiry rules (e.g. the logrotate configuration for OS-level logs).
Collecting Dremio logs
The easiest way to collect Dremio logs is by using the Dremio Diagnostic Collector (DDC). The tool will collect logs and config files from Dremio clusters for all deployment types using either ssh access or kubectl, as appropriate.
See the following article:
The tool does not need to be built or installed, although you can compile it from source if you wish. The install instructions in the above link are simply example commands to obtain the binary.
Using manual methods
If you prefer not to use the DDC, the files can be collected manually, with a shell script, or with another tool. It might seem obvious, but it is very important to keep the date and time of the incident in mind when choosing log files, so that the files sent actually cover the incident time frame.
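As a rough illustration of time-based manual collection, the sketch below archives only the files modified after the start of the incident window. It assumes GNU find and GNU tar are available. The function name collect_logs_since, the paths, and the timestamp are hypothetical placeholders, so substitute the log location configured for your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive every file under a log directory that was
# modified after the start of the incident window. Files last modified
# before the window opened cannot contain entries from it.
# Assumes GNU find (-newermt) and GNU tar (--null, --files-from).
collect_logs_since() {
  local log_dir="$1"   # directory holding the logs (placeholder)
  local since="$2"     # window start, e.g. "2024-05-01 09:00"
  local out="$3"       # archive to create, outside of log_dir

  # find prints NUL-separated names; tar reads that list from stdin.
  find "$log_dir" -type f -newermt "$since" -print0 |
    tar --null --files-from=- -czf "$out"
}

# Example call (all values are placeholders):
# collect_logs_since /path/to/dremio/log "2024-05-01 09:00" "/tmp/$(hostname).dremio-logs.tar.gz"
```

No upper time bound is applied on purpose: a log file that is still being written to has a recent modification time but may well contain entries from the incident.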
Collecting OS info
Often it is useful to collect some OS-level info, especially in the case of an unplanned process restart. If a process was killed by the OS, for example, this will not usually appear in application logs. However, it would likely appear in the dmesg output, which can be captured with:
dmesg -T > $(hostname).dmesg.out
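One common example of the OS killing a process is the Linux out-of-memory (OOM) killer. The sketch below filters a saved dmesg capture for lines that suggest this happened. The function name find_kill_events and the pattern list are illustrative assumptions only; the exact kernel message wording varies between kernel versions, so treat a match as a hint rather than proof, and still send the full capture.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines from a saved dmesg capture that suggest
# the kernel killed a process. The patterns are a starting point only.
find_kill_events() {
  local capture="$1"   # file produced by: dmesg -T > $(hostname).dmesg.out
  grep -iE 'out of memory|oom-killer|killed process' "$capture"
}

# Example call:
# find_kill_events "$(hostname).dmesg.out"
```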
Depending on the OS, messages are usually in /var/log. As noted above, the time of the incident dictates which files to collect and provide to us.