Overview
Some users may want to investigate the Dremio job history, including jobs from previous days that are no longer visible in the Jobs list.
Applies To
All versions
Details
In addition to searching via the search box on the Jobs page, you can upload the current and archived queries.json file(s) into Dremio and investigate them with SQL.
Note: queries.json is written to the coordinator log folder and rotated into the log/archive folder every 24 hours; 30 days of history are kept by default.
Caution: if the support key "planner.verbose_profile" is set to true, query profiles can become large, which in turn grows the KV store.
Copying queries.json nightly once it is rotated to your distributed store and then promoting that dataset via Dremio can help you run reporting about usage of Dremio across your organization.
Depending on the deployment type, you can specify where the logs reside, e.g. via conf/dremio-env or the Dremio YAML files (see Further Reading below for more information).
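As a sketch, in a tarball deployment the log location can be set in conf/dremio-env (the path /var/log/dremio below is an example value, not a requirement):

```shell
# conf/dremio-env -- directory where Dremio writes its logs,
# including queries.json and the log/archive rotation target
DREMIO_LOG_DIR=/var/log/dremio
```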
Uploading a single queries.json file for investigation:
From your Dremio UI use the Upload File icon to upload your queries.json file
Select the JSON format and save
The resulting promoted file can then be queried:
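For example (a sketch only: "@admin"."queries" is a hypothetical name for the uploaded file in a home space, and the field names assume the queries.json schema):

```sql
-- Hypothetical dataset name; adjust to where you uploaded the file.
-- queryId, queryText, outcome and "start" are queries.json fields.
SELECT queryId, queryText, outcome
FROM "@admin"."queries"
WHERE outcome = 'FAILED'
ORDER BY "start" DESC
LIMIT 20
```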
Uploading the queries.json history:
The queries.json file(s) are backed up each day into the $DREMIO_HOME/log/archive directory (along with other files)
Those archived files can be used to query past jobs.
Note that retention of the queries archive is configurable in two ways:
1) via the support key jobs.max.age_in_days
2) via maxHistory in $DREMIO_HOME/conf/logback.xml, i.e.
<appender name="query" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${dremio.log.path}/queries.json</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>${dremio.log.path}/archive/queries.%d{yyyy-MM-dd}.%i.json.gz</fileNamePattern>
    <maxHistory>30</maxHistory>
    <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.QuerySizeAndTimeBasedFNATP">
      <maxFileSize>100MB</maxFileSize>
    </timeBasedFileNamingAndTriggeringPolicy>
  </rollingPolicy>
</appender>
You cannot perform bulk uploads into Dremio. Instead, groups of files with the same structure in a common directory can be queried together as if they were a single table. To learn more, see the chapter on Directories.
If you are running Dremio locally, you can create a NAS data source and connect to your local files. To learn more, see the chapter on NAS.
Some ideas for loading the history are as follows:
- copy the queries*.json.gz files from the log/archive directory to your distributed store
- promote and query that directory as a whole
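The copy step can be scripted; the sketch below is an assumption-laden example, not a supported tool. The function name, paths, and the copier command are all placeholders: the default copier is plain cp, and for a real distributed store you would pass e.g. "hdfs dfs -put -f" or an equivalent CLI.

```shell
#!/bin/sh
# Sketch: copy rotated queries.json archives to a destination directory.
# Usage: copy_queries_archives SRC DEST [COPY_CMD...]
copy_queries_archives() {
  src="$1"; dest="$2"; shift 2
  # Default copier is plain cp; pass e.g. "hdfs dfs -put -f" for HDFS.
  [ "$#" -gt 0 ] || set -- cp
  mkdir -p "$dest"
  for f in "$src"/queries.*.json.gz; do
    [ -e "$f" ] || continue   # no archives yet; skip the unexpanded glob
    "$@" "$f" "$dest/"
  done
}
```

This could be run nightly from cron against $DREMIO_HOME/log/archive so each rotated file lands in the promoted directory.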
The example below shows promotion of a directory containing the queries.<timestamp>.json.gz files.
All the history data can be queried at this point
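A hedged sketch of a reporting query over the promoted directory (the source name "dist" and dataset name "queries_history" are invented for illustration; field names assume the queries.json schema):

```sql
-- Hypothetical names: dist.queries_history is the promoted archive directory.
SELECT "username", outcome, COUNT(*) AS jobs
FROM dist.queries_history
GROUP BY "username", outcome
ORDER BY jobs DESC
```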
Each nightly copy of the rotated log file is picked up into the dataset and becomes available according to the metadata refresh interval.
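If you do not want to wait for the scheduled refresh, metadata can be refreshed on demand (dataset name below is the same hypothetical example as above):

```sql
-- Force a metadata refresh so newly copied archive files appear immediately.
ALTER TABLE dist.queries_history REFRESH METADATA
```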
Further Reading
Job History & Job Details
Configuring Directories as Datasets
Configuring Distributed Storage