Summary/Reported Issue
When querying system table sys.jobs_recent
queries will run very slowly and appear unresponsive, particularly on clusters with high workloads.
Relevant Versions
Dremio v25.x and above
Cause
The system table sys.jobs_recent
was originally designed as an internal table only, and is queried using an external engine rather than the main dremio query engine. Work is ongoing to review the design after customer feedback (internal ref: DX-98372).
Steps to Resolve
As a workaround to resolve the issue with non-performant queries until this issue is resolved, we recommend creating a Reflection on sys.jobs_recent
data, which can then be refreshed cyclically to maintain a responsive source for that job history data. You cannot create Reflections on system tables directly, however it is possible to utilise a CROSS JOIN to achieve the same result.
1. Create a view using a CTE, ie:
CREATE VDS MyVdsLocation.sysjobs_view AS
with one_row as ( select 1 as col1)
select * from sys.jobs_recent CROSS JOIN one_row
This VDS would now be your query endpoint rather than sys.jobs_recent
.
2. Create a Reflection on that VDS
3. Adjust the refresh cycle on that reflection to be greater than the time it takes to run the select on sys.jobs_recent
, otherwise every refresh will fail. It would be good to allow some additional overhead, ie 2 hours.
You need to use the catalog API to set the refresh rate on the source sys.jobs_recent
table, for example:
curl -k -X PUT "$TARGET/apiv2/dataset/sys.jobs_recent/acceleration/settings" \
-H "Authorization:_dremio$TOKEN" \
--header 'Content-Type: application/json' \
--data-raw '{"accelerationRefreshPeriod":4600000,"accelerationGracePeriod":11000000,"accelerationNeverExpire":true,"accelerationNeverRefresh":true}'
4. You can confirm your settings have been applied by querying the source:
curl -k -S -X GET --location "$TARGET/api/v3/catalog/by-path/sys/jobs_recent" -H "Authorization: _dremio$TOKEN" -H "Content-Type: application/json" -H "Accept: application/json"
This will enable you to build a reflection containing recent sys.jobs_recent data, which will be performant when querying that data.
Additional Resources
API reference here