Summary
Planning for all queries takes much longer on a scale-out coordinator than on the master coordinator in the same cluster. This document describes one possible cause of this symptom.
Reported Issue
Queries were observed to take longer to plan on a scale-out coordinator. In one example, a query that planned in 2 seconds on the master coordinator took over 30 seconds to plan on the scale-out coordinator.
Relevant Versions
All supported Dremio versions
Troubleshooting Steps
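Review the query profiles of the slow queries and check whether planning takes longer than when the same query runs on the master coordinator. To get a baseline from the master coordinator, run the query from the Dremio UI, as queries submitted through the UI run only on the master coordinator. The State Durations section of each profile shows how long the query spent in each state; in the example below, Planning accounts for almost all of the recorded time.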
State Durations
Pending: 0ms
Metadata Retrieval: 298ms
Planning: 78,427ms
Engine Start: -
Queued: -
Execution Planning: -
Starting: -
Running: 0ms
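To compare the two coordinators, it can help to extract the planning time from each profile rather than reading it by eye. The following Python sketch parses State Durations text in the format shown above and reports what share of the recorded time was spent in Planning. It is a minimal illustration, not a Dremio tool or API: the function names are hypothetical, and it assumes every duration is written in milliseconds (for example "78,427ms"), with "-" meaning the state was not reached.

```python
import re
from typing import Dict, Optional

# Sample copied from the State Durations section of the profile above.
PROFILE_STATE_DURATIONS = """
Pending: 0ms
Metadata Retrieval: 298ms
Planning: 78,427ms
Engine Start: -
Queued: -
Execution Planning: -
Starting: -
Running: 0ms
"""


def parse_state_durations(text: str) -> Dict[str, Optional[int]]:
    """Map each state name to its duration in ms, or None for '-' (state not reached).

    Assumes values look like '<digits>ms' with optional thousands separators;
    lines in any other format are skipped.
    """
    durations: Dict[str, Optional[int]] = {}
    for line in text.strip().splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "State: value" line
        name, value = name.strip(), value.strip()
        if value == "-":
            durations[name] = None
            continue
        match = re.fullmatch(r"([\d,]+)ms", value)
        if match:
            durations[name] = int(match.group(1).replace(",", ""))
    return durations


def planning_share(durations: Dict[str, Optional[int]]) -> float:
    """Fraction of the recorded (non-'-') state time spent in the Planning state."""
    total = sum(v for v in durations.values() if v is not None)
    planning = durations.get("Planning") or 0
    return planning / total if total else 0.0


if __name__ == "__main__":
    durations = parse_state_durations(PROFILE_STATE_DURATIONS)
    share = planning_share(durations)
    # Prints: Planning: 78427 ms (99.6% of recorded time)
    print(f"Planning: {durations['Planning']} ms ({share:.1%} of recorded time)")
```

Running the same parser on a profile taken from each coordinator for the same query gives two Planning values that can be compared directly.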
Cause
Dremio can collect a significantly larger set of statistical information for a PDS (table) by running the ANALYZE TABLE SQL command. This command can compute (or delete) additional statistics, including the estimated number of distinct values, the number of rows and the number of null values. All of these statistics are stored in the KVStore. If the support key planner.use_statistics is enabled, queries use these additional statistics to try to further optimise the query plan.
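To illustrate which figures these statistics contain, the following Python sketch computes the row count, null count and number of distinct values for a single in-memory column. It then applies a common textbook estimate (rows matching `col = constant` ≈ non-null rows / distinct values) of the kind a cost-based planner can derive from such statistics. This is an assumption-based illustration only: it computes exact values, whereas Dremio stores an estimated number of distinct values, and the names ColumnStats, compute_column_stats and estimate_equality_rows are hypothetical. It does not show Dremio's storage format or how its planner actually uses the statistics.

```python
from dataclasses import dataclass
from typing import Hashable, Optional, Sequence


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column statistics mirroring the figures listed above."""
    row_count: int   # number of rows
    null_count: int  # number of null values
    ndv: int         # number of distinct non-null values (exact here; Dremio stores an estimate)


def compute_column_stats(values: Sequence[Optional[Hashable]]) -> ColumnStats:
    """Compute exact statistics for one column held in memory."""
    non_null = [v for v in values if v is not None]
    return ColumnStats(
        row_count=len(values),
        null_count=len(values) - len(non_null),
        ndv=len(set(non_null)),
    )


def estimate_equality_rows(stats: ColumnStats) -> float:
    """Textbook estimate of rows matching `col = <constant>`, assuming uniformly distributed values."""
    if stats.ndv == 0:
        return 0.0
    return (stats.row_count - stats.null_count) / stats.ndv


if __name__ == "__main__":
    column = ["EU", "US", None, "EU", "APAC", None]
    stats = compute_column_stats(column)
    print(stats)  # ColumnStats(row_count=6, null_count=2, ndv=3)
    print(round(estimate_equality_rows(stats), 2))  # 1.33
```

Without statistics, a planner has to fall back on fixed default estimates; with them it can size intermediate results more accurately, which is the benefit the support key enables and the reason the extra data is read during planning.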
Because only the master coordinator can access the KVStore, all scale-out coordinators must read metadata, including these statistics, through the master coordinator. Retrieving the larger volume of statistical data this way can consume resources for extended periods and delay the planning phase.
Steps to Resolve
Disable the support key planner.use_statistics on all scale-out coordinators.
Next Steps
No coordinator restart or further action is required. At the time of writing, the planner makes only limited use of these additional statistics, so disabling them is not expected to cause any performance loss from poorer plan optimisation.