Reported Issue
Dremio UI becomes unavailable and the coordinator node keeps crashing every few hours.
Relevant Versions
Reported on 24.2.2.
Other versions may also be affected.
Troubleshooting Steps
Validate that the S3 sources configured in Dremio are actually S3-compatible storage (that is, not AWS S3 itself).
Validate that heap memory follows the pattern below, with queries.json showing normal load and no jobs that run for the entire duration of the spike:
Validate that logs include many messages like:
ERROR c.a.s.s.m.t.XmlResponsesSaxParser - S3 response indicates truncated results, but contains no object summaries.
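To judge whether the log really contains "many" of these messages, the occurrences can be counted per hour. The following is a minimal sketch, not a Dremio tool: the function names, the timestamp layout (YYYY-MM-DD HH:MM:SS at the start of each line), and the log file path are assumptions and should be adjusted to the actual environment.

```python
import re
from collections import Counter

# Message text taken from the log line quoted above.
TRUNCATED_LISTING_MSG = (
    "S3 response indicates truncated results, but contains no object summaries"
)

# Assumption: each log line starts with "YYYY-MM-DD HH:MM:SS".
# Lines that do not match are counted under the "unknown" bucket.
HOUR_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):")


def count_truncated_listing_errors(lines):
    """Return a Counter mapping an hour bucket to the number of matching lines."""
    counts = Counter()
    for line in lines:
        if TRUNCATED_LISTING_MSG not in line:
            continue
        match = HOUR_PREFIX.match(line)
        bucket = match.group(1) if match else "unknown"
        counts[bucket] += 1
    return counts


def count_in_file(path):
    """Count matching lines in a log file (path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_truncated_listing_errors(handle)
```

For example, count_in_file("server.log") (a hypothetical path) returns the per-hour counts. A steady stream of matches in the hours leading up to each crash would be consistent with the cause described in this article.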
Cause
The error message "S3 response indicates truncated results, but contains no object summaries" can occur when a client (Dremio) makes a ListObjectsV2 call to an S3-compatible storage system that only implements ListObjects (V1).
Steps to Resolve
Set the parameter fs.s3a.list.version to 1. This makes the S3 client use version 1 of the List Objects API (ListObjects) instead of ListObjectsV2.
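As an illustration, in Hadoop-style XML configuration the property is written as shown below. This is a sketch only: this article does not state where the value is entered in Dremio. It is assumed here that the same name and value are supplied as a connection property on the affected S3 source, or through a Hadoop configuration file that Dremio reads.

```xml
<!-- Assumed placement: a Hadoop-style configuration file read by the S3A client. -->
<property>
  <name>fs.s3a.list.version</name>
  <value>1</value>
</property>
```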
Additional Resources
To learn more about fs.s3a.list.version, visit:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html