Reported Issue
When working with a large data set, a hash join operation may fail with
OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator
SqlOperatorImpl HASH_JOIN
Relevant Versions
This solution applies to Dremio version 24 and later.
Troubleshooting Steps
Check the profile to determine if the following support key is set (off by default in V24, on by default in V25): `exec.op.join.spill=true`
Cause
The query will fail upon encountering the memory limitation. Depending on the memory used by other concurrent queries, it may work on a retry.
Steps to Resolve
Enable the hash join spill to disk feature by setting the `exec.op.join.spill=true` support key (off by default in V24, on by default in V25).
Tips & Tricks
The hash join spill to disk feature is not enabled by default in Dremio Version 24. Enabling this feature may result in a slight performance hit for all queries due to the necessary setup during query planning.
Best Practices
n/a
Recommendations
Enable the hash join spill to disk feature for large hash join operations that may encounter out-of-memory errors.
FAQ
n/a
Summary
This article provides information on enabling the hash join spill to disk feature in Dremio version 24 and later to resolve out-of-memory errors during large hash join operations.
Additional Resources
https://docs.dremio.com/24.3.x/admin/hash-agg-spilling/
Related changes in Version 25
https://www.dremio.com/blog/deep-dive-into-better-stability-with-the-new-memory-arbiter/