Overview
When trying to run queries that make use of memory intensive SQL functions such as UNION, COUNT(DISTINCT col) or similar, this may lead to the query being cancelled with the following error: OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator. Expected at least XXX GB bytes, but only had XXX GB available.
Applies To
Dremio versions earlier than 22.x.
Details
As an example (with specific memory requirements), the following error was seen in the job profile:
OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator. Expected at least 539.95GB bytes, but only had 343.75GB available.
with the below corresponding error being seen in the server.log
of the coordinator node
2023-02-14 12:27:03,471 [1c1481fe-c87c-e4f0-65f0-911bd47b2100:execution-planning] ERROR c.d.e.util.MemoryAllocationUtilities - Query was cancelled because it exceeded the memory limits set by the administrator. Expected at least 539.95GB bytes, but only had 343.75GB available.
Size requirement for memory intensive ops is 91.69GB bytes.
Missing memory = 196.20GB bytes, Number of Memory intensive ops = 278849, Other ops = 3461401, Endpoint = xxxx
com.dremio.common.exceptions.UserException: Query was cancelled because it exceeded the memory limits set by the administrator. Expected at least 539.95GB bytes, but only had 343.75GB available.
Size requirement for memory intensive ops is 91.69GB bytes.
Missing memory = 196.20GB bytes, Number of Memory intensive ops = 278849, Other ops = 3461401, Endpoint = xxxx
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:885)
at com.dremio.exec.util.MemoryAllocationUtilities.setMemory(MemoryAllocationUtilities.java:165)
at com.dremio.exec.util.MemoryAllocationUtilities.setupBoundedMemoryAllocations(MemoryAllocationUtilities.java:104)
at com.dremio.exec.maestro.planner.ExecutionPlanCreator.getExecutionPlan(ExecutionPlanCreator.java:112)
at com.dremio.exec.maestro.QueryTrackerImpl.planExecution(QueryTrackerImpl.java:122)
at com.dremio.exec.maestro.MaestroServiceImpl.lambda$executeQuery$2(MaestroServiceImpl.java:167)
at com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3(ReleasableBoundCommandPool.java:138)
at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:62)
at com.dremio.context.RequestContext.run(RequestContext.java:95)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3(ContextMigratingExecutorService.java:199)
at com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run(ContextMigratingExecutorService.java:180)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Cause
After Dremio has created a query plan, it estimates how much memory each operator in that plan may use during query execution, and adds them up. It may overestimate this memory requirement and prevent the query from executing if the memory required for execution is more than the direct memory available on each executor.
This OOM error is more likely to occur when the query includes memory intensive operators such as UNION and COUNT(DISTINCT col). It is possible to verify this within the raw query profile in the "Planning" tab under the "Convert To Rel" phase, where lots of LogicalUnion and LogicalAggregate relational operators will be seen.
It is important to note this particular OOM error occurs at the "Execution Planning" stage (before execution) of the query and as such, should not be confused with the similar error message:
OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator. This message occurs during execution and does not include the "Expected at least" and "but only had" section.
Solution
Set the support key planner.memory.aggressive and retry the query.
This support key allows the planner to make a more optimistic estimate when planning for the required memory to complete the query, and thus plan for less memory usage.
NOTE: A restart of Dremio is not required to apply this change, and this setting is enabled by default within Dremio versions 22.x and onwards.
This works in general cases and may not be ideal for scenarios where there are other underlying issues with the query or cluster. For queries where this key does not make an improvement, this may indicate the query itself needs to be optimized or re-worked, or may in fact, require more memory to execute successfully.