Summary
This article describes the cause of, and solution for, jobs that fail with a "Buffer count exceeds maximum" error.
Reported Issue
Typically, if a query fails with the following error, it means that you have reached a Workload Manager-defined memory limit at the query or queue level:
OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator.
However, this is not always the cause. In some scenarios, the error stack returned in the job profile and in server.log reports an additional error, "Buffer count exceeds maximum". This article describes how to identify and overcome that error.
Relevant Versions
All Dremio Software releases.
Troubleshooting Steps
If your job fails with the error referenced above and you suspect that overall memory usage on the executor has not exceeded its limits, navigate to the job details via the Job History page in the UI and select "Raw Profile", as highlighted in the image below.
Then select the "Error" tab, as highlighted below.
Review the Error and Verbose Error boxes for the failure stack trace; you may need to scroll down to expose it in full. If you see the following "OutOfMemoryException", you are hitting the problem described in this article:
(org.apache.arrow.memory.OutOfMemoryException) Buffer count exceeds maximum.
  com.dremio.common.memory.DremioRootAllocator$RootAllocatorListener.onPreAllocation():99
  org.apache.arrow.memory.BaseAllocator.buffer():260
  org.apache.arrow.memory.BaseAllocator.buffer():240
  org.apache.arrow.vector.BaseValueVector.allocFixedDataAndValidityBufs():192
  org.apache.arrow.vector.BaseFixedWidthVector.allocateBytes():339
  org.apache.arrow.vector.BaseFixedWidthVector.allocateNew():309
  org.apache.arrow.vector.BaseFixedWidthVector.allocateNew():274
  com.dremio.exec.store.AbstractRecordReader.allocate():139
  com.dremio.exec.store.FilteringFileCoercionReader.allocate():85
  com.dremio.exec.store.parquet.ScanTableFunction.processRow():184
  com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():110
  com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():209
  com.dremio.sabot.driver.StraightPipe.pump():56
  com.dremio.sabot.driver.Pipeline.doPump():124
  com.dremio.sabot.driver.Pipeline.pumpOnce():114
  com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():544
  com.dremio.sabot.exec.fragment.FragmentExecutor.run():472
  com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700():106
  com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():981
  com.dremio.sabot.task.AsyncTaskWrapper.run():121
  com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():249
  com.dremio.sabot.task.slicing.SlicingThread.run():171
An alternative way to identify the error is to search the coordinator's server.log for references to the job ID that failed.
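As a sketch, the search can be done with grep. The log path and job ID below are placeholders, and the sample log lines are fabricated for illustration; substitute your actual log directory and the failed job's ID.

```shell
# Placeholder log path and job ID -- replace with your own values.
LOG=/tmp/sample_server.log
JOB_ID="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d"

# Create a small fabricated log so this example is self-contained.
cat > "$LOG" <<EOF
2024-01-01 10:00:00,000 INFO  query start $JOB_ID
2024-01-01 10:00:05,000 ERROR query failed $JOB_ID (org.apache.arrow.memory.OutOfMemoryException) Buffer count exceeds maximum.
EOF

# Pull every line mentioning the job ID, then narrow to the buffer-count error.
grep "$JOB_ID" "$LOG" | grep "Buffer count exceeds maximum"
```

On a real coordinator you would point `LOG` at the server.log file in your Dremio log directory instead of creating a sample file.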
Cause
The "Buffer count exceeds maximum" error means that the query exceeded an internal limit on the number of buffers that Apache Arrow may allocate. It is not related to the Direct Memory byte buffers used by the query.
Steps to Resolve
The limit is governed by the following dremio.conf parameter:
debug.alloc.est_heap_buf_size_bytes: 3200
This parameter is not explicitly defined in dremio.conf by default; when it is absent, the default value is 800 bytes.
To explicitly define the parameter, add it to the bottom of your existing dremio.conf file.
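For example, a dremio.conf might end with the new parameter appended. The surrounding entries below are illustrative placeholders only; the last line is the one being added.

```
# Illustrative existing entries -- your file's contents will differ.
paths: {
  local: "/var/lib/dremio"
}
services: {
  coordinator.enabled: true,
  executor.enabled: true
}

# Added: raise the estimated heap buffer size from the 800-byte default.
debug.alloc.est_heap_buf_size_bytes: 3200
```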
The default value is sufficient for the majority of use cases, but where Dremio exceeds it the solution is to raise the limit. Increase the value to 3200 bytes, monitor for any further recurrence, and adjust up or down as needed.
Setting the value to zero (0) removes the limit entirely.
Because this is a dremio.conf parameter, ensure that you set it on all nodes and restart all coordinator and executor nodes for the setting to take effect.