Overview
This error can occur when reading datasets backed by Parquet files.
Applies To
Dremio releases up to and including 18.x
Details
When running the query, the user sees an error in the job profile similar to the following:
SYSTEM ERROR: IndexOutOfBoundsException: index (-91) must not be negative
In the profile, the full stack trace typically looks like this example:
SYSTEM ERROR: IndexOutOfBoundsException: index (-91) must not be negative
SqlOperatorImpl PARQUET_ROW_GROUP_SCAN
Location 1:103:6
Fragment 1:0
[Error Id: 010666a5-83ea-4694-b5c8-64dc90821edc on dremio-executor-6.dremio-cluster-pod.default.svc.cluster.local:0]
(java.util.concurrent.CompletionException) java.lang.IndexOutOfBoundsException: index (-91) must not be negative
    java.util.concurrent.CompletableFuture.encodeThrowable():292
    java.util.concurrent.CompletableFuture.completeThrowable():308
    java.util.concurrent.CompletableFuture.uniWhenComplete():783
    java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire():750
    java.util.concurrent.CompletableFuture.postComplete():488
    java.util.concurrent.CompletableFuture$AsyncRun.run():1646
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
Caused By (java.lang.IndexOutOfBoundsException) index (-91) must not be negative
    org.apache.arrow.util.Preconditions.checkPositionIndex():1236
    org.apache.arrow.util.Preconditions.checkPositionIndex():1218
    org.apache.arrow.memory.ArrowBuf.slice():197
    io.netty.buffer.NettyArrowBuf.slice():173
    com.dremio.parquet.pages.async.RowGroupReader.initializeOffsetIndexes():534
    com.dremio.parquet.pages.async.RowGroupReader.lambda$readColumnAndOffsetIndices$6():307
    java.util.concurrent.CompletableFuture.uniWhenComplete():774
    java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire():750
    java.util.concurrent.CompletableFuture.postComplete():488
    java.util.concurrent.CompletableFuture$AsyncRun.run():1646
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
SqlOperatorImpl PARQUET_ROW_GROUP_SCAN
Location 1:103:6
Fragment 1:0
...(:0)
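To check whether an error copied from a job profile matches this article, a small script can look for the distinguishing parts of the stack trace. The following is a minimal sketch in Python. The function name matches_known_issue is hypothetical, and the script only inspects text that you paste in; it does not call any Dremio API.

```python
import re

# The message text, allowing any negative index value (not only -91).
_NEGATIVE_INDEX = re.compile(
    r"IndexOutOfBoundsException: index \(-\d+\) must not be negative"
)

# Markers taken from the example stack trace in this article.
_MARKERS = (
    "PARQUET_ROW_GROUP_SCAN",
    "RowGroupReader.initializeOffsetIndexes",
)


def matches_known_issue(error_text: str) -> bool:
    """Return True if the profile error text looks like the issue described here."""
    if not _NEGATIVE_INDEX.search(error_text):
        return False
    return all(marker in error_text for marker in _MARKERS)
```

A match only suggests that the error is the one described here; confirm against the Dremio version in use before applying the workaround.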
Cause
In some cases, Parquet files have column indexes that are null while the offset indexes still have a positive value. The reader can then compute a negative index, which it passes to the buffer slice call (ArrowBuf.slice in the stack trace above), and this causes the error.
Internal Jira DX-29726 addresses this problem by adding a check for this condition.
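The general failure mode can be illustrated with a short Python sketch. This is not Dremio's code: every name below (checked_slice, read_offset_index_unguarded, read_offset_index_guarded) and the way the relative position is computed are assumptions made purely for illustration. The sketch only shows how a position derived by subtraction can go negative when one of the inputs is missing, and how a guard avoids the bounds-check failure.

```python
from typing import Optional


def checked_slice(buf: bytes, index: int, length: int) -> bytes:
    """Bounds-checked slice, loosely modelled on the check seen in the stack trace."""
    if index < 0:
        raise IndexError(f"index ({index}) must not be negative")
    if index + length > len(buf):
        raise IndexError(f"slice end ({index + length}) exceeds buffer size ({len(buf)})")
    return buf[index:index + length]


def read_offset_index_unguarded(
    buf: bytes, buf_file_pos: int, offset_index_pos: int, offset_index_len: int
) -> bytes:
    # Hypothetical: position of the offset index relative to the start of buf.
    # If buf_file_pos was derived assuming a column index exists, this can be negative.
    relative = offset_index_pos - buf_file_pos
    return checked_slice(buf, relative, offset_index_len)


def read_offset_index_guarded(
    buf: bytes,
    buf_file_pos: int,
    column_index_pos: Optional[int],
    offset_index_pos: int,
    offset_index_len: int,
) -> Optional[bytes]:
    relative = offset_index_pos - buf_file_pos
    # Hypothetical guard: skip the page indexes instead of slicing with bad inputs.
    if column_index_pos is None or relative < 0:
        return None
    return checked_slice(buf, relative, offset_index_len)
```

In this sketch the guarded variant returns None so that a caller could fall back to reading the row group without page indexes, which is also the effect of the workaround described in this article.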
Workaround
As a workaround, disable the following support key:
store.parquet.read_column_indexes
Solution
This issue is fixed in Dremio 19.0 and later. Upgrade to one of these versions to resolve it.
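When auditing several clusters, the version boundary stated in this article can be expressed as a simple check. The sketch below is in Python; the function name is_affected is hypothetical, and it assumes version strings begin with a numeric major version such as "18.2.0".

```python
def is_affected(version: str) -> bool:
    """Return True for releases up to and including 18.x, False for 19.0 onwards."""
    # Assumption: the version string starts with the major version, e.g. "18.2.0".
    major = int(version.split(".")[0])
    return major <= 18
```

For example, is_affected("18.2.0") is True and is_affected("19.0.0") is False.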
Further Reading
Dremio support key settings:
https://docs.dremio.com/advanced-administration/support-settings/#support-keys
The following page gives a good overview of the purpose of column and offset indexes:
https://github.com/apache/parquet-format/blob/master/PageIndex.md