Overview
When accessing HDFS or Hive datasets, queries may fail with an error of the following type: SYSTEM ERROR: HadoopIllegalArgumentException: Invalid buffer, not of length 635708.
This error appears in the Dremio UI in the profiles of the failing jobs.
Other errors, such as the one below, are potential symptoms of the same issue and may also appear in the Dremio server.log:
ERROR com.dremio.sabot.driver.SmartOp - RemoteException: File does not exist: /opt/dremio/data/results/1cc16122-3287-a5d0-5e50-f531375fee00/1_32_0.dremarrow1 (inode 4387336033) xxx does not have any open files.
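To check quickly whether a log contains either of the messages quoted above, a short script can search for them. The following is a minimal sketch, not a Dremio tool: the function name find_symptoms and the signature labels are hypothetical, and the patterns are derived only from the two messages shown in this article, so other variants of the error may not match.

```python
import re
from typing import Iterable, List, Tuple

# Patterns built from the two error messages quoted in this article.
# The labels on the left are hypothetical names used only for reporting.
SIGNATURES = {
    "invalid_buffer": re.compile(
        r"HadoopIllegalArgumentException: Invalid buffer, not of length \d+"
    ),
    "missing_result_file": re.compile(
        r"RemoteException: File does not exist: \S+\.dremarrow1"
    ),
}


def find_symptoms(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, signature_label) for every matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

For example, the function can be applied to an open file handle, such as `find_symptoms(open(path_to_server_log))`, where path_to_server_log is whatever location the server.log has in the environment being checked.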
Applies To
This issue applies to Dremio releases prior to 22.1.1 in customer environments configured with HDFS or Hive sources.
Details
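In addition to the related errors described above, failing job profiles will show the following verbose stack trace, or one similar to it. Note that the innermost frames are in the HDFS erasure-coding read path (org.apache.hadoop.io.erasurecode and DFSStripedInputStream).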
SYSTEM ERROR: HadoopIllegalArgumentException: Invalid buffer, not of length 635708

SqlOperatorImpl TABLE_FUNCTION
Location 1:0:4
Fragment 1:0

[Error Id: 12ebdfa0-43ca-4089-b6e8-fad5b4759062 on dremio-executor-queries-2.dremio-cluster-pod.dremio.svc.cluster.local:0]

(java.util.concurrent.CompletionException) java.io.IOException: org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer, not of length 635708
    com.dremio.exec.hadoop.HadoopAsyncByteReader.lambda$readFully$0():64
    java.util.concurrent.CompletableFuture$AsyncRun.run():1640
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750
Caused By (java.io.IOException) org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer, not of length 635708
    com.dremio.exec.hadoop.FSDataInputStreamWrapper.read():146
    com.dremio.io.FilterFSInputStream.read():69
    com.dremio.exec.store.dfs.FSInputStreamWithStatsWrapper.read():93
    com.dremio.exec.hadoop.HadoopAsyncByteReader.readFully():108
    com.dremio.exec.hadoop.HadoopAsyncByteReader.lambda$readFully$0():60
    java.util.concurrent.CompletableFuture$AsyncRun.run():1640
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750
Caused By (org.apache.hadoop.HadoopIllegalArgumentException) Invalid buffer, not of length 635708
    org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers():137
    org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>():48
    org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode():86
    org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode():170
    org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer():433
    org.apache.hadoop.hdfs.PositionStripeReader.decode():74
    org.apache.hadoop.hdfs.StripeReader.readStripe():390
    org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange():507
    org.apache.hadoop.hdfs.DFSInputStream.pread():1361
    org.apache.hadoop.hdfs.DFSInputStream.read():1570
    org.apache.hadoop.fs.FSDataInputStream.read():255
    sun.reflect.GeneratedMethodAccessor28.invoke():-1
    sun.reflect.DelegatingMethodAccessorImpl.invoke():43
    java.lang.reflect.Method.invoke():498
    com.dremio.exec.hadoop.FSDataInputStreamWrapper.read():140
    com.dremio.io.FilterFSInputStream.read():69
    com.dremio.exec.store.dfs.FSInputStreamWithStatsWrapper.read():93
    com.dremio.exec.hadoop.HadoopAsyncByteReader.readFully():108
    com.dremio.exec.hadoop.HadoopAsyncByteReader.lambda$readFully$0():60
    java.util.concurrent.CompletableFuture$AsyncRun.run():1640
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750

SqlOperatorImpl TABLE_FUNCTION
Location 1:0:4
Fragment 1:0

com.dremio.exec.hadoop.HadoopAsyncByteReader(HadoopAsyncByteReader.java:64)
...(:0)
Cause
The SYSTEM ERROR: HadoopIllegalArgumentException: Invalid buffer, not of length 635708 is caused by the HDFS 3.2.1 client shipped with Dremio 21.2.0 and other versions prior to 22.1.1, which contains the bug described in HDFS-14373.
The HDFS client shipped with Dremio was upgraded from 3.2.1 to 3.3.2 in Dremio 22.1.1. The solution is therefore to upgrade Dremio to 22.1.1 or above (preferably the latest 22.x version available at the time of reading this article, for its additional fixes and features).
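The version boundary above can be expressed as a simple comparison, which may be useful when checking many environments at once. This is a minimal sketch under stated assumptions: the helper names parse_version and is_affected are hypothetical, and it assumes the version string begins with three dot-separated numbers (any build suffix after them is ignored). It only encodes the 22.1.1 threshold given in this article and does not query Dremio.

```python
import re
from typing import Tuple

# First release that ships the upgraded HDFS client, per this article.
FIXED_IN = (22, 1, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is earlier than the fixed release."""
    return parse_version(version) < FIXED_IN
```

With these definitions, `is_affected("21.2.0")` evaluates to True and `is_affected("22.1.1")` evaluates to False.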