Summary
Dremio installations that use S3 (or an S3-compatible solution) for distributed storage require the fs.s3a.buffer.dir path to be accessible to the Linux user running the Dremio process on each executor node. If it is not, uploads to users' home spaces and other background processes, such as metadata refresh, fail with error messages containing "Could not find any valid local directory for s3ablock-0001-".
Reported Issue
When attempting to upload a file to their Dremio home space, a user encounters: "Unexpected error occurred."
Relevant Versions
All documented Dremio software releases.
Troubleshooting Steps
1. Determine whether your Dremio environment uses S3, or an S3-compatible solution, for distributed storage. A quick way to check is to examine the dremio.conf configuration file: if the dist path uses dremioS3 as the file scheme, then the environment is using S3, for example:
paths: {
...
# the distributed path Dremio data including job results, downloads, uploads, etc
dist: "dremioS3:///bucket-for-dremio/dremio-dist"
...
}
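You can also check this from the command line. The following is a minimal sketch, assuming the default configuration directory /opt/dremio/conf (adjust the path to match your installation):
# Print the dist entry from dremio.conf; a dremioS3 scheme indicates S3 distributed storage
grep "dist:" /opt/dremio/conf/dremio.conf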
2. Check the coordinator node's application log (server.log) for an error similar to the following:
2025-01-11 23:04:00,617 [Fabric-RPC-Offload12] INFO c.d.exec.work.foreman.AttemptManager - 187d0720-6afa-5abb-d92e-4d8f7f70c900: State change requested RUNNING --> FAILED, Exception com.dremio.common.exceptions.UserRemoteException: SYSTEM ERROR: DiskErrorException: Could not find any valid local directory for s3ablock-0001- with requested size 67108864 as the max capacity in any directory is 51899486208
SqlOperatorImpl ARROW_WRITER
Location 0:0:3
SqlOperatorImpl ARROW_WRITER
Location 0:0:3
ErrorOrigin: EXECUTOR
[Error Id: 7ee3ec4c-a253-49ac-8d1e-0b2dd4bdfd64 on ip-172-31-2-3.us-west-2.compute.internal:0]
(org.apache.hadoop.util.DiskChecker.DiskErrorException) Could not find any valid local directory for s3ablock-0001- with requested size 67108864 as the max capacity in any directory is 51899486208
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite():491
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite():166
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite():147
org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite():1377
org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory.create():823
org.apache.hadoop.fs.s3a.S3ABlockOutputStream.createBlockIfNeeded():235
org.apache.hadoop.fs.s3a.S3ABlockOutputStream.<init>():217
org.apache.hadoop.fs.s3a.S3AFileSystem.innerCreateFile():1899
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$create$7():1798
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration():547
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5():528
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration():449
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan():2491
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan():2510
org.apache.hadoop.fs.s3a.S3AFileSystem.create():1797
com.dremio.plugins.util.ContainerFileSystem.create():365
org.apache.hadoop.fs.FileSystem.create():1233
org.apache.hadoop.fs.FileSystem.create():1210
org.apache.hadoop.fs.FileSystem.create():1091
org.apache.hadoop.fs.FileSystem.create():1078
com.dremio.exec.hadoop.HadoopFileSystem.create():248
com.dremio.io.file.FilterFileSystem.create():64
com.dremio.exec.store.dfs.LoggedFileSystem.create():96
com.dremio.exec.store.easy.arrow.ArrowRecordWriter.setup():124
com.dremio.sabot.op.writer.WriterOperator.setup():127
com.dremio.sabot.driver.SmartOp$SmartSingleInput.setup():327
com.dremio.sabot.driver.Pipe$SetupVisitor.visitSingleInput():83
com.dremio.sabot.driver.Pipe$SetupVisitor.visitSingleInput():70
com.dremio.sabot.driver.SmartOp$SmartSingleInput.accept():272
com.dremio.sabot.driver.StraightPipe.setup():101
com.dremio.sabot.driver.StraightPipe.setup():100
com.dremio.sabot.driver.StraightPipe.setup():100
com.dremio.sabot.driver.StraightPipe.setup():100
com.dremio.sabot.driver.Pipeline.setup():79
com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():770
com.dremio.sabot.exec.fragment.FragmentExecutor.run():549
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1274
com.dremio.sabot.task.AsyncTaskWrapper.run():130
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():281
com.dremio.sabot.task.slicing.SlicingThread.run():186
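To locate this error quickly, the following sketch assumes the log directory /var/log/dremio shown in the executor example further below (set via -Ddremio.log.path; adjust to your installation):
# Search the coordinator's server.log for the staging-directory failure
grep "Could not find any valid local directory for s3ablock" /var/log/dremio/server.log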
Cause
Dremio uses the Apache Hadoop-AWS module to read and write files in S3. During file uploads (and other write actions), this code uses local storage attached to the executor(s) to temporarily stage the data before it is written to S3. This temporary storage uses the path specified by the Apache Hadoop core-site parameter fs.s3a.buffer.dir. If the user running the Dremio process on a node cannot access this path, uploads fail with the above error.
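For reference, this is what an explicit setting would look like in core-site.xml; the path shown is purely illustrative, not a required or recommended value:
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>/tmp/hadoop-dremio/s3a</value>
  <description>Local directory used to buffer data before it is uploaded to S3</description>
</property>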
Steps to Resolve
1. Determine the path specified for fs.s3a.buffer.dir. If it is not explicitly set in your core-site.xml file, its default value takes the form /tmp/hadoop-${user.name}/s3a, where user.name is the user running the Dremio process on the executor(s). A command sketch for checking this follows these steps.
2. Change the owner or permissions of this directory so that the user running Dremio can read and write to it. This process may need to be repeated for multiple executors.
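To check whether the parameter is set explicitly, here is a minimal sketch, assuming core-site.xml resides in the Dremio configuration directory /opt/dremio/conf (adjust the path to match your installation):
# Show any explicit fs.s3a.buffer.dir setting; if the property is absent, the default /tmp/hadoop-${user.name}/s3a applies
grep -A1 "fs.s3a.buffer.dir" /opt/dremio/conf/core-site.xml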
As an example of the problem, here is a Dremio executor running on an AWS EC2 instance:
[ec2-user@ip-172-31-2-3 ~]$ ps -ef | grep Dremio
dremio 2458 1 1 20:55 ? 00:01:22 /usr/lib/jvm/jdk-11.0.24-oracle-x64/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib/x86_64 --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED -Xlog:gc*:file=/var/log/dremio/server.gc:time,uptime,tags,level -Ddremio.log.path=/var/log/dremio -Ddremio.plugins.path=/opt/dremio/plugins -Xmx4096m -XX:MaxDirectMemorySize=8192m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=-1 -Dio.netty.tryReflectionSetAccessible=true -XX:+UseG1GC -cp /opt/dremio/conf:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/* com.dremio.dac.daemon.DremioDaemon
The Linux user running Dremio (the Dremio service user) is simply named "dremio". However, if we list the default path /tmp/hadoop-dremio for this user, we see that the s3a subdirectory is owned by "ec2-user":
[ec2-user@ip-172-31-2-3 ~]$ ls -l /tmp/hadoop-dremio/
total 0
drwxr-xr-x. 2 ec2-user ec2-user 6 Jan 11 21:00 s3a
Changing the owner to "dremio" is sufficient to solve the problem:
[ec2-user@ip-172-31-2-3 ~]$ sudo chown dremio:dremio /tmp/hadoop-dremio/s3a
[ec2-user@ip-172-31-2-3 ~]$ ls -l /tmp/hadoop-dremio
total 0
drwxr-xr-x. 2 dremio dremio 6 Jan 11 21:00 s3a
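As an optional check before retrying the upload, you can confirm that the service user can now write to the directory; the file name used here is arbitrary:
[ec2-user@ip-172-31-2-3 ~]$ sudo -u dremio touch /tmp/hadoop-dremio/s3a/write-test
[ec2-user@ip-172-31-2-3 ~]$ sudo -u dremio rm /tmp/hadoop-dremio/s3a/write-test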
Additional Resources
Apache Hadoop core-site default reference