Overview
This article outlines the errors you might see when the Conduit port is blocked, either by a security group in AWSE deployments or by a network or firewall rule in on-prem deployments. From v21 onwards the default metadata format is Iceberg, so a blocked Conduit port can prevent reflections from building.
Applies To
Dremio Version 21.x onwards
Details
The cluster may appear to be up and running but the following error might be seen in the executor log file when trying to build a reflection:
2022-08-24 09:08:04,529 [e6 - 1cfa15d8-5037-88c5-3302-7d1f60233700:frag:0:0] ERROR com.dremio.sabot.driver.SmartOp - ConnectTimeoutException: connection timed out: /10.134.16.37:35317
com.dremio.common.exceptions.UserException: ConnectTimeoutException: connection timed out: /10.134.16.37:35317
    at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:890)
    at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:145)
    at com.dremio.sabot.driver.SmartOp$SmartSingleInput.noMoreToConsume(SmartOp.java:235)
    at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:63)
    at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:111)
    at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:101)
    at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:418)
    at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:355)
    at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600(FragmentExecutor.java:97)
    at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:820)
    at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:120)
    at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:247)
    at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at com.dremio.plugins.NessieClientImpl.getDefaultBranch(NessieClientImpl.java:120)
    at com.dremio.exec.store.iceberg.nessie.IcebergNessieTableOperations.getDefaultBranch(IcebergNessieTableOperations.java:129)
    at com.dremio.exec.store.iceberg.nessie.IcebergNessieTableOperations.doRefresh(IcebergNessieTableOperations.java:70)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:95)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:78)
    at com.dremio.exec.store.iceberg.model.IcebergBaseCommand.beginCreateTableTransaction(IcebergBaseCommand.java:119)
    at com.dremio.exec.store.iceberg.model.IcebergTableCreationCommitter.<init>(IcebergTableCreationCommitter.java:55)
    at com.dremio.exec.store.iceberg.model.FullMetadataRefreshCommitter.<init>(FullMetadataRefreshCommitter.java:68)
    at com.dremio.exec.store.iceberg.model.IcebergBaseModel.getFullMetadataRefreshCommitter(IcebergBaseModel.java:84)
    at com.dremio.exec.store.iceberg.manifestwriter.SchemaDiscoveryIcebergCommitOpHelper.initializeIcebergOpCommitter(SchemaDiscoveryIcebergCommitOpHelper.java:151)
    at com.dremio.exec.store.iceberg.manifestwriter.SchemaDiscoveryIcebergCommitOpHelper.commit(SchemaDiscoveryIcebergCommitOpHelper.java:126)
    at com.dremio.sabot.op.writer.WriterCommitterOperator.noMoreToConsume(WriterCommitterOperator.java:181)
    at com.dremio.sabot.driver.SmartOp$SmartSingleInput.noMoreToConsume(SmartOp.java:233)
    ... 10 common frames omitted
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
    at com.dremio.services.nessie.grpc.api.TreeServiceGrpc$TreeServiceBlockingStub.getDefaultBranch(TreeServiceGrpc.java:708)
    at com.dremio.services.nessie.grpc.client.v1api.GrpcApiV1Impl.lambda$getDefaultBranch$0(GrpcApiV1Impl.java:109)
    at com.dremio.services.nessie.grpc.client.GrpcExceptionMapper.handleNessieNotFoundEx(GrpcExceptionMapper.java:179)
    at com.dremio.services.nessie.grpc.client.v1api.GrpcApiV1Impl.getDefaultBranch(GrpcApiV1Impl.java:106)
    at com.dremio.plugins.NessieClientImpl.lambda$getDefaultBranch$0(NessieClientImpl.java:112)
    at com.dremio.context.RequestContext.call(RequestContext.java:113)
    at com.dremio.plugins.NessieClientImpl.getDefaultBranch(NessieClientImpl.java:111)
    ... 22 common frames omitted
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.134.16.37:35317
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:613)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
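To see which node and port the executors are failing to reach, the timed-out endpoints can be pulled out of the executor log. The following is a minimal sketch, not a Dremio tool: the function name timed_out_endpoints is hypothetical, and the regular expression assumes the message format shown in the error above.

```python
import re
from collections import Counter

# Matches the endpoint in messages such as
# "ConnectTimeoutException: connection timed out: /10.134.16.37:35317".
# Assumption: the log uses this exact wording and an IPv4 address.
TIMEOUT_RE = re.compile(
    r"connection timed out: /(\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
)


def timed_out_endpoints(log_text):
    """Count each (host, port) pair reported in connect-timeout messages."""
    return Counter(
        (host, int(port)) for host, port in TIMEOUT_RE.findall(log_text)
    )


# Sample line taken from the error shown in this article.
sample = (
    "ERROR com.dremio.sabot.driver.SmartOp - ConnectTimeoutException: "
    "connection timed out: /10.134.16.37:35317"
)

for (host, port), count in timed_out_endpoints(sample).items():
    print(f"{host}:{port} timed out {count} time(s)")
```

A port that changes between restarts and falls outside the range allowed between nodes suggests that the ephemeral Conduit port is the one being blocked.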
Cause
By default, the Conduit port in Dremio is an ephemeral port (>1024). Some customers tighten their AWS security groups beyond the defaults that are normally deployed, so that only certain ports are allowed between nodes. This blocks the gRPC traffic between nodes, which Nessie uses for internode communication.
Solution
The customer can manually set a static port for Conduit on all nodes by changing the following parameter:
services.conduit.port
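A process restart is required for the change to take effect. The chosen port must also be allowed between nodes by the security group or firewall rule.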
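Once a static port is set and the nodes have been restarted, a plain TCP connection attempt from an executor host to the node named in the error can confirm that the port is no longer blocked. This is a minimal sketch using only the Python standard library; the function name port_reachable is hypothetical, and the host and port in the usage comment are placeholders to be replaced with your own values.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


# Example usage (placeholder host and port, replace with your own):
# print(port_reachable("10.134.16.37", 45678))
```

A result of False from an executor host, while the same check succeeds when run on the target node itself, points to a security group or firewall rule rather than to Dremio.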