Overview
On earlier versions of Dremio with Iceberg enabled for metadata, users may see errors caused by failed commits to Iceberg.
Applies To
Dremio versions earlier than 21.0
Details
The user may see an error in the UI. The same failure is logged in server.log on the coordinator:
2022-02-23 16:54:34,040 [e1 - 1de99947-25a7-22fa-79cf-aa7a90167000:frag:0:0] ERROR com.dremio.sabot.driver.SmartOp - NessieReferenceConflictException: Retry-Failure during commit against 'main@a9109d7fc12e1987ba3b73ea46cad8fb01d2f6d526ead500514884f6df1105cb'
com.dremio.common.exceptions.UserException: NessieReferenceConflictException: Retry-Failure during commit against 'main@a9109d7fc12e1987ba3b73ea46cad8fb01d2f6d526ead500514884f6df1105cb'
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:885)
at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:140)
at com.dremio.sabot.driver.SmartOp$SmartSingleInput.noMoreToConsume(SmartOp.java:229)
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:63)
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:111)
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:101)
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:371)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:308)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600(FragmentExecutor.java:95)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:773)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:120)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:243)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: org.apache.iceberg.exceptions.CommitFailedException: Failed to commit operation
at com.dremio.plugins.NessieClientImpl.commitOperation(NessieClientImpl.java:407)
at com.dremio.exec.store.iceberg.nessie.IcebergNessieTableOperations.doCommit(IcebergNessieTableOperations.java:89)
at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:118)
at org.apache.iceberg.BaseTransaction.commitCreateTransaction(BaseTransaction.java:248)
at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:224)
at com.dremio.exec.store.iceberg.model.IcebergBaseCommand.endCreateTableTransaction(IcebergBaseCommand.java:114)
at com.dremio.exec.store.iceberg.model.IcebergTableCreationCommitter.commit(IcebergTableCreationCommitter.java:63)
at com.dremio.exec.store.iceberg.manifestwriter.IcebergCommitOpHelper.commit(IcebergCommitOpHelper.java:209)
at com.dremio.sabot.op.writer.WriterCommitterOperator.noMoreToConsume(WriterCommitterOperator.java:179)
at com.dremio.sabot.driver.SmartOp$SmartSingleInput.noMoreToConsume(SmartOp.java:227)
... 10 common frames omitted
Caused by: org.projectnessie.error.NessieReferenceConflictException: Retry-Failure during commit against 'main@a9109d7fc12e1987ba3b73ea46cad8fb01d2f6d526ead500514884f6df1105cb'
at com.dremio.services.nessie.grpc.client.GrpcExceptionMapper.toNessieException(GrpcExceptionMapper.java:208)
at com.dremio.services.nessie.grpc.client.GrpcExceptionMapper.toNessieConflictException(GrpcExceptionMapper.java:236)
at com.dremio.services.nessie.grpc.client.GrpcExceptionMapper.handle(GrpcExceptionMapper.java:142)
at com.dremio.services.nessie.grpc.client.v1api.GrpcCommitMultipleOperations.commit(GrpcCommitMultipleOperations.java:78)
at com.dremio.plugins.NessieClientImpl.commitOperation(NessieClientImpl.java:405)
... 19 common frames omitted
Cause
When the underlying metadata in Dremio is managed by Iceberg, updates and changes are made by committing new versions to Iceberg. If client sessions (UI users, ODBC tools, etc.) hold outdated versions of the Iceberg schema, these errors can occur. Internal Jira DX-44953 covers this scenario and the work done to correct it.
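The conflict described above can be illustrated with a toy model of an optimistic commit. This is only a sketch, not Dremio's or Nessie's actual implementation: ToyBranch, ReferenceConflict, commit_with_retries and busy_writer are hypothetical names invented for the example. It shows how a writer holding a stale view of the branch head is rejected, and how a bounded number of retries can be exhausted while other writers keep committing.

```python
class ReferenceConflict(Exception):
    """Raised when the expected head no longer matches the branch head."""


class ToyBranch:
    """Hypothetical in-memory stand-in for a versioned branch such as 'main'."""

    def __init__(self):
        self.head = 0

    def commit(self, expected_head):
        # Compare-and-swap: succeed only if the caller saw the latest head.
        if expected_head != self.head:
            raise ReferenceConflict(f"expected {expected_head}, found {self.head}")
        self.head += 1
        return self.head


def commit_with_retries(branch, max_retries, interfere=None):
    """Try to commit, re-reading the head after each conflict."""
    for attempt in range(1, max_retries + 1):
        seen = branch.head  # the version this writer believes is current
        if interfere is not None:
            interfere(branch, attempt)  # simulate a concurrent writer
        try:
            return branch.commit(seen)
        except ReferenceConflict:
            continue  # our view was stale; re-read and try again
    raise ReferenceConflict(f"Retry-Failure after {max_retries} attempts")


def busy_writer(conflicting_attempts):
    """Build a hook that commits ahead of us on the first N attempts."""
    def interfere(branch, attempt):
        if attempt <= conflicting_attempts:
            branch.commit(branch.head)
    return interfere
```

In this model, commit_with_retries(ToyBranch(), max_retries=10, interfere=busy_writer(3)) succeeds on the fourth attempt, while max_retries=3 with busy_writer(5) ends in a retry failure.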
Workaround
In Dremio versions up to and including 20.0, users can raise the support key nessie.kvversionstore.max_retries from its default of 10 to a higher value.
Solution
Upgrade to Dremio version 21.0 or later. From this version onwards, the support key above is superseded by nessie.kvversionstore.commit_timeout_ms.
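The name of the newer key suggests that commits are bounded by a time budget rather than by an attempt count. That reading is an assumption, and the sketch below is not Dremio's or Nessie's code: commit_with_deadline, try_commit and backoff_ms are hypothetical names used only to show what a time-bounded retry loop generally looks like.

```python
import time


def commit_with_deadline(try_commit, timeout_ms, backoff_ms=1):
    """Retry try_commit() until it returns True or the time budget runs out.

    try_commit is a hypothetical callable returning True on success and
    False on a conflict. Returns the number of attempts that were needed.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    attempts = 0
    while True:
        attempts += 1
        if try_commit():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"commit not accepted within {timeout_ms} ms ({attempts} attempts)"
            )
        time.sleep(backoff_ms / 1000.0)  # brief pause before re-reading state
```

With a time budget, the number of attempts adapts to how quickly each attempt completes, so a burst of short-lived conflicts is less likely to exhaust the limit than with a fixed retry count.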