Summary
This article describes metadata issues that can occur after Apache Iceberg is enabled in Dremio, and explains how to resolve them.
Reported Issue
After enabling Apache Iceberg to support the unlimited splits feature, some users have encountered metadata issues when they later upgraded Dremio. In such cases, the physical dataset (PDS) metadata may have to be rebuilt.
Relevant Versions
Dremio Versions 19.0 onwards.
Troubleshooting Steps
N/A
Cause
Apache Iceberg support is enabled with the following support keys (from Dremio 21.0 onwards, both are enabled by default):
dremio.iceberg.enabled = true
dremio.execution.support_unlimited_splits = true
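Besides the Support Keys section of the Dremio settings, support keys can usually be set from SQL with ALTER SYSTEM SET. The sketch below only renders those statements for pasting into a SQL client; the helper name alter_system_statements is hypothetical, and the ALTER SYSTEM SET form is an assumption to verify against the documentation for your release.

```python
# The two support keys listed above; values are rendered as SQL booleans.
ICEBERG_SUPPORT_KEYS = {
    "dremio.iceberg.enabled": True,
    "dremio.execution.support_unlimited_splits": True,
}


def alter_system_statements(keys):
    """Render each support key as an ALTER SYSTEM SET statement (assumed syntax)."""
    return [f'ALTER SYSTEM SET "{key}" = {str(value).lower()};' for key, value in keys.items()]


if __name__ == "__main__":
    for statement in alter_system_statements(ICEBERG_SUPPORT_KEYS):
        print(statement)
```

Running it prints one statement per key, for example ALTER SYSTEM SET "dremio.iceberg.enabled" = true;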
Once this feature is enabled, metadata for physical datasets is stored in a different location and format. Known issues have been observed when upgrading between major releases from version 19 onwards, in which this metadata becomes corrupted or is lost and must then be rebuilt.
Steps to Resolve
If only a handful of datasets are affected, run these two statements for each one:
ALTER PDS <name> FORGET METADATA
ALTER PDS <name> REFRESH METADATA
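In these statements, <name> is the full dot-separated path of the dataset, and path components containing characters such as hyphens, dots or spaces must be double-quoted. The sketch below shows the expected form; the helpers quote_path and rebuild_metadata and the dataset path are hypothetical, and it assumes standard SQL identifier quoting, where an embedded double quote is escaped by doubling it.

```python
def quote_path(components):
    """Join path components into one identifier, double-quoting each component."""
    return ".".join('"' + part.replace('"', '""') + '"' for part in components)


def rebuild_metadata(components):
    """Return the FORGET/REFRESH statement pair for one physical dataset."""
    path = quote_path(components)
    return [
        f"ALTER PDS {path} FORGET METADATA;",
        f"ALTER PDS {path} REFRESH METADATA;",
    ]


if __name__ == "__main__":
    # Hypothetical dataset: file 2023.csv in folder sales of source s3-source.
    for statement in rebuild_metadata(["s3-source", "sales", "2023.csv"]):
        print(statement)
```

For the example path this prints ALTER PDS "s3-source"."sales"."2023.csv" FORGET METADATA; followed by the matching REFRESH statement. Quoting every component, rather than only the ones that need it, keeps the helper simple and is harmless for plain names.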
If there are many datasets, the following procedure generates the statements for all of them:
1 - Create a virtual dataset (VDS) that lists every physical dataset, for example as pds in the Shared space, using this query:
SELECT TABLE_SCHEMA || '.' || TABLE_NAME from INFORMATION_SCHEMA."TABLES" where TABLE_TYPE ='TABLE'
2 - Run the attached script. It reads from the VDS Shared.pds; to use a different VDS, change this line:
SQL_QUERY="select * from Shared.pds"
3 - Run the statements that the script writes to refresh_info.out from a JDBC or ODBC client that can execute multiple SQL statements (from Dremio 21.0 onwards, you can also run multi-statement scripts in the UI).
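The attached script is not reproduced in this article. As a rough, hypothetical stand-in for steps 2 and 3, the Python sketch below runs SQL_QUERY through the Dremio REST API and writes a FORGET/REFRESH pair per dataset to refresh_info.out. It assumes the /apiv2/login endpoint for a session token, /api/v3/sql to submit the query, /api/v3/job/{id} to poll the job, and /api/v3/job/{id}/results to page through rows (up to 500 per request). The coordinator URL and the DREMIO_USER / DREMIO_PASSWORD environment variables are also assumptions. Like the VDS, it writes each path exactly as returned, without quoting.

```python
import json
import os
import time
import urllib.request

DREMIO_URL = "http://localhost:9047"    # assumption: coordinator on the default port
SQL_QUERY = "select * from Shared.pds"  # the VDS from step 1
OUTPUT_FILE = "refresh_info.out"


def api_call(method, path, token=None, body=None):
    """Send a JSON request to the Dremio REST API and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(DREMIO_URL + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", "_dremio" + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_dataset_paths(token, page_size=500):
    """Run SQL_QUERY as a job, wait for it to finish, then page through its rows."""
    job_id = api_call("POST", "/api/v3/sql", token, {"sql": SQL_QUERY})["id"]
    while True:
        state = api_call("GET", f"/api/v3/job/{job_id}", token)["jobState"]
        if state == "COMPLETED":
            break
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"Job {job_id} ended in state {state}")
        time.sleep(1)
    paths, offset = [], 0
    while True:
        page = api_call(
            "GET", f"/api/v3/job/{job_id}/results?offset={offset}&limit={page_size}", token
        )
        rows = page["rows"]
        # The VDS has a single column, so keep the first value of each row.
        paths.extend(next(iter(row.values())) for row in rows)
        offset += len(rows)
        if not rows or offset >= page["rowCount"]:
            return paths


def refresh_script(paths):
    """Return one FORGET and one REFRESH statement per dataset path, one per line."""
    lines = []
    for path in paths:
        lines.append(f"ALTER PDS {path} FORGET METADATA;")
        lines.append(f"ALTER PDS {path} REFRESH METADATA;")
    return "".join(line + "\n" for line in lines)


def main():
    user = os.environ.get("DREMIO_USER")
    password = os.environ.get("DREMIO_PASSWORD")
    if not user or not password:
        print("Set DREMIO_USER and DREMIO_PASSWORD before running this sketch.")
        return
    token = api_call("POST", "/apiv2/login", body={"userName": user, "password": password})["token"]
    with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
        out.write(refresh_script(fetch_dataset_paths(token)))


if __name__ == "__main__":
    main()
```

Because the paths are written unquoted, any dataset whose path components contain hyphens, spaces or reserved words will need double quotes added in refresh_info.out before the statements are run.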
Tips & Tricks
N/A
Best Practices
N/A
Recommendations
N/A
FAQ
N/A
Additional Resources
Apache Iceberg in Dremio - https://docs.dremio.com/software/data-formats/apache-iceberg/
Metadata refreshing - https://docs.dremio.com/software/advanced-administration/metadata-caching/