Summary
This document provides guidance and a script to help automate the metadata refresh of all physical datasets (PDSs).
Reported Issue
There may be a time when you lose the distributed storage that holds the metadata, or you restore your cluster to another environment, and need to recreate the metadata for all your PDSs.
Overview
If you ever need to forget and refresh metadata for all PDSs (tables), the following script generates all the SQL commands needed. If you have hundreds or thousands of PDSs, writing these commands by hand is not a trivial task.
Relevant Versions, Tools and Integrations
Dremio Versions 19.0 onwards.
Steps to Resolve
The attached script will help generate all the FORGET and REFRESH commands needed.
Example usage:
$ ./forget_refresh.sh
Username [dremio]?
Password or Personal Access Token (hit return to use the default)?
Dremio base path [http://localhost:9047]?
Please enter a space in which this script will create the VDS 'pds_list' [space1]?
Token: _dremios0dsexxxxxxxxeiuub56664
Waiting 5 seconds for query to complete
Output in refresh_command.sql
$ cat refresh_command.sql
ALTER PDS "Samples"."samples"."dremio"."com"."Dremio%20University"."restaurant_reviews"."parquet" FORGET METADATA;
ALTER PDS "Samples"."samples"."dremio"."com"."Dremio%20University"."restaurant_reviews"."parquet" REFRESH METADATA;
ALTER PDS "Samples"."samples"."dremio"."com"."NYC-taxi-trips" FORGET METADATA;
ALTER PDS "Samples"."samples"."dremio"."com"."NYC-taxi-trips" REFRESH METADATA;
ALTER PDS "Samples"."samples"."dremio"."com"."SF%20weather%202018-2019"."csv" FORGET METADATA;
ALTER PDS "Samples"."samples"."dremio"."com"."SF%20weather%202018-2019"."csv" REFRESH METADATA;
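The generation step that produces the output above can be sketched as follows. This is only an illustration, not the attached script: the function name generate_commands is hypothetical, and it assumes the PDS paths are already available one per line, fully quoted in the same form as the sample output.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the attached script).
# Reads one fully quoted PDS path per line on stdin and prints a
# FORGET / REFRESH pair for each path.
generate_commands() {
  # The "|| [ -n ... ]" part handles a final line without a trailing newline.
  while IFS= read -r pds_path || [ -n "$pds_path" ]; do
    # Skip blank lines.
    [ -z "$pds_path" ] && continue
    # The path is passed as an argument, so characters such as % are
    # printed literally rather than treated as format specifiers.
    printf 'ALTER PDS %s FORGET METADATA;\n' "$pds_path"
    printf 'ALTER PDS %s REFRESH METADATA;\n' "$pds_path"
  done
}

# Demo using a path copied from the sample output above.
printf '%s\n' '"Samples"."samples"."dremio"."com"."NYC-taxi-trips"' | generate_commands
```

Emitting FORGET before REFRESH for each path keeps the pair together, so the resulting file can be reviewed or trimmed per dataset before it is run.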
The core of this script is based on the script found in the article "Metadata file errors after upgrading and enabling Apache Iceberg features". Credit to that author.
See attached.