Overview
How to change the default CTAS table format from Iceberg
Applies To
Versions 22.0.0+ of Dremio
Details
As of version 22.0.0, all new tables created with CTAS (CREATE TABLE AS SELECT) in Dremio on filesystem sources are written in Apache Iceberg format. Iceberg allows time travel over historical data for point-in-time (PIT) queries. For a full breakdown of why, see here.
As part of this realignment, Dremio v22+ lets you define the CTAS format at the data source level, in the Advanced Properties for that source.
The options here are either Iceberg or Parquet (the pre-v22 default format). However, this level of control may not be granular enough for complex workloads, where many different datasets hang off the same source: if you change the source-level setting to alter the format of one specific CTAS, that change applies to all datasets on that source.
You can also control the table format for an individual table in your SQL. To do so, state the required format explicitly in the "STORE AS" clause:
CREATE TABLE "MySource"."MyFolder"."My_Parquet_Table"
STORE AS (type => 'parquet')
AS SELECT * FROM "MySource"."MyCollection"."UpdatedFiles";
CREATE TABLE "MySource"."MyFolder"."My_Iceberg_Table"
STORE AS (type => 'iceberg')
AS SELECT * FROM "MySource"."MyCollection"."UpdatedFiles";
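One way to confirm which format a CTAS produced is to query the Iceberg metadata of the new table. The following is a minimal sketch, assuming your Dremio version supports the table_history metadata function and the AT TIMESTAMP time-travel clause, and that the table path can be passed as a dot-separated string. The timestamp is a placeholder; substitute a value at or after the table's first snapshot.

```sql
-- List the snapshot history of the Iceberg table created above.
-- Assumption: table_history is available in your Dremio version.
-- The same call against the Parquet table should fail, since that
-- table carries no Iceberg metadata.
SELECT *
FROM TABLE(table_history('MySource.MyFolder.My_Iceberg_Table'));

-- Point-in-time (PIT) query against the Iceberg table.
-- The timestamp below is a hypothetical placeholder.
SELECT *
FROM "MySource"."MyFolder"."My_Iceberg_Table"
AT TIMESTAMP '2023-01-01 00:00:00.000';
```

If the history query returns at least one snapshot row, the table was written as Iceberg and is available for time travel.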
Pre-v22 versions
STORE AS (type => 'iceberg')
Defining a CTAS format as parquet is valid syntax; however, explicitly defining it with the type "iceberg" when the key is in place will cause an error.