Summary
When creating an Iceberg table using CREATE TABLE AS (CTAS) with PARTITION BY (), the job can fail with the following error:
SYSTEM ERROR: IllegalStateException: Record count not set for this vector container
SqlOperatorImpl EXTERNAL_SORT
Location 3:48:4
Fragment 3:0
[Error Id: 7e75a871-5605-4d94-918a-9644e38b8817 on 10.0.0.1:0]
(java.lang.IllegalStateException) Record count not set for this vector container
com.google.common.base.Preconditions.checkState():502
com.dremio.exec.record.VectorContainer.getRecordCount():383
com.dremio.sabot.op.sort.external.MemoryRun.addBatch():152
com.dremio.sabot.op.sort.external.ExternalSortOperator.consumeData():308
com.dremio.sabot.driver.SmartOp$SmartSingleInput.consumeData():267
com.dremio.sabot.driver.StraightPipe.pump():59
com.dremio.sabot.driver.Pipeline.doPump():124
com.dremio.sabot.driver.Pipeline.pumpOnce():114
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():560
com.dremio.sabot.exec.fragment.FragmentExecutor.run():478
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700():108
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1006
com.dremio.sabot.task.AsyncTaskWrapper.run():122
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():249
com.dremio.sabot.task.slicing.SlicingThread.run():171
Reported Issue
A CTAS job that uses PARTITION BY (), as in the example below, can trigger this error:
CREATE TABLE "mySource"."Folder1"."my_table" PARTITION BY (fieldname) AS
SELECT * FROM "mySource"."Folder2"."my_partitioned_table";
Relevant Versions
This article applies to Dremio v24.x.
Cause
This error is caused by a known defect (DX-86203).
Steps to Resolve
Upgrade to Dremio v24.2.11 (or above) or v24.3.0 (or above) to resolve this defect.
To work around the defect before upgrading, first create an empty table, then add a partition field, and finally insert the data.
For example:
-- Create an empty table
CREATE TABLE "mySource"."Folder1"."my_table" AS
SELECT * FROM "mySource"."Folder2"."my_source_table" WHERE 1 = 2;
-- Add a partition column
ALTER TABLE "mySource"."Folder1"."my_table" ADD PARTITION FIELD field1;
-- Insert data
INSERT INTO "mySource"."Folder1"."my_table" (SELECT * FROM "mySource"."Folder2"."my_source_table");
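After the insert completes, it may be worth confirming that all rows arrived in the new table. The query below is a minimal sketch of one way to do that, assuming the same example table names used in the workaround above. The table_role and row_count aliases are illustrative only and are not part of the original steps.

```sql
-- Optional sanity check (illustrative, not part of the documented workaround):
-- compare row counts between the source table and the newly populated table.
SELECT 'source' AS table_role, COUNT(*) AS row_count
FROM "mySource"."Folder2"."my_source_table"
UNION ALL
SELECT 'target' AS table_role, COUNT(*) AS row_count
FROM "mySource"."Folder1"."my_table";
```

If the two counts match, the workaround most likely copied the data as intended. A mismatch suggests the INSERT job should be reviewed before the new table is relied upon.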