Summary
When running SELECT * queries in the Dremio UI query editor, the result set is limited to around 1 million records. This limit exists to prevent the UI from consuming excessive disk space and to keep performance reasonable. SELECT COUNT(*) queries are not subject to this limit and will return the accurate row count.
Reported Issue
The number of records returned by SELECT * is lower than the count returned by SELECT COUNT(*) for the same query when run from the Dremio UI query editor.
Relevant Versions
Dremio Software v11 and higher, and Dremio Cloud.
Troubleshooting Steps
Run SELECT COUNT(*) to get the accurate row count.
Cause
Let’s take a look at the TPC-DS dataset store_sales from Dremio’s “Samples” source. Running the following query from the UI returns 1,046,905 rows, as indicated by the "Records" value in the lower right-hand corner of the screenshot:
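As a sketch, the query has this general form (the dataset path and filter column below are illustrative assumptions; any SELECT * with a predicate that matches well over a million rows will behave the same way):

    SELECT *
    FROM Samples."samples.dremio.com"."tpcds_sf1000".store_sales  -- hypothetical path to the sample dataset
    WHERE ss_net_profit < 0  -- hypothetical "unprofitable sales" filter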
In this newer version of Dremio, we are already warned that the query is returning a subset of the full results (older versions did not show this warning). I know that this dataset includes over 2.8 billion store sales, and it seems unrealistic that only a million of them were unprofitable. When I run a COUNT(*) statement with the same filter, the result makes more sense:
Over 2 billion store sales lost money!
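For reference, the count query is the same statement with SELECT COUNT(*) in place of SELECT * (again using the hypothetical filter from the sketch above):

    SELECT COUNT(*)
    FROM Samples."samples.dremio.com"."tpcds_sf1000".store_sales
    WHERE ss_net_profit < 0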
How was the output limit applied? And why is the result not exactly 1 million rows?
The raw query profile for the SELECT * statement is instructive. Here is the “Visualized Plan”:
At first glance, it appears to show a single pipeline of operators. In fact, the colors and the first number in the operator names indicate that the query is broken into phases. Within each phase, the query plan is further divided into a number of copies, called threads or fragments. At runtime, these fragments are distributed across the executor nodes in your Dremio cluster and given batches of records to process. The Dremio query planner may determine that some phases get only one fragment/thread, while others are parallelized into many more (up to a system-defined limit).
In our SELECT * query, we find that phases 00 and 02 each have a single thread, while phase 01 has 12:
One of the operators in phase 01 is labeled “Writer”. The “Operators” section indicates, more specifically, that it is an Arrow Writer. This is the operator that writes the results of the query to disk in Apache Arrow format.
We can see that there are 12 copies of this operator writing records across threads on our 2 Dremio executor nodes (10.0.0.148 and 10.0.0.110). The 1 million record limit is divided across all 12 threads doing this write operation, so we would expect each to process about 83,333 records (1,000,000 / 12). Instead, each one processes slightly more, about 87k.
This is because the operators process batches of records and do not stop until they have finished an entire batch. Though each fragment may register that it has completed writing its 83k share of the records, that point apparently fell in the middle of a batch, and the remaining records in the batch were written to finish up.
Consider what happens in the COUNT(*). There is only 1 record in the result set, so the 1 million limit will never be reached and the pipeline of operators in the query will process records batch-wise until they have consumed all the rows in the dataset. This ensures that you get the true record count. Indeed, this would be the case with any query where the result set is under 1 million records.
Steps to Resolve
To get the full result set, use the ODBC, JDBC, or Arrow Flight clients, which do not have this limit.
Alternatively, adjust the planner.output_limit_size support key. However, raising this value significantly is not advised, for the reasons described at the top of this article.
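If you do decide to raise it, here is a sketch of one way to change the key, assuming you set it via SQL rather than through the Support settings screen (the value shown is only an example):

    ALTER SYSTEM SET "planner.output_limit_size" = 2000000  -- example value; roughly doubles the UI output limit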