Overview
When you use the REST API to query tables in Dremio, the call that retrieves job results returns only 100 rows, even when the query should return many more.
Applies to
All versions of Dremio
Cause
The API call api/v3/job/${JOB_ID}/results returns results in batches, with a default batch size of 100 rows.
To retrieve all the results from a job, you need to send multiple calls to the API, changing the offset parameter each time.
The first call can be made without any query parameters, although you might want to increase the batch size to the maximum (500), e.g.:
/api/v3/job/${JOB_ID}/results?limit=500
Within the JSON response, a field called rowCount holds the total number of rows in the dataset. This can be used to calculate how many calls are needed to retrieve the entire dataset, each with a different offset= query parameter.
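As a rough sketch of that calculation: the number of calls is the row count divided by the batch size, rounded up, and each call uses the next multiple of the batch size as its offset. The ROW_COUNT value below is a made-up example rather than output from a real job; in practice it would come from the rowCount field of the first response.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration only.
ROW_COUNT=1234   # would normally be read from the rowCount field
BATCHSIZE=500    # maximum batch size mentioned above

# Ceiling division: number of calls needed to cover every row.
CALLS=$(( (ROW_COUNT + BATCHSIZE - 1) / BATCHSIZE ))
echo "calls needed: ${CALLS}"

# Build the list of offset values, one per call, starting at 0.
OFFSETS=""
for (( OFFSET = 0; OFFSET < ROW_COUNT; OFFSET += BATCHSIZE )); do
  OFFSETS="${OFFSETS}${OFFSETS:+ }${OFFSET}"
done
echo "offsets: ${OFFSETS}"
```

With these example values, three calls are needed, using offsets 0, 500 and 1000; the last call returns only the remaining 234 rows.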
Putting it all together in a script, you would end up with something like this (for a bash script):
#!/usr/bin/bash
# Assumes DREMIO_BASE_PATH, USERNAME, PASSWORD and JOB_ID are already set in the environment.
BATCHSIZE=500   # rows per call; 500 is the maximum mentioned above

# Log in and build the Authorization header value (_dremio followed by the token).
DREMIO_AUTH_TOKEN=_dremio$(curl -s -k -H 'Content-Type: application/json' -d "{\"userName\":\"$USERNAME\",\"password\":\"$PASSWORD\"}" "$DREMIO_BASE_PATH/apiv2/login" | jq -r ".token")

# Fetch the first batch (no offset parameter, so it starts at row 0).
DATASET=$(curl -X GET -s -k -H 'Content-Type: application/json' -H "Authorization: $DREMIO_AUTH_TOKEN" "$DREMIO_BASE_PATH/api/v3/job/${JOB_ID}/results?limit=${BATCHSIZE}")
## Do something with the first batch of results stored in $DATASET here
# echo "${DATASET}" | jq .

# Total number of rows in the job's result set.
ROW_COUNT=$(echo "${DATASET}" | jq -r ".rowCount")

# Fetch the remaining batches, advancing the offset by one batch each time.
OFFSET=${BATCHSIZE}
while [[ ${OFFSET} -lt ${ROW_COUNT} ]]
do
    DATASET=$(curl -X GET -s -k -H 'Content-Type: application/json' -H "Authorization: $DREMIO_AUTH_TOKEN" "$DREMIO_BASE_PATH/api/v3/job/${JOB_ID}/results?limit=${BATCHSIZE}&offset=${OFFSET}")
    ## Do something with the subsequent batch of results stored in $DATASET here
    # echo "${DATASET}" | jq .
    OFFSET=$((OFFSET + BATCHSIZE))
done
echo "${ROW_COUNT}"
Further reading
Documentation for the limit and offset query parameters is available here:
https://docs.dremio.com/software/rest-api/#limit-and-offset-query-parameters