Summary/Reported Issue
An engine may get stuck in a STOPPING state and become unresponsive, preventing further actions such as restart, modification, or deletion via the Dremio UI. Attempts to manage the engine through the UI may get no response, and attempts through the REST API may return HTTP 500 errors.
Relevant Versions
All Versions
Troubleshooting Steps
This behavior is commonly observed when:
- The engine did not stop gracefully due to an error in the underlying system.
- The engine configuration is incorrect or missing critical runtime properties.
- A network or internal communication failure interrupted the shutdown sequence.
When an engine is in this hung state, it does not respond to UI commands and must be managed through REST API calls.
Cause
The issue is typically caused by an incomplete or interrupted engine shutdown. As a result, the engine’s state remains marked as STOPPING indefinitely, with no backend process active to complete the transition. It can occur in any deployment but is mostly seen with YARN-based deployments.
Steps to Resolve
To manually stop or delete an engine stuck in STOPPING, follow these steps using cURL or a tool like Postman.
1. Authenticate and Retrieve a Token
Use the following command to log in and get an authorization token:
curl -X POST http://<dremio-host>:9047/apiv2/login \
-H "Content-Type: application/json" \
-d '{ "userName": "<username>", "password": "<password>" }'
- Replace <dremio-host>, <username>, and <password> with values for your environment.
- Copy the returned token for use in subsequent requests.
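For scripted use, the login call above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and reading the token from a top-level "token" field of the response is an assumption to check against your Dremio version.

```python
import json
import urllib.request


def build_login_request(host, username, password, port=9047):
    """Build the POST request for /apiv2/login without sending it."""
    body = json.dumps({"userName": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/apiv2/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(token):
    """Format the Authorization header value used by the later requests."""
    return f"_dremio {token}"


def login(host, username, password):
    """Send the login request and return the token.

    Assumption: the response body is JSON with a top-level "token" field.
    """
    request = build_login_request(host, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token"]
```

Calling login("<dremio-host>", "<username>", "<password>") sends the request. The builder is kept separate so the URL and body can be inspected before anything goes over the network.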
2. Fetch Engine Details
Use the token to retrieve all engine (cluster) configurations:
curl -X GET http://<dremio-host>:9047/apiv2/provision/clusters \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: _dremio <your_token_here>"
Locate the stuck engine in the JSON response.
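When the response lists many engines, a small filter can help locate the stuck one. The sketch below is illustrative only. The function names are hypothetical, and it assumes the response is either a bare list of engine objects or an object that wraps the list under a "clusterList" key, so adjust it to the JSON you actually receive.

```python
import urllib.request


def build_list_request(host, token, port=9047):
    """Build the GET request for /apiv2/provision/clusters without sending it."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/apiv2/provision/clusters",
        headers={
            "Accept": "application/json",
            "Authorization": f"_dremio {token}",
        },
        method="GET",
    )


def find_stuck_engines(response):
    """Return the engine objects whose currentState is STOPPING.

    Assumption: `response` is the parsed JSON body, either a list of engines
    or a dict holding that list under "clusterList".
    """
    if isinstance(response, dict):
        clusters = response.get("clusterList", [])
    else:
        clusters = response
    return [c for c in clusters if c.get("currentState") == "STOPPING"]
```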
3. Attempt to Stop the Engine
Create a modified version of the engine’s JSON object by:
- Removing the currentState and stateChangeTime fields.
- Changing "desiredState" to "STOPPED".
Then send a PUT request like the following:
curl -X PUT http://<dremio-host>:9047/apiv2/provision/cluster/<engine_id> \
-H "Content-Type: application/json" \
-H "Authorization: _dremio <your_token_here>" \
-d '{
  "id": "5f01f683-7xxxxxxxx4e29899fc",
  "tag": "KEj9A/vxxxx",
  "clusterType": "YARN",
  "name": "test-api-engine",
  "dynamicConfig": {
    "containerCount": 1
  },
  "desiredState": "STOPPED",
  "yarnProps": {
    "memoryMB": 16384,
    "virtualCoreCount": 4,
    "subPropertyList": [
      {
        "key": "yarn.resourcemanager.hostname",
        "value": "172.xx.x.144",
        "type": "JAVA_PROP"
      }
    ],
    "distroType": "OTHER",
    "isSecure": false
  },
  "shutdownInterval": 7200000,
  "isAllowAutoStart": false,
  "isAllowAutoStop": false
}'
ℹ️ It may take several seconds or minutes for the engine to respond.
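Editing the JSON by hand is error-prone, so the field changes described in this step can be sketched as a small Python helper. The function names are hypothetical. Using the object's own "id" value as the <engine_id> in the URL is an assumption based on the example request.

```python
import copy
import json
import urllib.request


def build_stop_payload(engine):
    """Return a copy of the engine object prepared for the stop request."""
    payload = copy.deepcopy(engine)          # leave the fetched object untouched
    payload.pop("currentState", None)        # remove the fields named in this step
    payload.pop("stateChangeTime", None)
    payload["desiredState"] = "STOPPED"      # request the STOPPED state
    return payload


def build_stop_request(host, token, engine, port=9047):
    """Build the PUT request for /apiv2/provision/cluster/<engine_id>."""
    payload = build_stop_payload(engine)
    return urllib.request.Request(
        # Assumption: the engine's "id" field is the <engine_id> in the URL.
        url=f"http://{host}:{port}/apiv2/provision/cluster/{payload['id']}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"_dremio {token}",
        },
        method="PUT",
    )
```

Passing the result of build_stop_request to urllib.request.urlopen sends the request. All other fields, including "tag", are passed through unchanged from the fetched object.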
4. Delete the Engine (If Stopping Fails)
If the STOPPED request does not work or the engine remains in a hung state, delete it directly using its ID:
curl -X DELETE http://<dremio-host>:9047/apiv2/provision/cluster/<engine_id> \
-H "Authorization: _dremio <your_token_here>"
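The delete call can be scripted in the same style. This is a minimal sketch with a hypothetical function name. It only builds the request, so nothing is deleted until the request is explicitly sent.

```python
import urllib.request


def build_delete_request(host, token, engine_id, port=9047):
    """Build the DELETE request for /apiv2/provision/cluster/<engine_id>."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/apiv2/provision/cluster/{engine_id}",
        headers={"Authorization": f"_dremio {token}"},
        method="DELETE",
    )
```

To send it, pass the returned object to urllib.request.urlopen. An HTTP error status raises urllib.error.HTTPError, which makes a failed deletion visible to the calling script.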
Next Steps
- After deletion or stopping, verify that the engine no longer appears in the UI or API.
- If managing multiple engines in YARN, ensure resource manager connectivity is stable and configuration values (e.g., memory, vCores) are valid.
- Consider monitoring engine state transitions with automated alerts to detect hangs early.
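As one way to approach the monitoring suggestion, the check below flags engines that have stayed in STOPPING longer than a threshold. It is a sketch under stated assumptions: the function name and the 10-minute default are hypothetical, and stateChangeTime is assumed to be a Unix timestamp in milliseconds, which should be confirmed against a real response before relying on it.

```python
import time


def overdue_stopping_engines(clusters, max_age_ms=600_000, now_ms=None):
    """Return names (or IDs) of engines stuck in STOPPING beyond max_age_ms.

    Assumption: each engine's stateChangeTime is epoch time in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    overdue = []
    for cluster in clusters:
        if cluster.get("currentState") != "STOPPING":
            continue
        changed = cluster.get("stateChangeTime")
        # Skip engines with no timestamp rather than guessing their age.
        if changed is not None and now_ms - changed > max_age_ms:
            overdue.append(cluster.get("name", cluster.get("id")))
    return overdue
```

A scheduled job could call this on the engine list from the API and raise an alert whenever the returned list is not empty.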
Additional Resources
https://docs.dremio.com/current/reference/api/