Summary
You notice that your Dremio Coordinator node has become less stable, perhaps after recently applying an update to Dremio. The Coordinator pod appears to lose the ability to communicate with the cluster, creating outage windows. A ZooKeeper pod restart also seems to line up with the timing of each service outage.
Relevant Versions, Tools, and Integrations
All currently supported Dremio versions that have been deployed via Dremio's Helm Charts.
Details
Upon inspection of the available Dremio and Kubernetes logging, you may see something similar to the following coming from your ZooKeeper instance:
"containerID": "docker://<docker_id>",
"image": "repo-location/release-name:latest",
"imageID": "docker-pullable://repo-location/sha_id",
"lastState": {},
"name": "wait-for-zookeeper",
"ready": true,
"restartCount": 8,
"started": false,
"state": {
  "terminated": {
    "containerID": "docker://container_id",
    "exitCode": 0,
    "finishedAt": "2024-10-24T14:13:00Z",
    "reason": "Completed",
    "startedAt": "2024-10-24T14:13:00Z"
  }
}
The "restartCount" above matches the value reported for Dremio's Coordinator pod.
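The restart-count comparison described above can be scripted rather than read by eye. The following is a minimal sketch, assuming you have saved the output of `kubectl get pod <pod-name> -o json` for the Coordinator pod. The function name `restart_counts`, the sample data, and the container name `dremio-master-coordinator` are illustrative assumptions, not taken from your environment.

```python
import json


def restart_counts(pod: dict) -> dict:
    """Map each container name in a pod's status to its restartCount.

    Reads both init containers and regular containers, since the
    "wait-for-zookeeper" entry shown above is reported separately
    from the main container.
    """
    status = pod.get("status", {})
    counts = {}
    for key in ("initContainerStatuses", "containerStatuses"):
        for entry in status.get(key, []):
            counts[entry["name"]] = entry.get("restartCount", 0)
    return counts


# Hypothetical sample, trimmed to the fields the function reads.
sample = json.loads("""
{
  "status": {
    "initContainerStatuses": [
      {"name": "wait-for-zookeeper", "restartCount": 8}
    ],
    "containerStatuses": [
      {"name": "dremio-master-coordinator", "restartCount": 8}
    ]
  }
}
""")

counts = restart_counts(sample)
# Identical counts suggest the containers are restarting together.
all_match = len(set(counts.values())) == 1
```

With real data, replace `sample` with `json.load(open("coordinator-pod.json"))`, where the file name is whatever you saved the `kubectl` output as.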
You can also see that this "exec" command is passed from Dremio's deployment Helm Charts via the ZooKeeper template:
"exec": {
  "command": [
    "/bin/bash",
    "-c",
    "[ \"$(echo ruok | nc 127.0.0.1 2181)\" == \"imok\" ]"
  ]
},
The Community Issue identified for this behavior describes how this command can create the types of problems you are seeing in your environment.
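To see what the "exec" command above is testing, the same check can be reproduced outside of the pod's shell. The following is a rough sketch, not the chart's own implementation: it sends ZooKeeper's `ruok` command over a TCP socket and reports whether the reply is `imok`. The function name `zk_ruok` and the default timeout are assumptions made for illustration.

```python
import socket


def zk_ruok(host: str = "127.0.0.1", port: int = 2181, timeout: float = 2.0) -> bool:
    """Send 'ruok' to ZooKeeper and return True only if it replies 'imok'.

    Any connection failure, timeout, or unexpected reply counts as unhealthy,
    mirroring how a failed probe command would be treated.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # Signal that nothing more will be sent, then read until the server closes.
            sock.shutdown(socket.SHUT_WR)
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).strip() == b"imok"
    except OSError:
        return False
```

Calling `zk_ruok()` from inside the ZooKeeper pod, or `zk_ruok("<zookeeper-host>")` from elsewhere in the cluster, returns `True` only when the server answers `imok` within the timeout. Unlike the shell one-liner, this version has an explicit timeout, so a hung connection is reported as a failure instead of blocking.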
Steps to Resolve
This behavior is the result of using an older release of Dremio's Helm Charts to deploy a recent version of Dremio. From time to time, Dremio improves its deployment Charts to stay ahead of bugs, improve performance, or improve ease of use. To resolve the issue, redeploy using a Helm Chart release that is suited to the Dremio version you are running.
It is important to use a suitable pairing of Dremio's Helm Charts and the core Dremio service to provide the most stable foundation for your environment.
If you have any questions regarding your Helm Charts, please reach out to Dremio Support.