Basic steps for troubleshooting common Dremio Kubernetes issues.
Any Dremio release. NOTE: all steps below assume the Helm chart being used is the dremio_v2 chart.
There are many moving parts in a Kubernetes cluster, and identifying what is causing a deployment to fail can be problematic. Below are some common situations encountered, and a few suggestions which may help.
Most customers deploy a customised Helm chart which allows for configuration of various aspects of the cluster (see this document for details). It's important to persist the Dremio application logs, including server log (server.log), garbage collection logs (server.gc), metadata refresh logs (metadata-refresh.log) and query logs (queries.json) on a PV to allow for their collection after pod restarts. Dremio Support will require these for any troubleshooting activity.
Some customers prefer to route all pod logs to STDOUT for collection by an external collector, e.g. Logstash or Elasticsearch. Logs can be routed to STDOUT, local disk, or both; this is controlled by logback.xml.
$ kubectl get pods -n <namespace>
Verify all pods are up and running.
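On a busy namespace it can help to filter the pod list down to the pods that need attention. The following is a minimal sketch, assuming the default column layout of `kubectl get pods` (STATUS in the third column); the function name `unhealthy_pods` is hypothetical.

```shell
# List pods whose STATUS column is neither Running nor Completed.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  # $1 = namespace
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage: unhealthy_pods <namespace>
```

Note that a pod can report Running while not all of its containers are ready, so the READY column is still worth checking by eye.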
All pods pending
$ kubectl describe pod dremio-master-0 -n <namespace>
Check the final event message:
If you see this, you need to define your Docker secret in that namespace. Note that a secret is namespace specific, so if you deploy in a different namespace you will need to define the secret in that namespace.
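As a sketch of how such a secret could be created, `kubectl create secret docker-registry` can be wrapped as below. The function name and all argument values are placeholders, and it is an assumption that your values.yaml references the secret by name (for example under an `imagePullSecrets` entry), so check your own chart.

```shell
# Create an image pull secret in the namespace Dremio is deployed to.
# All values are placeholders supplied by the caller.
create_pull_secret() {
  # $1 namespace, $2 secret name, $3 registry server, $4 username, $5 password
  kubectl create secret docker-registry "$2" \
    --namespace "$1" \
    --docker-server="$3" \
    --docker-username="$4" \
    --docker-password="$5"
}

# Usage: create_pull_secret <namespace> <secret-name> <registry> <user> <password>
```

Because secrets are namespace specific, the call has to be repeated for every namespace you deploy into.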
1 node(s) didn't find available persistent volumes to bind
Check your storage definitions. Have you defined a custom Storage Class in values.yaml? Storage classes may be vendor- or site-specific. Ensure you're using a valid storage class by running:
$ kubectl get sc
Some pods pending
Check you have sufficient resources allocated for those pods. Note that you should have enough nodes to schedule 1 executor pod per node, and 1 coordinator on a dedicated node. Each node should have sufficient memory for the resources requested by its pods.
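To compare what the nodes can offer with what the pods ask for, the two listings below may be a useful starting point. This is a sketch only; the function names are hypothetical and the output uses `kubectl`'s standard `custom-columns` format.

```shell
# Show allocatable CPU and memory per node.
node_capacity() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Show the CPU and memory requests of every pod in a namespace.
pod_requests() {
  # $1 = namespace
  kubectl get pods -n "$1" \
    -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
}

# Usage: node_capacity; pod_requests <namespace>
```

A pending pod whose memory request exceeds the allocatable memory of every node cannot be scheduled until a larger node is added or the request is reduced.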
Executors start, but dremio-master is in Init:CrashLoopBackOff
$ kubectl describe statefulset dremio-master
Check the status of the set in events. Do you see the following message?
create Pod dremio-master-0 in StatefulSet dremio-master successful
If so, the resources have been allocated successfully.
$ kubectl get pvc
Check the age of the persistent volume claims. If the cluster is new but the PVCs are older than its initialisation time, the new cluster may have picked up the disk allocations of an older, deleted cluster. When clusters are deleted, their PVCs are not removed. When a new cluster is initialised, if Kubernetes finds free disk claims matching what it requires, it will allocate those claims to the new cluster. The executors will initialise, but if the master finds metadata on the PVC that doesn't match the Dremio version of the new cluster, the pod will not start.
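The AGE column is relative, so it can be easier to compare absolute creation timestamps. A minimal sketch (the function name `pvc_created` is hypothetical):

```shell
# List PVCs with their creation timestamps, oldest first.
pvc_created() {
  # $1 = namespace
  kubectl get pvc -n "$1" \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,CREATED:.metadata.creationTimestamp'
}

# Usage: pvc_created <namespace>
```

Compare the result with the creation time of the StatefulSet, for example `kubectl get statefulset dremio-master -n <namespace> -o jsonpath='{.metadata.creationTimestamp}'`. A claim created well before the StatefulSet is a candidate for having been left behind by an earlier cluster.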
Executors fail to start with CrashLoopBackOff
$ kubectl logs -f dremio-executor-X
This should give you some indication of the underlying issue. For example, incorrect JVM heap sizing results in:
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
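When a pod is in CrashLoopBackOff, the current container may not have logged anything yet. The `--previous` flag of `kubectl logs` shows the output of the last terminated container instead. A small sketch, with a hypothetical function name:

```shell
# Show the last lines logged by the previous (crashed) container of a pod.
crash_logs() {
  # $1 namespace, $2 pod name (e.g. dremio-executor-0)
  kubectl logs "$2" -n "$1" --previous --tail=50
}

# Usage: crash_logs <namespace> dremio-executor-0
```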
Master failing to come up on restart, stuck on init
We have seen issues where a master pod will fail to come up on restart, stuck on init phase:
NAME                READY   STATUS     RESTARTS       AGE
dremio-executor-2   1/1     Running    15 (20m ago)   22h
dremio-executor-1   1/1     Running    15 (20m ago)   22h
dremio-executor-0   1/1     Running    15 (20m ago)   22h
dremio-master-0     0/1     Init:1/4   0              6m32s
Running a describe on the master shows the state:
Type    Reason   Age   From     Message
----    ------   ----  ----     -------
Normal  Started  38s   kubelet  Started container start-only-one-dremio-master
Normal  Pulling  37s   kubelet  Pulling image "busybox"
Normal  Pulled   36s   kubelet  Successfully pulled image "busybox" in 1.013387208s
Normal  Created  36s   kubelet  Created container wait-for-zookeeper
Normal  Started  36s   kubelet  Started container wait-for-zookeeper
This can occur in certain situations, e.g. when ZooKeeper loses its quorum and one or more ZK pods think they are the leader.
To resolve, edit the statefulset zk and scale the ZooKeeper pods to 0. When all ZooKeeper pods have been removed, scale back to the required number to reinitialise the pods and force a leader election. The master should then initialise.
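The scale-down and scale-up can also be done with `kubectl scale` rather than editing the StatefulSet by hand. The sketch below assumes the ZooKeeper pods carry the label `app=zk`; the function name and the 5-second polling interval are arbitrary choices.

```shell
# Scale the zk StatefulSet to 0, wait until its pods are gone, then scale back up.
restart_zookeeper() {
  # $1 namespace, $2 required number of ZooKeeper replicas
  kubectl scale statefulset zk -n "$1" --replicas=0 || return 1
  # Poll until no ZooKeeper pods remain (assumes the label app=zk).
  while [ -n "$(kubectl get pods -n "$1" -l app=zk --no-headers 2>/dev/null)" ]; do
    sleep 5
  done
  kubectl scale statefulset zk -n "$1" --replicas="$2"
}

# Usage: restart_zookeeper <namespace> 3
```

If the StatefulSet is managed by Helm, a later `helm upgrade` will reset the replica count to whatever the chart values specify.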
It's also possible that the ZK transaction logs have filled the ZK pod's disk. If so, purge the transaction logs using the internal ZK tool zkCleanup.sh.
Executors being killed
It can be hard to detect executors being reaped, as this takes place at the worker node level. From the coordinator, it may simply appear that the executor is busy or unresponsive. Some symptoms are the following errors:
Protocol not registered with Fabric
This daemon doesn't support execution operations
ChannelClosedException: [FABRIC]: Channel closed
If you see a combination of these errors reported, check the system logs on your worker nodes. These will report if the pods are being reaped due to exceeding memory allocations.
By default, Kubernetes will overcommit memory to executors as memory usage grows, for example during large metadata refreshes or intensive queries. By default, heap sizes in our Helm charts are in megabytes (M), whereas other settings may be in mebibytes (MiB). It is important to understand the distinction so that you don't over-allocate heap or direct memory when initialising the pods.
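To make the difference between the two units concrete: a mebibyte is 1024 × 1024 bytes and a megabyte is 1000 × 1000 bytes, so the same number read as MiB is almost 5% larger than when read as MB. A small worked example in shell arithmetic (the function name is hypothetical):

```shell
# Convert mebibytes (1 MiB = 1048576 bytes) to whole megabytes (1 MB = 1000000 bytes).
mib_to_mb() {
  echo $(( $1 * 1048576 / 1000000 ))
}

mib_to_mb 16384   # prints 17179
```

In other words, a limit of 16384 MiB is about 17179 MB, so treating the two units as interchangeable misjudges the allocation by several hundred megabytes at this size.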
If your pods are being reaped, you should review the server logs on your worker nodes for any OOM killer messages.
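Before going to the node logs, it may be worth asking Kubernetes why the container last terminated. The sketch below reads the pod status with a standard `jsonpath` query; the function name is hypothetical.

```shell
# Print the reason the pod's containers last terminated (e.g. OOMKilled).
last_termination_reason() {
  # $1 namespace, $2 pod name
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Usage: last_termination_reason <namespace> dremio-executor-0
```

A result of `OOMKilled` indicates the container exceeded its memory limit. An empty result does not rule out memory pressure on the node, so the worker node logs remain the authoritative source.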
Reviewing worker node logs (AKS)
Note that AKS worker node logs are not retained. Viewing /var/log/messages on AKS worker nodes requires launching a special debug pod:
- Stand up debug pod on XY worker node: https://learn.microsoft.com/en-us/azure/aks/node-access
- View logs: https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
It is possible to perform this action and retrieve the worker node logs using a loop, for example:
$ kubectl get no | grep aks | while read NODE_NAME null ; do
    kubectl debug "node/$NODE_NAME" -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 -- bash -c "cat /host/var/log/messages" >> "kubelet.$NODE_NAME.log"
done
Note that you will need to manually delete the debug pods afterward.
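The cleanup can be scripted as well. The sketch below assumes the debug pods were created in the given namespace and that their names start with `node-debugger-`, which is the prefix `kubectl debug node/...` normally uses; verify the names in your cluster before deleting anything. The function name is hypothetical.

```shell
# Delete leftover node debug pods in a namespace.
delete_debug_pods() {
  # $1 = namespace the debug pods were created in
  pods=$(kubectl get pods -n "$1" -o name | grep '/node-debugger-' || true)
  if [ -n "$pods" ]; then
    # Word splitting is intentional: one argument per pod name.
    kubectl delete -n "$1" $pods
  fi
}

# Usage: delete_debug_pods default
```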
Executors sitting in pending when restarting (EKS)
If your EKS worker nodes are spread across multiple availability zones (AZs), the executors will initially be deployed across those zones. If you are using EBS-backed storage, each executor pod is allocated a PVC backed by a volume in a specific AZ, because EBS volumes are local to an AZ. If an executor fails to restart, check that the pod has not been scheduled onto a worker node in a different AZ: the PVC allocation persists for that pod, but the disk cannot be mounted in another AZ, so the pod will fail to start (see here).
To resolve this, you must delete the PVC and the pod. The pod will then launch cleanly and be allocated a PVC in the same AZ. Note that you will lose cache data by deleting the PVC. You will need to delete both the primary and cache PVCs, for example:
dremio-default-executor-volume-dremio-executor-0 Bound pvc-6c093400-e4e5-480d-9b86-93a664708a73 25Gi
dremio-default-executor-c3-0-dremio-executor-0 Bound pvc-9eef66a6-61ba-434c-8843-40d9bf806b84 10Gi
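The deletion could be scripted along the following lines. This is a sketch under the assumption that every PVC belonging to an executor ends in the pod name, as in the two claims listed above; the function name is hypothetical. Node zones can be checked first with `kubectl get nodes -L topology.kubernetes.io/zone`.

```shell
# Delete all PVCs belonging to one executor pod, then the pod itself.
reset_executor_storage() {
  # $1 namespace, $2 executor pod name (e.g. dremio-executor-0)
  pvcs=$(kubectl get pvc -n "$1" -o name | grep -- "-$2\$" || true)
  [ -n "$pvcs" ] || { echo "no PVCs found for $2" >&2; return 1; }
  # --wait=false: a PVC in use is only removed once the pod using it is gone.
  kubectl delete -n "$1" --wait=false $pvcs &&
    kubectl delete pod "$2" -n "$1"
}

# Usage: reset_executor_storage <namespace> dremio-executor-0
```

After the StatefulSet recreates the pod, confirm with `kubectl get pvc` that new claims have been created and bound.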
queries.json not being generated on coordinator
The queries.json file is extremely helpful for understanding workloads, failures and general behaviour of the cluster. The format for logback.xml differs on Kubernetes from standard configurations. If your queries.json is failing to capture job history:
- Check you're not using an old version (i.e. pre-v20) of logback. You can compare to the latest template here
- Ensure you are setting the Dremio log path - this should be directed at your coordinator PVC location to persist the logs. Set in the extraStartParams in your coordinator manifest, for example:
It is possible to apply this by editing the dremio-master StatefulSet rather than running a full Helm upgrade; however, be aware that the dremio-master-0 pod will restart.
Backups and Restores
It goes without saying that the coordinator should be backed up on a regular (e.g. daily) basis, with the backup location persisted on a dedicated PVC, and ideally then copied to an external (e.g. S3) location. The backup script needs to be robust and include error capture - we see issues in Dremio Support where backup scripts have no exit status capture, resulting in lost clusters.
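As an illustration of exit status capture, the sketch below wraps a `dremio-admin backup` call made through `kubectl exec` and propagates any failure to the caller. The path `/opt/dremio/bin/dremio-admin`, the pod name and the function name are assumptions to adapt to your deployment, and passing a password on the command line is shown only for brevity.

```shell
# Run a Dremio backup on the master pod and report failure explicitly.
run_backup() {
  # $1 namespace, $2 Dremio admin user, $3 password, $4 backup directory on the PVC
  if kubectl exec -n "$1" dremio-master-0 -- \
       /opt/dremio/bin/dremio-admin backup -u "$2" -p "$3" -d "$4"; then
    echo "backup OK: $4"
  else
    status=$?   # exit status of the kubectl exec call above
    echo "backup FAILED with exit status $status" >&2
    return "$status"
  fi
}

# Usage: run_backup <namespace> <user> <password> <backup-dir> || <raise an alert>
```

A scheduled job that calls this function should treat a non-zero return as a failed backup and alert on it, rather than continuing to copy files to external storage.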
You can find details on configuring Kubernetes cron jobs here.
Restoring a cluster on Kubernetes is relatively straightforward. If restoring to the same version, you would simply:
- Switch the cluster to admin mode to shut down the active pods
- Copy the backup file to the admin pod
- Delete your existing data directory
- Restore and relaunch the master and executor pods
If rolling back versions, the recommended process is to launch a vanilla cluster at the required version, copy across your backup (which must be at the matching version), then restore and relaunch.
When running an upgrade or rollback, you can query the status at any point by running:
$ kubectl logs dremio-master-0 -c upgrade-task
This will show the logs of the upgrade-task init container launched as part of the process, visible by describing pod dremio-master-0:
Normal Created 27s (x4 over 3m47s) kubelet Created container upgrade-task
Any problems will be shown here, for example a mismatch in db (i.e. KVStore) and binary release level:
KVStore version is 23.1.0-202211250136090978-a79618c7
Upgrade failed
java.lang.IllegalStateException: Downgrading from version Version; 23.1.0-202211250136090978-a79618c7 to Version; 22.1.2-202209300521100619-6d0ea58b is not supported
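A quick way to check for such failures from a script is to search the upgrade-task log for the messages shown above. A minimal sketch; the function name is hypothetical and the search strings are taken from the example output, so other failure modes may need additional patterns.

```shell
# Succeeds (and prints the matching lines) if the upgrade-task log shows a failure.
upgrade_failed() {
  # $1 = namespace
  kubectl logs dremio-master-0 -n "$1" -c upgrade-task 2>&1 |
    grep -E 'Upgrade failed|IllegalStateException'
}

# Usage: if upgrade_failed <namespace>; then echo "upgrade needs attention"; fi
```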
When running the admin pod, any restore output is sent to the console on the admin pod. Logs for the restore should also be written to /var/log/dremio on the dremio-admin pod. You can also check the status of any KVStore log replay at /opt/dremio/data/db/catalog/LOG.
Also bear in mind that if you have SSO configured with a callback URI that uses an IP address rather than a DNS alias, you will have to update the callback URI to reflect the new address unless you are using sticky IP addresses, as the service address will not persist.
Diagnostic log collection with DDC
Collecting logs for Dremio Support when an issue occurs can be a manual task. Dremio has a diagnostic collector tool (DDC) which can be run against Dremio Kubernetes clusters to collect logs and configuration for Support. The tool can be found here. You should provide the namespace and labels of your pods in the call. To find these, run:
$ kubectl get pods -n <namespace> --show-labels
NAME READY STATUS RESTARTS AGE LABELS
zk-0 1/1 Running 34 (3m40s ago) 51d app=zk,controller-revision-hash=zk-7f47cd6968,statefulset.kubernetes.io/pod-name=zk-0
dremio-executor-2 1/1 Running 13 (4m50s ago) 22h app=dremio-executor,controller-revision-hash=dremio-executor-86bc4f7796,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-executor-2
dremio-executor-1 1/1 Running 13 (4m50s ago) 22h app=dremio-executor,controller-revision-hash=dremio-executor-86bc4f7796,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-executor-1
dremio-executor-0 1/1 Running 13 (4m50s ago) 22h app=dremio-executor,controller-revision-hash=dremio-executor-86bc4f7796,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-executor-0
For example, to collect both executor and coordinator in the default namespace, make the call like so:
$ ddc -k -e default:app=dremio-executor -c default:app=dremio-coordinator -a 3 -o /tmp/k8s-diag.zip
Please include the created k8s-diag.zip with your Dremio Support case.
Other points of note
We see issues where customers have initially sized their PVC allocations on pods to be quite large, and subsequently want to shrink those volumes. This operation is unsupported by Kubernetes, due to the nature of volume storage.
AWS IAM roles for EKS Service Accounts
When configuring clusters on EKS, it is possible to link your IAM role to a Kubernetes service account (discussed here). However, note that at this time this functionality is not supported in the Dremio Helm charts.