Running tcpdump on Kubernetes – Dremio Support

Summary

This article will guide you through how to:

1. Add a tcpdump sidecar to a Dremio cluster statefulset for troubleshooting purposes

2. Run an ephemeral debug container for on-the-fly troubleshooting

Reported Issue

When network communication between a pod and a source fails, or inter-pod communication becomes an issue, it is useful to analyse pod traffic using tcpdump. tcpdump is not installed by default in the Dremio container image.

Relevant Versions

All versions.

Steps to Resolve

Note: The following procedure is intrusive: it requires a restart of all pods in the statefulset, so it can impact running workloads.

For the first example, we will add the tcpdump container to the executors using a rollout.

1. Confirm all pods are operating as expected:

$ kubectl get pods -n <namespace>

2. Generate a patch file to create the tcpdump sidecar:

$ cat <<EOF >patch.yaml
spec:
  template:
    spec:
      containers:
      - name: tcpdumper
        image: docker.io/dockersec/tcpdump
EOF
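Before applying the patch, it may be worth previewing the result without touching the cluster. The following is a minimal sketch, assuming your kubectl version accepts --dry-run=client on kubectl patch. preview_patch is a hypothetical helper name, not part of kubectl or Dremio.

```shell
# preview_patch is a hypothetical helper (an assumption for illustration).
# It prints the statefulset as it would look after the patch, without
# changing anything in the cluster.
preview_patch() {
  namespace="$1"     # namespace that holds the statefulset
  statefulset="$2"   # for example dremio-executor
  patch_file="$3"    # for example patch.yaml
  kubectl patch statefulset "$statefulset" -n "$namespace" \
    --patch "$(cat "$patch_file")" --dry-run=client -o yaml
}
```

For example, the following should show the tcpdump image in the container list if the patch is well formed:

$ preview_patch <namespace> dremio-executor patch.yaml | grep dockersec/tcpdump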

3. Apply the patch file to the relevant statefulset:

$ kubectl patch statefulset dremio-executor -n <namespace> --patch "$(cat patch.yaml)"
$ kubectl rollout status statefulset/dremio-executor -n <namespace>
partitioned roll out complete: 2 new pods have been updated...

You will see each executor pod restart in turn. After the restart, run a describe against a pod to confirm the container has been added:

$ kubectl describe pod dremio-executor-0 -n <namespace> | more

Containers:
  tcpdumper:
    Container ID: containerd://3fba9fa94fd08c05a94c0a5df69767b71ccad89599cb0a4c99499e70fd7062f9
    Image: docker.io/dockersec/tcpdump
    Image ID: docker.io/dockersec/tcpdump@sha256:aaf093185359e2fc0f04002e0cf8dfa34d71c2bc2120ef550833fe882783284e
    Port: <none>
    Host Port: <none>
    State: Running
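As a quicker alternative to reading the full describe output, the container names can be pulled directly from the pod spec. This is a sketch only; has_container is a hypothetical helper name, and the container name tcpdumper is taken from the output above.

```shell
# has_container is a hypothetical helper (an assumption for illustration).
# It succeeds if the pod spec lists a container with exactly the given name.
has_container() {
  namespace="$1"
  pod="$2"
  container="$3"
  # jsonpath prints the container names separated by spaces;
  # put one name per line and look for an exact, whole-line match.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.containers[*].name}' | tr ' ' '\n' | grep -qxF -- "$container"
}
```

Usage:

$ has_container <namespace> dremio-executor-0 tcpdumper && echo "sidecar present"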

4. At this point you are ready to start reviewing tcpdump output. The container will dump to STDOUT, so you simply tail the logs for that container:

$ kubectl logs -n <namespace> pod/dremio-executor-1 -c tcpdumper -f

ptions [nop,nop,TS val 3706115649 ecr 725671183], length 47
15:08:15.135188 IP dremio-executor-0.dremio-cluster-pod.test1.svc.cluster.local.54562 > dremio-executor-1.dremio-cluster-pod.test1.svc.cluster.local.45678: Flags [P.], seq 3203020:3203041, ack 3080716, win 52899, options [nop,nop,TS val 725671186 ecr 3706115648], length 21
15:08:15.135504 IP dremio-executor-0.dremio-cluster-pod.test1.svc.cluster.local.54562 > dremio-executor-1.dremio-cluster-pod.test1.svc.cluster.local.45678: Flags [P.], seq 3203041:3203062, ack 3080716, win 52899, options [nop,nop,TS val 725671186 ecr 3706115648], length 21
15:08:15.135967 IP dremio-executor-0.dremio-cluster-pod.test1.svc.cluster.local.54562 > dremio-executor-1.dremio-cluster-pod.test1.svc.cluster.local.45678: Flags [P.], seq 3203062:3203083, ack 3080716, win 52899, options [nop,nop,TS val 725671187 ecr 3706115648], length 21
15:08:15.136263 IP dremio-executor-1.dremio-cluster-pod.test1.svc.cluster.local.45678 > dremio-executor-0.dremio-cluster-pod.test1.svc.cluster.local.54562: Flags [.], ack 3203083, win 52896, options [nop,nop,TS val 3706115650 ecr 725671186], length 0

You can use all the usual command-line tools to filter the output, or simply redirect or tee it to a file.
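As one illustration of filtering, the sketch below keeps only the lines that involve a given port and saves them to a file. filter_port is a hypothetical helper name and the output file name is arbitrary. It relies on the endpoint format shown in the output above, where each endpoint is printed as host.port.

```shell
# filter_port is a hypothetical helper (an assumption for illustration).
# It reads tcpdump text output on stdin and prints only the lines where the
# given port is the source port (followed by " >") or the destination port
# (followed by ":"). fflush() keeps the output moving when following logs.
filter_port() {
  port="$1"
  awk -v p="$port" '$0 ~ ("[.]" p "( >|:)") { print; fflush() }'
}
```

Usage, following the sidecar logs and keeping a copy of the matching lines:

$ kubectl logs -n <namespace> pod/dremio-executor-1 -c tcpdumper -f | filter_port 45678 | tee executor-1-port-45678.log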

5. When you are done, remove the sidecar by rolling back the patch. Be aware that this will restart every pod in the statefulset:

$ kubectl rollout undo statefulset/dremio-executor -n <namespace>
statefulset.apps/dremio-executor rolled back

$ kubectl rollout status -n <namespace> statefulset/dremio-executor
Waiting for partitioned roll out to finish: 1 out of 2 new pods have been updated...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 2 new pods have been updated...
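If you would rather not watch the rollout indefinitely, the status check can be bounded with a timeout. This is a sketch only: wait_for_rollout is a hypothetical helper name and the 10m default is an arbitrary choice.

```shell
# wait_for_rollout is a hypothetical helper (an assumption for illustration).
# It waits for the rollout to finish and, if the timeout expires first,
# lists the pods in the namespace so that stuck pods are easy to spot.
wait_for_rollout() {
  namespace="$1"
  statefulset="$2"
  timeout="${3:-10m}"   # arbitrary default
  if ! kubectl rollout status "statefulset/$statefulset" -n "$namespace" --timeout="$timeout"; then
    echo "rollout did not finish within $timeout; current pods:" >&2
    kubectl get pods -n "$namespace" >&2
    return 1
  fi
}
```

Usage:

$ wait_for_rollout <namespace> dremio-executor 5m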

Alternatively, you can run an ephemeral debug container, which does not require a restart of the pods and is therefore less intrusive.

To start an ephemeral tcpdump container, run:

$ kubectl debug -n <namespace> -it pod/dremio-executor-0 --image=dockersec/tcpdump --target dremio-executor -- sh

Here, dremio-executor is the name of the container you wish to inspect. This starts a shell in an ephemeral container that has tcpdump available and targets the dremio-executor container; you can then run tcpdump commands as normal. To run a single tcpdump command and print its output to STDOUT instead, run:

$ kubectl debug -n <namespace> -it pod/dremio-executor-0 --image=dockersec/tcpdump --target dremio-executor -- tcpdump -n -i any -s0 -v port 2181
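To capture more than one port in a single run, the tcpdump filter expression can be assembled from a list of ports. The sketch below is illustrative only: port_filter is a hypothetical helper name, and ports 2181 and 45678 are simply the ones that appear in the examples above.

```shell
# port_filter is a hypothetical helper (an assumption for illustration).
# It joins its arguments into a tcpdump filter expression,
# for example "port 2181 or port 45678".
port_filter() {
  filter=""
  for p in "$@"; do
    if [ -z "$filter" ]; then
      filter="port $p"
    else
      filter="$filter or port $p"
    fi
  done
  printf '%s\n' "$filter"
}
```

Usage, leaving the substitution unquoted so that each word is passed to tcpdump as a separate argument:

$ kubectl debug -n <namespace> -it pod/dremio-executor-0 --image=dockersec/tcpdump --target dremio-executor -- tcpdump -n -i any -s0 -v $(port_filter 2181 45678)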

Common Challenges

Note: You may find that the rollback leaves pods stuck in Terminating. This appears to be caused by the tcpdump process failing to exit when it is signalled to stop (you will see the tcpdump sidecar continue to log as before). In this event, scale the statefulset to 0 and then back up to its original replica count to finish the rollback cleanly, as in the examples below.

$ kubectl scale statefulset/dremio-executor --replicas=0 -n <namespace>
statefulset.apps/dremio-executor scaled

$ kubectl scale statefulset/dremio-executor --replicas=2 -n <namespace>
statefulset.apps/dremio-executor scaled
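To avoid having to remember the original replica count, the two scale commands can be wrapped so that the count is read first and restored afterwards. This is a minimal sketch, not a Dremio-provided tool; bounce_statefulset is a hypothetical helper name.

```shell
# bounce_statefulset is a hypothetical helper (an assumption for illustration).
# It records the current replica count, scales the statefulset to 0,
# then scales it back to the recorded count.
bounce_statefulset() {
  namespace="$1"
  statefulset="$2"
  replicas=$(kubectl get statefulset "$statefulset" -n "$namespace" \
    -o jsonpath='{.spec.replicas}') || return 1
  if [ -z "$replicas" ]; then
    echo "could not read the replica count for $statefulset" >&2
    return 1
  fi
  kubectl scale "statefulset/$statefulset" --replicas=0 -n "$namespace" || return 1
  kubectl scale "statefulset/$statefulset" --replicas="$replicas" -n "$namespace"
}
```

Usage:

$ bounce_statefulset <namespace> dremio-executor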

Additional Resources

https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/

https://github.com/nicolaka/netshoot