Killercoda Kube Controller Manager Misconfigured
I am planning to take the CKA exam in the near future. I work with Kubernetes daily at my job, but I am mostly self-taught, so it is realistic to assume that I have some knowledge gaps. This blog series is part of my preparation: in it I work through all the scenarios on Killercoda.
Kube Controller Manager Misconfigured
https://killercoda.com/killer-shell-cka/scenario/kube-controller-manager-misconfigured
It is crashing, fix it
Custom Kube Controller Image
A custom Kube Controller Manager container image was running in this cluster for testing. It has been reverted back to the default one, but it’s not coming back up. Fix it.
The controller manager runs the cluster’s built-in controllers. Each controller is a control loop that watches the cluster state and pushes it towards the desired state. For example, the ReplicationController ensures that the desired number of pod replicas is running (see here).
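To make the idea of a control loop more concrete, here is a toy model in bash. It is only a sketch under my own assumptions: no cluster is involved, and OBSERVED_PODS is a hypothetical stand-in for the pods a real controller would observe through the API server. Each pass compares the observed state with the desired state and takes one step towards the desired one.

```shell
#!/usr/bin/env bash
# Toy model of a ReplicationController-style control loop. Nothing here talks
# to a cluster: OBSERVED_PODS is a hypothetical stand-in for the pods a real
# controller would observe through the API server.
OBSERVED_PODS=()
next_id=0

# One reconciliation pass: compare observed and desired replica counts and
# take a single step towards the desired state.
reconcile_once() {
  local desired=$1
  local observed=${#OBSERVED_PODS[@]}
  if (( observed < desired )); then
    next_id=$((next_id + 1))
    OBSERVED_PODS+=("pod-$next_id")
    echo "created pod-$next_id"
  elif (( observed > desired )); then
    local keep=$((observed - 1))
    local victim=${OBSERVED_PODS[$keep]}
    OBSERVED_PODS=("${OBSERVED_PODS[@]:0:$keep}")
    echo "deleted $victim"
  fi
}

# The control loop: keep reconciling until observed state matches desired state.
control_loop() {
  local desired=$1
  (( desired >= 0 )) || return 1
  while (( ${#OBSERVED_PODS[@]} != desired )); do
    reconcile_once "$desired"
  done
}
```

Starting from an empty state, control_loop 3 prints created pod-1, created pod-2 and created pod-3; a following control_loop 1 deletes pod-3 and then pod-2. Real controllers react to watch events instead of spinning in a loop, but the compare-and-act step is the same idea. If nothing runs this loop, nothing corrects the drift.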
Let’s start by checking which pods are running in the kube-system namespace. The kube-controller-manager-controlplane pod is stuck in CrashLoopBackOff:
$ k get pods -n kube-system
NAME                                   READY   STATUS             RESTARTS       AGE
cilium-envoy-6hzbm                     1/1     Running            1 (49m ago)    17d
cilium-kn99v                           1/1     Running            1 (49m ago)    17d
cilium-operator-5d8ddcb8d8-t446j       1/1     Running            2 (49m ago)    17d
coredns-5f68d5bd7f-b4n4t               1/1     Running            1 (49m ago)    17d
coredns-5f68d5bd7f-mbzlm               1/1     Running            1 (49m ago)    17d
etcd-controlplane                      1/1     Running            1 (49m ago)    17d
kube-apiserver-controlplane            1/1     Running            1 (49m ago)    17d
kube-controller-manager-controlplane   0/1     CrashLoopBackOff   6 (4m2s ago)   9m30s
kube-scheduler-controlplane            1/1     Running            1 (49m ago)    17d
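With more pods than in this small cluster, it helps to filter the list. A pod in CrashLoopBackOff typically still reports the phase Running, so a field selector on status.phase would not catch it. The helper below is a hypothetical one of mine that parses the READY and STATUS columns of the table output instead:

```shell
#!/usr/bin/env bash
# not_ready_pods: hypothetical helper that reads `kubectl get pods` table output
# on stdin and prints the name and STATUS of every pod that is either not fully
# ready (READY is not n/n) or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column, e.g. 0/1
    if (ready[1] != ready[2] || $3 != "Running") print $1, $3
  }'
}
```

Usage: kubectl get pods -n kube-system | not_ready_pods would print only the controller manager line here. Completed pods of finished Jobs would show up as well.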
The pod’s events don’t reveal anything specific: the kubelet starts the container, the container fails, and the kubelet backs off before restarting it:
$ k describe pod -n kube-system kube-controller-manager-controlplane
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 4m51s (x7 over 10m) kubelet spec.containers{kube-controller-manager}: Container image "registry.k8s.io/kube-controller-manager:v1.35.1" already present on machine and can be accessed by the pod
Normal Created 4m51s (x7 over 10m) kubelet spec.containers{kube-controller-manager}: Container created
Normal Started 4m51s (x7 over 10m) kubelet spec.containers{kube-controller-manager}: Container started
Warning BackOff 49s (x25 over 10m) kubelet spec.containers{kube-controller-manager}: Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-controlplane_kube-system(3a55b705a84a6fefa87bc94071f8cc92)
The pod logs show that the binary was started with an unknown flag. The kube-controller-manager command-line reference does not list this flag either (see here).
$ k logs -n kube-system kube-controller-manager-controlplane
...
Error: unknown flag: --project-sidecar-insertion
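Because the container is restarted over and over, kubectl logs --previous (-p) is handy for reading the output of the last crashed instance. If the API server were down as well, crictl ps -a and crictl logs on the control plane node would be the fallback. The filter below is a hypothetical helper that pulls the rejected flag names out of such logs, assuming the message format shown above:

```shell
#!/usr/bin/env bash
# unknown_flags: hypothetical helper that reads component logs on stdin and
# prints each flag reported as "unknown flag: --name", one per line.
unknown_flags() {
  grep -o 'unknown flag: --[^[:space:]]*' | sed 's/^unknown flag: //'
}
```

For example: kubectl logs -n kube-system kube-controller-manager-controlplane --previous | unknown_flags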
The flag is set in the static Pod manifest /etc/kubernetes/manifests/kube-controller-manager.yaml:
...
  containers:
  - command:
    - kube-controller-manager
    - ...
    - --project-sidecar-insertion
Since I was not sure this was the correct course of action, I made a backup before removing the flag:
$ cp kube-controller-manager.yaml ~/backup.yaml
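The backup and the edit can also be done in one go. The function below is a hypothetical helper of mine, not part of the scenario or of kubeadm: it copies the manifest to a backup path outside the manifest directory and removes every argument line that is exactly the flag (or flag=value). It writes to a hidden temp file first and renames it over the original, relying on the kubelet ignoring files that start with a dot in its static Pod directory.

```shell
#!/usr/bin/env bash
# remove_manifest_flag MANIFEST FLAG BACKUP  (hypothetical helper)
# Copies MANIFEST to BACKUP, then drops every YAML list item that is exactly
# FLAG or FLAG=value, e.g. "    - --project-sidecar-insertion".
remove_manifest_flag() {
  local manifest=$1 flag=$2 backup=$3 tmp

  # The backup must live outside /etc/kubernetes/manifests, otherwise the
  # kubelet would pick it up as another static Pod manifest.
  cp -- "$manifest" "$backup" || return 1

  # Write the result to a hidden temp file in the same directory, then
  # rename it over the original in one step.
  tmp=$(mktemp "$(dirname "$manifest")/.manifest.XXXXXX") || return 1
  awk -v flag="$flag" '
    {
      arg = $0
      sub(/^[ \t]*-[ \t]+/, "", arg)   # strip the YAML list marker
      sub(/[ \t]+$/, "", arg)          # strip trailing whitespace
    }
    arg == flag || index(arg, flag "=") == 1 { next }   # drop the flag line
    { print }
  ' "$manifest" > "$tmp" || { rm -f -- "$tmp"; return 1; }
  mv -- "$tmp" "$manifest"
}
```

Usage: remove_manifest_flag /etc/kubernetes/manifests/kube-controller-manager.yaml --project-sidecar-insertion ~/backup.yaml. In the exam, editing the file with vim is probably faster; the rename just avoids the kubelet ever reading a half-written manifest.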
After I removed the flag, the kubelet recreated the static Pod, and checking the kube-system namespace again shows that it is up now:
$ k get pods -n kube-system | grep controller
kube-controller-manager-controlplane 1/1 Running 0 19s
The pod logs look clean now:
$ k logs -n kube-system kube-controller-manager-controlplane
I0418 13:56:44.001105 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.002269 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.004508 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.010623 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.011051 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.011185 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.023478 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.023724 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.040319 1 shared_informer.go:370] "Waiting for caches to sync"
I0418 13:56:44.141680 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.229202 1 shared_informer.go:377] "Caches are synced"
I0418 13:56:44.229228 1 garbagecollector.go:166] "Garbage collector: all resource monitors have synced"
I0418 13:56:44.229237 1 garbagecollector.go:169] "Proceeding to collect garbage"
Learnings
- The controller manager runs the cluster’s controllers. If it is not running, nothing reconciles the cluster towards its desired state, so the actual state drifts: for example, deleted pods of a ReplicaSet are no longer replaced.