compumike 16 hours ago [-]
Author here! If you're running a Kubernetes cluster, I recommend you check `kubectl version` and see if you're running "Server Version: v1.36.[0,1,2]". If so, you may want to use the one-liner at the end of the article to check your "process_resident_memory_bytes" on each node, and consider restarting kubelet as a temporary workaround to tame the memory leak until v1.36.3 is released.
__turbobrew__ 30 minutes ago [-]
A good reason to health check the kubelet process and restart it when the checks fail.
compumike 15 minutes ago [-]
What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:
# cat /proc/$(pgrep kubelet)/oom_score_adj
-999
(from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)
With this score, the Linux OOM killer wouldn't touch it, but any of my Pods were fair game.
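To make the meaning of that number concrete, here is a small Go sketch (Linux only) that reads `/proc/<pid>/oom_score_adj` and gives a rough interpretation. The helper names `parseOOMScoreAdj` and `describe` are hypothetical; the only facts relied on are that the file holds a single integer in the range -1000 to 1000 and that -1000 exempts a process from OOM killing entirely.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseOOMScoreAdj is a hypothetical helper: it parses the contents of
// /proc/<pid>/oom_score_adj, a single integer followed by a newline.
func parseOOMScoreAdj(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// describe gives a rough reading of the adjustment (valid range -1000..1000).
func describe(adj int) string {
	switch {
	case adj == -1000:
		return "exempt from the OOM killer"
	case adj < 0:
		return "less likely to be chosen than a default process"
	case adj == 0:
		return "default"
	default:
		return "more likely to be chosen than a default process"
	}
}

func main() {
	// Default to this process; pass a PID as the first argument to inspect another.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	raw, err := os.ReadFile("/proc/" + pid + "/oom_score_adj")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	adj, err := parseOOMScoreAdj(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("oom_score_adj=%d: %s\n", adj, describe(adj))
}
```

At -999 the kubelet sits one step short of fully exempt, which matches what I saw: everything else on the node gets picked first.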
rirze 1 hour ago [-]
Very cool. It's often daunting to contribute to such a well-established and recognizable project, but this is exactly how it should work.
CamouflagedKiwi 1 hour ago [-]
Nice find.
Can't help but feel this is one of the subtle traps hidden beneath the advice that contexts aren't supposed to be stored. I know it's not always that easy, of course.
compumike 40 minutes ago [-]
Thanks. I know there's a `go vet` tool that's run as part of Kubernetes CI, and one of its checks is:
lostcancel: check cancel func returned by context.WithCancel is called
I'm not 100% sure why `go vet` didn't catch this issue, but storing the cancelFn in the struct is probably part of the reason. Any Go experts know if that's the case?