Linux BPF CPU Profiling with kubectl on Kubernetes

Marcel Dempers
Dec 16, 2019

The Challenge

So a few weeks back I released a video about using Linux BPF to build a container capable of profiling the CPU usage of other containers on the same host. This is particularly useful as a portable profiler in a cloud-managed container environment, such as AKS, GKE or EKS, where you may not have direct access to install anything on the host. Although some cloud providers give you SSH access to the host, I’d like to pretend I can’t. So how do you profile processes without SSH?

So I wanted to find a way to use kubectl to do the work. I want CPU profiling to be simple, fast and effective. When a workload on Kubernetes is consuming high CPU, I may be too busy to configure SSH, install dependencies, configure BPF tooling and start profiling. These things always happen at unexpected times, so we need a simple, fast method of getting a CPU flame graph of our containers on Kubernetes.

Ideal Scenario

So in an ideal world, I’d like it to go like this:

08:00 AM: Prometheus alert: High CPU (90%) on Pod A, Cluster A, Node A

08:05 AM: Developer: "Hey, my pod is consuming high CPU, let me see what's going on"

08:06 AM: Developer profiles the bad pod for 60 seconds:

$>./flameget.sh --node "Node A" --pod "Pod A" -t "60"

And BOOM! We have an immediate result!

Now, the above flame graph is actually from an idle .NET workload, but you get my drift :) We want simplicity and a simple user experience.

Linux BPF in Containers

I’ve recently built a BPF Dockerfile to use as a portable profiling container. It starts from a base image that has all the BPF tools installed. In order to successfully profile in the cloud, the profiler needs the correct linux-headers packages installed. So for example, on your local Ubuntu machine you may have kernel 4.15, which can be confirmed by running uname -r. To get the linux-headers, you generally run:

sudo apt-get install -y linux-headers-$(uname -r)

It’s important to note that not all cloud VMs’ linux-headers are available via apt-get. They may sit behind special repositories, or just somewhere on the internet.

You may have to track the packages down by searching online and use curl and dpkg -i to install them. Here is an example of my Dockerfile that matches the Azure 4.15.0-1061 kernel:

#Base image with all BPF tools
FROM aimvector/ebpf-tools:base

#linux-headers-azure for BPF in the cloud :)
RUN curl -LO http://security.ubuntu.com/ubuntu/pool/main/l/linux-azure/linux-azure-headers-4.15.0-1061_4.15.0-1061.66_all.deb
RUN curl -LO http://archive.ubuntu.com/ubuntu/pool/main/l/linux-azure/linux-headers-4.15.0-1061-azure_4.15.0-1061.66_amd64.deb
RUN dpkg -i linux-azure-headers-4.15.0-1061_4.15.0-1061.66_all.deb
RUN dpkg -i linux-headers-4.15.0-1061-azure_4.15.0-1061.66_amd64.deb

ENTRYPOINT [ "/bin/bash" ]

#docker build . -f ./azure.4.15.0.1061.dockerfile -t aimvector/ebpf-tools:azure.4.15.0.1061

You can also apply the above logic to other cloud providers’ Kubernetes offerings to build a profiler image that matches their respective kernel versions.

Running `uname -r` gives you the kernel version, and you’ll have to hunt down the linux-headers that match it. I wrote more about my learnings on containerizing BPF here.
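If you can’t SSH to the node, one handy trick is to ask the Kubernetes API for its kernel version instead, for example:

# Kernel version for every node in the cluster
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'

# Or just for the node you care about ("Node A" is a placeholder)
kubectl get node "Node A" -o jsonpath='{.status.nodeInfo.kernelVersion}'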

Now that I have my profiler container with all the BPF tools baked into an image, I can use Kubernetes itself to schedule the profiler container onto the host I am targeting.

My High Tech Low Budget Cheap Solution

So let’s take a look at flameget.sh

Since we know the troublesome node and pod, we can use this info to automate the CPU profiling process. The script takes a bunch of inputs:

Note: All fields are required, as I want to try to avoid users passing wrong arguments.

--node is the name of the node where the troublesome target pod is running.

--pod is the name of the troublesome pod you wish to profile.

--container is the container name inside the pod. Remember a pod can consist of multiple containers, each with different process IDs. The first objective of the script is to find the pod’s container process ID (PID) on the host, not the process ID inside the container. The script will use docker ps to grab the container ID and perform a docker inspect to grab the container’s host process ID. We’ll take a look at the profiler in a sec.

-t is the time (in seconds) we’d like to profile for. Note that it needs to be wrapped in quotes because I’ve not gotten around to making it work as an integer just yet.

-i is the container image we built at the beginning of this post, which we wish to use as the profiler.

Full example (--help also works):

$>./flameget.sh \
--node "Node A" \
--pod "Pod A" \
--container "Container A" \
-t "60" \
-i "aimvector/ebpf-tools:azure.4.15.0.1061"

SSH + AUTOMATION = KUBECTL

flameget.sh uses pure kubectl to deploy our profiler to Kubernetes. kubectl takes care of the deployment, scheduling and interaction with our profiler.

Once the profiler is deployed, we use kubectl to watch for a “Running” state. This is a simple bash loop.

We watch our profiler to make sure it’s running
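Roughly, that loop could look something like this (just a sketch; $PROFILER_POD and the sleep interval are placeholders, not necessarily what flameget.sh uses):

# Wait for the profiler pod to reach the Running phase
while [ "$(kubectl get pod "$PROFILER_POD" -o jsonpath='{.status.phase}')" != "Running" ]; do
  echo "Waiting for the profiler pod to start..."
  sleep 2
done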

Once the profiler pod is in a Running state, we break out of that loop and start a second loop that watches the pod logs for a “completed” message.

We watch our profiler logs for a “completed” message
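Again, a minimal sketch of what that second loop could look like:

# Keep checking the profiler logs until the "completed" message shows up
until kubectl logs "$PROFILER_POD" | grep -q "completed"; do
  sleep 2
done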

The profiler will produce a scalable vector graphic (SVG), and once our loop is broken we copy it out with the kubectl cp command.
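For example (the SVG filename and its location inside the pod are assumptions here; the real path comes from the profiler command):

# Copy the flame graph out of the profiler pod to the local machine
kubectl cp "$PROFILER_POD":image.svg ./image.svg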

Once copied out, we delete the profiler pod. So kubectl’s role here is to automate running our profiler and provide a simple interface to get the work done. It pretty much replaces the need for SSH. Let’s take a look at the profiler.

The Profiler

The flameget.sh script comes with an ebpf-profiler.yaml file, which is the manifest that does our heavy lifting. You can take a look at the template here.

The script injects all our arguments, such as node, pod, time and profiler image, into the YAML using sed to replace placeholders, and simply pipes the result to kubectl apply.

If you take a look at the YAML file you’ll notice replacement placeholders in the form of {{}}. We’re basically doing a simple Kubernetes deployment here.
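A sketch of that injection step (the placeholder names and shell variables here are assumptions; the real ones live in the template):

# Replace the {{placeholders}} in the manifest and pipe the result straight to kubectl
sed -e "s|{{node}}|$NODE|g" \
    -e "s|{{pod}}|$POD|g" \
    -e "s|{{container}}|$CONTAINER|g" \
    -e "s|{{time}}|$TIME|g" \
    -e "s|{{image}}|$IMAGE|g" \
    ebpf-profiler.yaml | kubectl apply -f -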

Container start command

To target the node, we pass in a node selector. To target a pod, we run some basic shell script in the profiler’s container spec.command, which does a docker ps to grab the container ID of the pod we are targeting. It then uses docker inspect on that container ID to grab the process ID.

Once we have a process ID, we pass it to the BCC profile tool, which does the BPF magic. The results are piped to FlameGraph. This is the basic ./bcc/tools/profile.py | ./FlameGraph/flamegraph.pl > image.svg line.
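Put together, the guts of that container command could look something like this (a sketch; the k8s_ naming convention assumes the Docker runtime, and the profile flags are one plausible combination rather than the script’s exact ones):

# Docker names Kubernetes containers k8s_<container>_<pod>_<namespace>_...
CONTAINER_ID=$(docker ps --filter "name=k8s_${CONTAINER}_${POD}" --format "{{.ID}}" | head -n 1)

# The container's process ID as seen on the host
PID=$(docker inspect --format '{{.State.Pid}}' "$CONTAINER_ID")

# Sample that PID at 99Hz for the requested duration and render a flame graph
./bcc/tools/profile.py -F 99 -f -p "$PID" "$TIME" | ./FlameGraph/flamegraph.pl > image.svg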

The last sleep 30s is a hack: because our flameget.sh does loop checking, we have to give ourselves enough time to copy the flame graph image out of the pod before we delete it.

And that’s it! As promised, here is a video of me going through all of the above so you can see it all in action :) You can find all the Kubernetes-related source code here.

Remember to hit that like and subscribe button ^-^

Peace!

Marcel
