[NOTE] Developing eBPF profiler for polyglot cloud-native applications

Agenda

  • Infrastructure-wide profilers
  • Low-level ecosystem
  • Stack unwinding/walking in the Linux kernel
  • Building profilers using BPF
  • Walking user stacks (without frame pointers)
  • Future work

Profilers for the cloud-native environment

-> Discovery mechanism for the targets
-> Mechanism to collect stack traces (kernel, userspace)
-> Profile formats
-> Async symbolization & visualization

Low-level ecosystem

ELF and DWARF

  • Executable and Linkable Format - ELF
    • Used for object files, executable programs, shared objects, etc.
  • DWARF - widely used debugging format
    • CIE - Common Information Entry (information shared by one or more FDEs, Frame Description Entries)
  • Tools to read ELF and/or DWARF information
    • readelf, objdump, elfutils, llvm-dwarfdump
    • gcc -g emits DWARF debug information
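The first thing those tools do can be sketched in a few lines of C: read the ELF header, then the section header table, and look a section up by name (for example `.eh_frame`, which holds the unwind tables discussed later). This is a minimal sketch assuming Linux with glibc's `<elf.h>` and a 64-bit ELF file; `find_section_offset` is a hypothetical helper, not part of any tool listed above.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns the file offset of the section called `name` in a 64-bit ELF file,
 * or -1 if the file cannot be read, is not ELF64, or has no such section.
 * If size_out is not NULL it receives the section size. Extended section
 * numbering (e_shnum == 0 or e_shstrndx == SHN_XINDEX) is not handled.
 */
long find_section_offset(const char *path, const char *name,
                         unsigned long *size_out) {
	FILE *f = fopen(path, "rb");
	if (!f)
		return -1;

	long result = -1;
	Elf64_Ehdr ehdr;
	Elf64_Shdr *shdrs = NULL;
	Elf64_Shdr *names = NULL;
	char *strtab = NULL;

	/* Check the magic bytes (\x7fELF) and the 64-bit class. */
	if (fread(&ehdr, sizeof(ehdr), 1, f) != 1 ||
	    memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
	    ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
	    ehdr.e_shnum == 0 || ehdr.e_shstrndx >= ehdr.e_shnum)
		goto out;

	/* Read the whole section header table. */
	shdrs = calloc(ehdr.e_shnum, sizeof(Elf64_Shdr));
	if (!shdrs || fseek(f, (long)ehdr.e_shoff, SEEK_SET) != 0 ||
	    fread(shdrs, sizeof(Elf64_Shdr), ehdr.e_shnum, f) != ehdr.e_shnum)
		goto out;

	/* Section names are offsets into the section-header string table. */
	names = &shdrs[ehdr.e_shstrndx];
	strtab = malloc(names->sh_size + 1);
	if (!strtab || fseek(f, (long)names->sh_offset, SEEK_SET) != 0 ||
	    fread(strtab, 1, names->sh_size, f) != names->sh_size)
		goto out;
	strtab[names->sh_size] = '\0';

	for (unsigned i = 0; i < ehdr.e_shnum; i++) {
		if (shdrs[i].sh_name < names->sh_size &&
		    strcmp(strtab + shdrs[i].sh_name, name) == 0) {
			if (size_out)
				*size_out = shdrs[i].sh_size;
			result = (long)shdrs[i].sh_offset;
			break;
		}
	}

out:
	free(strtab);
	free(shdrs);
	fclose(f);
	return result;
}
```

For instance, `find_section_offset("/proc/self/exe", ".eh_frame", &size)` would locate the running binary's unwind tables. A real profiler would more likely lean on a library such as libelf (elfutils) or Go's `debug/elf` package.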

Stacktraces and x86_64 ABI

  • What collecting stack traces involves
    • Kernel stacks
    • Application stacks
  • Direction of stack growth
  • So what are stack pointers, and where do they come from?
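On x86_64 the stack grows towards lower addresses: each call places the callee's frame below the caller's. One hedged way to observe this from C is to compare the address of a local in a caller with one in its callee. The function names are hypothetical; `noinline` is a GCC/Clang attribute used so the two functions really get separate frames, and the addresses are compared as `uintptr_t` because comparing pointers to unrelated objects with `<` is undefined in C.

```c
#include <stdint.h>

/* Returns 1 if a local in this (callee) frame sits below the caller's local. */
__attribute__((noinline)) static int callee_local_is_lower(uintptr_t caller_local) {
	volatile char local = 0;
	return (uintptr_t)&local < caller_local;
}

/* Returns 1 if the stack grows towards lower addresses, as on x86_64. */
int stack_grows_down(void) {
	volatile char local = 0;
	return callee_local_is_lower((uintptr_t)&local);
}
```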

$rbp, $rsp & $rip registers

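  • $rbp: base of the current stack frame (the frame pointer); the caller's $rbp is saved at [$rbp]
  • $rsp: top of the stack, where local variables live
    • The prologue (push %rbp; mov %rsp, %rbp) saves the caller's $rbp on the stack and points $rbp at it, so the return address sits just above, at [$rbp+8]
  • $rip: holds the program counter of the currently executing instruction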
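With frame pointers, walking the stack is a linked-list traversal: [$rbp] holds the caller's $rbp and [$rbp+8] the return address, so each hop yields one program counter. The sketch below runs that walk over a fake, in-memory stack so it can execute anywhere; `struct fake_memory`, `read_u64` and `walk_frame_pointers` are hypothetical names. In a BPF program each read would instead go through a helper such as `bpf_probe_read_user()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Fake, word-addressed memory: word i lives at address base + 8 * i. */
struct fake_memory {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

/* Reads the 8-byte word at addr; returns 0 on success, -1 if out of range. */
static int read_u64(const struct fake_memory *m, uint64_t addr, uint64_t *out) {
	if (addr < m->base || (addr - m->base) % 8 != 0 ||
	    (addr - m->base) / 8 >= m->nwords)
		return -1;
	*out = m->words[(addr - m->base) / 8];
	return 0;
}

/*
 * Frame-pointer unwinding: [rbp] holds the caller's rbp, [rbp + 8] the return
 * address. Collects up to max_depth program counters, starting with rip, and
 * stops at rbp == 0 (outermost frame), on unreadable memory, or if the chain
 * does not move towards higher addresses (corrupt or looping frames).
 */
size_t walk_frame_pointers(const struct fake_memory *m, uint64_t rip,
                           uint64_t rbp, uint64_t *pcs, size_t max_depth) {
	size_t depth = 0;
	if (max_depth == 0)
		return 0;
	pcs[depth++] = rip;
	while (rbp != 0 && depth < max_depth) {
		uint64_t saved_rbp, return_addr;
		if (read_u64(m, rbp, &saved_rbp) != 0 ||
		    read_u64(m, rbp + 8, &return_addr) != 0)
			break;
		pcs[depth++] = return_addr;
		if (saved_rbp != 0 && saved_rbp <= rbp)
			break; /* callers' frames must sit at higher addresses */
		rbp = saved_rbp;
	}
	return depth;
}
```

The upward-only check matters because a profiler samples arbitrary code: a function compiled without frame pointers leaves garbage in $rbp, and the walk must terminate instead of looping.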

Frame pointers are often disabled

  • Increased binary size -> fewer i-cache hits
  • One less general-purpose register available ($rbp)

Cons of disabling frame pointers

  • Walking stack traces becomes more expensive
  • Less accurate stack traces
  • Way more work for compiler / debugger / profiler developers
  • The unwind information needed instead (e.g. .eh_frame) is large

The reality

Frame pointer believers

  • Golang >= 1.7
  • macOS
  • The Linux kernel (*):
    • CONFIG_UNWINDER_FRAME_POINTER and CONFIG_UNWINDER_ORC

Stack unwinding in the Linux kernel w/o fp

  • ORC (CONFIG_UNWINDER_ORC, x86_64 only)
  • Doesn’t rely on .debug_frame/.eh_frame
  • Enabled by some of the major cloud vendors

Unwinding the stack without frame pointers

  • DWARF unwind information
    • .eh_frame
    • .debug_frame
  • Synthesizing them from object code
  • Guessing which stack values are return addresses

.eh_frame - unwind tables

$ readelf -wF ./test_binary
LOC              CFA      rbp   ra
00000000004011f0 rsp+8    u     c-8
00000000004011f1 rsp+16   c-16  c-8
00000000004011f4 rbp+16   c-16  c-8
0000000000401242 rsp+8    c-16  c-8
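Reading the table: LOC is the first address a row applies to; CFA (Canonical Frame Address, the value $rsp had in the caller just before the call instruction) is a register plus an offset; `u` means the register is undefined (not recoverable) and `c-16` means it was saved at CFA-16; `ra` is the return address. The row boundaries are consistent with a classic prologue (push %rbp is 1 byte, mov %rsp,%rbp is 3 bytes). The sketch below hard-codes these four rows and computes the CFA for a program counter; the struct layout and function names are hypothetical, and `pc` is assumed to fall inside this one function.

```c
#include <stddef.h>
#include <stdint.h>

enum cfa_base { CFA_RSP, CFA_RBP };

/* One interpreted row of `readelf -wF` output. */
struct interp_row {
	uint64_t loc;          /* first address the row applies to */
	enum cfa_base cfa_reg; /* CFA = cfa_reg + cfa_offset */
	int64_t cfa_offset;
	int rbp_saved;         /* 0 for "u", 1 for "c-16" */
	int64_t rbp_offset;    /* rbp saved at CFA + rbp_offset */
};

/* The four rows printed above; ra is at CFA - 8 in every row. */
static const struct interp_row rows[] = {
	{0x4011f0, CFA_RSP, 8, 0, 0},    /* entry: rsp points at ra */
	{0x4011f1, CFA_RSP, 16, 1, -16}, /* likely after push %rbp */
	{0x4011f4, CFA_RBP, 16, 1, -16}, /* likely after mov %rsp, %rbp */
	{0x401242, CFA_RSP, 8, 1, -16},  /* likely after leave, before ret */
};

/* Last row whose loc <= pc, or NULL if pc precedes the function. */
const struct interp_row *row_for_pc(uint64_t pc) {
	const struct interp_row *found = NULL;
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		if (rows[i].loc <= pc)
			found = &rows[i];
	return found;
}

/* CFA at pc given the current rsp and rbp, or 0 if no row covers pc.
 * The return address is then stored at CFA - 8. */
uint64_t cfa_for_pc(uint64_t pc, uint64_t rsp, uint64_t rbp) {
	const struct interp_row *r = row_for_pc(pc);
	if (!r)
		return 0;
	uint64_t base = r->cfa_reg == CFA_RSP ? rsp : rbp;
	return base + (uint64_t)r->cfa_offset;
}
```

Whatever instruction the sample lands on, the CFA comes out the same: with $rsp = 0x7ffc1000 at entry, it is 0x7ffc1008 at 0x4011f0 (rsp+8), in the body (rbp+16 with $rbp = 0x7ffc0ff8) and after `leave` (rsp+8 again).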

.eh_frame - generating unwind tables

$ readelf --debug-dump=frames ./test_binary
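`readelf --debug-dump=frames` prints the raw CIE and FDE records: DWARF call frame instructions (DW_CFA_*) that have to be evaluated in order to produce a table like the one above. A minimal evaluator for just the opcodes a typical x86_64 prologue/epilogue uses could look like this. Assumptions: code alignment factor 1 and data alignment factor -8 (the usual x86_64 values), x86_64 DWARF register numbers (6 = rbp, 7 = rsp, 16 = return address), and a hand-assembled instruction stream that mimics, rather than being dumped from, `test_binary`. `eval_cfa` and `struct cfa_row` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { DWARF_RBP = 6, DWARF_RSP = 7 }; /* x86_64 DWARF register numbers */

struct cfa_row {
	uint64_t loc;
	uint64_t cfa_reg;   /* DWARF register the CFA is based on */
	int64_t cfa_offset;
	int rbp_saved;      /* 0: "u" (undefined), 1: saved at CFA + rbp_offset */
	int64_t rbp_offset;
};

/* Decodes an unsigned LEB128 value starting at p[*i] and advances *i. */
static uint64_t read_uleb(const uint8_t *p, size_t len, size_t *i) {
	uint64_t result = 0;
	unsigned shift = 0;
	while (*i < len) {
		uint8_t byte = p[(*i)++];
		if (shift < 64)
			result |= (uint64_t)(byte & 0x7f) << shift;
		shift += 7;
		if (!(byte & 0x80))
			break;
	}
	return result;
}

/*
 * Evaluates CFA instructions for code starting at `start`, writing one row per
 * location. Returns the number of rows, or -1 on an unsupported opcode or a
 * full output buffer. Assumes code_align = 1 and data_align = -8.
 */
int eval_cfa(const uint8_t *ins, size_t len, uint64_t start,
             struct cfa_row *rows, int max_rows) {
	const int64_t data_align = -8;
	struct cfa_row cur = {start, 0, 0, 0, 0};
	int n = 0;
	size_t i = 0;

	while (i < len) {
		uint8_t op = ins[i++];
		uint64_t delta;
		if ((op & 0xc0) == 0x40) {          /* DW_CFA_advance_loc */
			delta = op & 0x3f;
		} else if (op == 0x02 && i < len) { /* DW_CFA_advance_loc1 */
			delta = ins[i++];
		} else if ((op & 0xc0) == 0x80) {   /* DW_CFA_offset */
			int64_t off = (int64_t)read_uleb(ins, len, &i) * data_align;
			if ((op & 0x3f) == DWARF_RBP) {
				cur.rbp_saved = 1;
				cur.rbp_offset = off;
			} /* other registers, e.g. r16 (ra), are ignored here */
			continue;
		} else if (op == 0x0c) {            /* DW_CFA_def_cfa */
			cur.cfa_reg = read_uleb(ins, len, &i);
			cur.cfa_offset = (int64_t)read_uleb(ins, len, &i);
			continue;
		} else if (op == 0x0d) {            /* DW_CFA_def_cfa_register */
			cur.cfa_reg = read_uleb(ins, len, &i);
			continue;
		} else if (op == 0x0e) {            /* DW_CFA_def_cfa_offset */
			cur.cfa_offset = (int64_t)read_uleb(ins, len, &i);
			continue;
		} else if (op == 0x00) {            /* DW_CFA_nop */
			continue;
		} else {
			return -1;
		}
		/* Advancing the location closes the current row. */
		if (n == max_rows)
			return -1;
		rows[n++] = cur;
		cur.loc += delta;
	}
	if (n == max_rows)
		return -1;
	rows[n++] = cur;
	return n;
}
```

Feeding it the CIE's initial instructions (def_cfa r7 ofs 8; offset r16 at cfa-8) followed by an FDE body (advance_loc 1; def_cfa_offset 16; offset r6 at cfa-16; advance_loc 3; def_cfa_register r6; advance_loc1 78; def_cfa r7 ofs 8) starting at 0x4011f0 yields the same four rows `readelf -wF` shows. A complete evaluator also needs remember/restore state, signed and extended variants, and DWARF expressions.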

Stack unwinding with eBPF

With frame pointers

user_stacks = map<stack_id, array<addresses>>  // BPF_MAP_TYPE_STACK_TRACE
add_stack() bumps map<stack_id, count_t>
stack_id = bpf_get_stackid(ctx, &user_stacks, BPF_F_USER_STACK);
if (stack_id >= 0)  // negative values are errors
	add_stack(stack_id);
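`bpf_get_stackid()` walks the user stack using frame pointers, stores the addresses in the BPF_MAP_TYPE_STACK_TRACE map and returns an id, so identical stacks share an id and the program only has to count ids. The following userspace model of that data flow (plain C, not BPF) keeps everything in fixed-size arrays and deduplicates with a linear scan, whereas the kernel hashes the addresses; the sizes, struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH 8   /* hypothetical limits for the model */
#define MAX_STACKS 16

struct stack_table {
	uint64_t ips[MAX_STACKS][MAX_DEPTH]; /* plays the stack trace map */
	size_t depth[MAX_STACKS];
	uint64_t counts[MAX_STACKS];         /* plays map<stack_id, count_t> */
	size_t nstacks;
};

/* Returns the id of this stack, inserting it if unseen, or -1 if full. */
int get_stackid(struct stack_table *t, const uint64_t *ips, size_t depth) {
	if (depth > MAX_DEPTH)
		depth = MAX_DEPTH; /* truncate deep stacks */
	for (size_t id = 0; id < t->nstacks; id++)
		if (t->depth[id] == depth &&
		    memcmp(t->ips[id], ips, depth * sizeof(uint64_t)) == 0)
			return (int)id;
	if (t->nstacks == MAX_STACKS)
		return -1;
	memcpy(t->ips[t->nstacks], ips, depth * sizeof(uint64_t));
	t->depth[t->nstacks] = depth;
	return (int)t->nstacks++;
}

/* Bumps the sample count for a valid stack id; ignores errors (< 0). */
void add_stack(struct stack_table *t, int stack_id) {
	if (stack_id >= 0 && (size_t)stack_id < t->nstacks)
		t->counts[stack_id]++;
}
```

Userspace then reads both maps, pairs each id's addresses with its count, and symbolizes asynchronously.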

Without frame pointers

  • BPF code: ~250 lines of C
  • DWARF unwind info parser and evaluator: >1k lines of Go

Unwinding w/o frame pointers - architecture

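struct unwind_row {
	u64 program_counter;  // first PC this row applies to; rows are sorted by it
	type_t previous_rsp;  // rule to recover the caller's $rsp (the CFA), e.g. rsp+16 or rbp+16
	type_t previous_rbp;  // rule to recover the caller's $rbp, e.g. saved at CFA-16
};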
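The slide leaves `type_t` abstract. One hypothetical concrete choice is a rule kind plus a small signed offset, which is enough to express every row of the `readelf -wF` table. With it, a single unwind step computes the CFA (which becomes the caller's $rsp), reads the return address at CFA-8 (the x86_64 convention) and reloads $rbp if the row says it was saved. Memory is a fake, in-process array so the sketch runs anywhere; the enum, structs and `unwind_step` are assumptions, not the project's actual layout.

```c
#include <stdint.h>

/* Hypothetical concrete encoding for the slide's type_t. */
enum rule_kind { RULE_UNCHANGED, RULE_RSP_PLUS, RULE_RBP_PLUS, RULE_AT_CFA_PLUS };

struct rule {
	enum rule_kind kind;
	int16_t offset;
};

struct unwind_row {
	uint64_t program_counter;
	struct rule previous_rsp; /* how to compute the CFA (= caller's rsp) */
	struct rule previous_rbp; /* where the caller's rbp lives */
};

struct regs { uint64_t rip, rsp, rbp; };

/* Fake stack memory: word i is at address base + 8 * i. */
struct stack_mem { uint64_t base; const uint64_t *words; uint64_t nwords; };

static int load(const struct stack_mem *m, uint64_t addr, uint64_t *out) {
	if (addr < m->base || (addr - m->base) % 8 != 0 ||
	    (addr - m->base) / 8 >= m->nwords)
		return -1;
	*out = m->words[(addr - m->base) / 8];
	return 0;
}

/* Replaces r with the caller's rip, rsp and rbp using `row`.
 * Returns 0 on success, -1 on an unsupported rule or unreadable memory. */
int unwind_step(const struct unwind_row *row, const struct stack_mem *m,
                struct regs *r) {
	uint64_t cfa;
	if (row->previous_rsp.kind == RULE_RSP_PLUS)
		cfa = r->rsp + row->previous_rsp.offset;
	else if (row->previous_rsp.kind == RULE_RBP_PLUS)
		cfa = r->rbp + row->previous_rsp.offset;
	else
		return -1;

	uint64_t ra, rbp = r->rbp; /* other rbp rules keep the current value */
	if (load(m, cfa - 8, &ra) != 0) /* x86_64: return address at CFA - 8 */
		return -1;
	if (row->previous_rbp.kind == RULE_AT_CFA_PLUS &&
	    load(m, cfa + row->previous_rbp.offset, &rbp) != 0)
		return -1;

	r->rip = ra;
	r->rsp = cfa;
	r->rbp = rbp;
	return 0;
}
```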

Unwinding w/o frame pointers - unwind table gen

  • .eh_frame / .debug_frame
    • Parse
    • Evaluate
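After parsing and evaluating every FDE, the per-function rows can be merged into a single array sorted by program counter, which is what a binary search in the BPF program relies on. A sketch of that step, assuming a hypothetical fixed-width row: sort with `qsort`, then drop any row whose rules equal the previous row's, since a "last row with pc <= target" lookup returns the same rules either way and a smaller table matters when its size is static.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened row: CFA = cfa_reg + cfa_offset, rbp at CFA + rbp_offset. */
struct flat_row {
	uint64_t pc;
	uint8_t cfa_reg;    /* DWARF register number, e.g. 7 = rsp, 6 = rbp */
	int16_t cfa_offset;
	int16_t rbp_offset; /* 0 = rbp not saved */
};

static int by_pc(const void *a, const void *b) {
	uint64_t x = ((const struct flat_row *)a)->pc;
	uint64_t y = ((const struct flat_row *)b)->pc;
	return (x > y) - (x < y);
}

static int same_rules(const struct flat_row *a, const struct flat_row *b) {
	return a->cfa_reg == b->cfa_reg && a->cfa_offset == b->cfa_offset &&
	       a->rbp_offset == b->rbp_offset;
}

/* Sorts rows gathered from many FDEs by pc and collapses runs of rows with
 * identical rules in place. Returns the new row count. */
size_t build_unwind_table(struct flat_row *rows, size_t n) {
	if (n == 0)
		return 0;
	qsort(rows, n, sizeof(rows[0]), by_pc);
	size_t out = 1;
	for (size_t i = 1; i < n; i++)
		if (!same_rules(&rows[out - 1], &rows[i]))
			rows[out++] = rows[i];
	return out;
}
```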

Unwinding w/o frame pointers - BPF

  • Find the unwind table for the current process
  • While main isn’t reached:
    • Append the program counter ($rip) to the walked stack
    • Find the unwind row for the current program counter
      • Restore registers for the previous frame
      • Return address $rip
      • Stack pointer $rsp
      • And $rbp, too
  • Efficiently finding the unwind data for a program counter
  • Fun to implement in BPF
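
// bpf_loop()-style callback performing one binary search step per call:
// returning 0 continues the loop, returning 1 stops it.
struct callback_ctx {
	struct unwind_table *table; // rows sorted by program counter
	u64 pc;                     // program counter being looked up
	u32 left;                   // search window is [left, right)
	u32 right;
	u32 found;                  // index of the last row with row.pc <= pc
};

static int find_offset_for_pc(__u32 index, void *data) {
	struct callback_ctx *ctx = data;
	
	if (ctx->left >= ctx->right) {
		LOG(".done");
		return 1;
	}
	
	u32 mid = (ctx->left + ctx->right) / 2;
	
	// Appease the verifier: prove mid is a valid index into rows[].
	if (mid >= MAX_UNWIND_TABLE_SIZE) {
		LOG(".should never happen");
		return 1;
	}
	
	if (ctx->table->rows[mid].pc <= ctx->pc) {
		ctx->found = mid;
		ctx->left = mid + 1;
	} else {
		ctx->right = mid;
	}
	
	return 0;
}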
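The callback performs only one search step per call and leaves the looping to its caller; its shape matches the `bpf_loop()` helper (Linux 5.17+), which keeps invoking the callback until it returns 1 or the iteration limit is reached. To exercise the same logic outside the kernel, the hypothetical `fake_bpf_loop` below plays that role; the table size, row layout and `find_row` are assumptions. The midpoint is computed as `left + (right - left) / 2`, which cannot overflow even for very large tables.

```c
#include <stdint.h>

#define MAX_UNWIND_TABLE_SIZE 1024 /* hypothetical table capacity */
#define MAX_SEARCH_ITERATIONS 20   /* > log2(1024) + 1 steps */

struct row { uint64_t pc; };
struct unwind_table { struct row rows[MAX_UNWIND_TABLE_SIZE]; uint32_t len; };

struct callback_ctx {
	const struct unwind_table *table;
	uint64_t pc;
	uint32_t left, right; /* search window [left, right) */
	int64_t found;        /* last row with rows[i].pc <= pc, or -1 */
};

/* Same logic as the BPF callback, minus LOG(). */
static int find_offset_for_pc(uint32_t index, void *data) {
	struct callback_ctx *ctx = data;
	(void)index;
	if (ctx->left >= ctx->right)
		return 1; /* done: stop looping */
	uint32_t mid = ctx->left + (ctx->right - ctx->left) / 2;
	if (mid >= MAX_UNWIND_TABLE_SIZE)
		return 1; /* the bounds check the verifier wants */
	if (ctx->table->rows[mid].pc <= ctx->pc) {
		ctx->found = mid;
		ctx->left = mid + 1;
	} else {
		ctx->right = mid;
	}
	return 0; /* keep looping */
}

/* Stand-in for bpf_loop(): calls cb until it returns 1 or nr_loops runs out,
 * returning the number of iterations performed. */
static long fake_bpf_loop(uint32_t nr_loops, int (*cb)(uint32_t, void *), void *ctx) {
	uint32_t i;
	for (i = 0; i < nr_loops; i++)
		if (cb(i, ctx))
			return i + 1;
	return i;
}

/* Index of the row covering pc, or -1 if pc precedes every row. */
int64_t find_row(const struct unwind_table *t, uint64_t pc) {
	struct callback_ctx ctx = {t, pc, 0, t->len, -1};
	fake_bpf_loop(MAX_SEARCH_ITERATIONS, find_offset_for_pc, &ctx);
	return ctx.found;
}
```

Each iteration halves the window, so a full table needs about log2(MAX_UNWIND_TABLE_SIZE) + 1 callback invocations, which is why a small fixed iteration limit suffices.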

Unwinding w/o frame pointers - Future work

  • Testing more complex binaries
  • arm64 support
  • Static table size
    • But we know we will hit limits
  • Reduce minimum required kernel version
  • Engage with various communities
