Dumping Python heap in production

Live-debugging a Python process’s memory!

The problem

You just got called in for an outage of an AI/LLM service written in Python, and you found that it is caused by a memory leak in production. Before restoring the service to mitigate the impact, you may want to capture as much context as possible. Luckily, the ecosystem has plenty of libraries and tools for Python memory profiling, such as scalene or pympler:

>>> from pympler import muppy, summary
>>> all_objects = muppy.get_objects()
>>> sum1 = summary.summarize(all_objects)
>>> summary.print_(sum1)
                       types |   # objects |   total size
============================ | =========== | ============
                        dict |         546 |    953.30 KB
                         str |        8270 |    616.46 KB
                        list |         127 |    529.44 KB
                       tuple |        5021 |    410.62 KB
                        code |        1378 |    161.48 KB
                        type |          70 |     61.80 KB
          wrapper_descriptor |         508 |     39.69 KB
  builtin_function_or_method |         515 |     36.21 KB
                         int |         900 |     21.09 KB
           method_descriptor |         269 |     18.91 KB
                     weakref |         177 |     15.21 KB
         <class 'abc.ABCMeta |          16 |     14.12 KB
                         set |          48 |     10.88 KB
         function (__init__) |          81 |      9.49 KB
           member_descriptor |         131 |      9.21 KB

However, most of them share one major shortcoming: they cannot analyze the memory of a Python process that is already running. The exception is when you included these libraries as dependencies ahead of time, for example by adding an HTTP endpoint that dumps heap info or starts tracking memory allocations.
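If you can prepare in advance, that HTTP-endpoint workaround can be sketched with only the standard library. This sketch swaps in the built-in tracemalloc module rather than pympler or scalene; the port 8765, the `/heap` path and the helper names are illustrative assumptions, not part of any library. Note that tracemalloc only records allocations made after `tracemalloc.start()` runs, and tracing adds overhead.

```python
import threading
import tracemalloc
from http.server import BaseHTTPRequestHandler, HTTPServer

# Start tracing as early as possible: only later allocations are recorded
tracemalloc.start()


def top_allocations(limit=10):
    """Return the source lines currently holding the most traced memory."""
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")
    return "\n".join(str(stat) for stat in stats[:limit])


class HeapHandler(BaseHTTPRequestHandler):
    """Hypothetical debug handler serving the summary at /heap."""

    def do_GET(self):
        if self.path != "/heap":
            self.send_error(404)
            return
        body = top_allocations().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_heap_endpoint(port=8765):
    """Run the debug endpoint in a daemon thread next to the real service."""
    server = HTTPServer(("127.0.0.1", port), HeapHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 keeps the endpoint off the network; you would reach it with something like `kubectl port-forward` or from inside the container.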

It would be much better to have something like py-spy or parca, but for memory; those tools are very capable of live-debugging CPU issues in production.

Can eBPF help?

eBPF lets you run sandboxed programs inside the Linux kernel, and many low-overhead, non-intrusive observability tools are built on it, including the aforementioned parca for CPU profiling.

The bcc toolkit ships the following tools for live-debugging Python, but unfortunately they are not very useful for memory debugging:

Code Injection

With gdb, we can attach to a running Python process and inject Python code for it to execute. Pyrasite and debug-toolkit are two projects that follow this route. Here we use the latter to demonstrate how to dump the heap of a running Python process.

Requirements

  • gdb installed on your Linux machine
  • Poetry installed
  • the injected Python code can only import packages that are already installed in the target process’s Python environment
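Because the injected snippet runs inside the target interpreter, one defensive pattern is to check whether an optional package is importable there before using it, and to fall back to the standard library otherwise. This is an illustrative sketch: `dump_heap_summary` is a hypothetical name, pympler is only an example of an optional dependency, and the pympler branch uses the same calls as the pympler example earlier in this post.

```python
import contextlib
import gc
import importlib.util
import sys


def dump_heap_summary(out, limit=15):
    """Write a heap summary to `out`, using pympler only if the target
    interpreter can import it; otherwise use the standard library."""
    if importlib.util.find_spec("pympler") is not None:
        from pympler import muppy, summary
        # pympler prints its table, so redirect stdout into `out`
        with contextlib.redirect_stdout(out):
            summary.print_(summary.summarize(muppy.get_objects()))
        return "pympler"

    # Fallback: count GC-tracked objects per type name
    totals = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        count, size = totals.get(name, (0, 0))
        totals[name] = (count + 1, size + sys.getsizeof(obj))
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    for name, (count, size) in ranked[:limit]:
        out.write(f"{name} {count} {size}\n")
    return "stdlib"
```

Injected, it could be called as `with open('heap_summary.txt', 'w') as f: dump_heap_summary(f)`.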

How - BareMetal

git clone https://github.com/robusta-dev/debug-toolkit.git
cd debug-toolkit
poetry shell
poetry install
python src/debug_toolkit/main.py --help

# to inject code by string; 12345 is the PID
python src/debug_toolkit/main.py inject-string 12345 "f = open('test', 'w'); f.write('hello world'); f.close()"

# to inject code by a python script file
python src/debug_toolkit/main.py inject-file 12345 /path/to/python/file.py
# you can find the created file `test` under `/proc/12345/cwd`
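Files that injected code opens with a relative path land in the target’s working directory, which Linux exposes as the `/proc/<pid>/cwd` symlink. A minimal helper for reading them back might look like this; `read_injected_output` is a hypothetical name, and reading another process’s `/proc/<pid>/cwd` generally requires running as the same user or as root.

```python
import os


def read_injected_output(pid, filename):
    """Read a file that injected code wrote relative to its working directory."""
    # /proc/<pid>/cwd is a symlink to the target process's current directory
    path = os.path.join(f"/proc/{pid}/cwd", filename)
    with open(path) as f:
        return f.read()
```

For example, after the `inject-string` command above, `read_injected_output(12345, 'test')` should return `'hello world'`.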
  • For example, you can get an overview of object allocations by injecting the following Python file:
import gc
import sys

# Collect first so garbage awaiting collection does not skew the numbers
gc.collect()

class_info = {}  # type -> {'count': ..., 'total_size': ...}
seen = set()     # ids of untracked objects already counted


def record(obj):
    info = class_info.setdefault(type(obj), {'count': 0, 'total_size': 0})
    info['count'] += 1
    info['total_size'] += sys.getsizeof(obj)


# gc.get_objects() only returns objects tracked by the garbage collector
for obj in gc.get_objects():
    record(obj)
    # Untracked objects (e.g. str, int) are only reachable through the
    # containers that reference them; count each of them once
    for ref in gc.get_referents(obj):
        if not gc.is_tracked(ref) and id(ref) not in seen:
            seen.add(id(ref))
            record(ref)

sorted_classes = sorted(class_info.items(), key=lambda x: x[1]['total_size'], reverse=True)

# Relative path: the file lands in the target process's working directory
with open('heap_alloc.txt', 'w') as f:
    for i, (cls, info) in enumerate(sorted_classes):
        f.write(f'{i} {cls.__name__} {info["count"]} {info["total_size"]}\n')
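To hunt for a leak, one option is to take two of these dumps some time apart and compare them. This sketch assumes the `heap_alloc.txt` format written by the script above (rank, type name, count, total size per line); `parse_heap_alloc` and `heap_growth` are hypothetical helper names. Since different classes can share a `__name__`, it aggregates by name, which is an approximation.

```python
def parse_heap_alloc(text):
    """Parse lines written as '<rank> <type name> <count> <total size>'."""
    totals = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _rank, rest = line.split(" ", 1)
        name, count, size = rest.rsplit(" ", 2)
        # Classes from different modules may share a name; merge them
        old_count, old_size = totals.get(name, (0, 0))
        totals[name] = (old_count + int(count), old_size + int(size))
    return totals


def heap_growth(before_text, after_text, limit=10):
    """Rank type names by how much their total size grew between two dumps."""
    before = parse_heap_alloc(before_text)
    after = parse_heap_alloc(after_text)
    growth = []
    for name, (count, size) in after.items():
        old_count, old_size = before.get(name, (0, 0))
        growth.append((name, count - old_count, size - old_size))
    growth.sort(key=lambda row: row[2], reverse=True)
    return growth[:limit]
```

Types that disappeared entirely between the two dumps are ignored, since they cannot be the source of a leak.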

How - Kubernetes

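  • kubectl apply the following deployment (hostPID: true shares the node’s PID namespace with the pod, and privileged mode with SYS_PTRACE lets gdb attach to other processes):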
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-tools
  labels:
    app: python-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-tools
  template:
    metadata:
      labels:
        app: python-tools
    spec:
      hostPID: true
      containers:
        - name: python-tools
          image: robustadev/debug-toolkit:v7.0.1
          imagePullPolicy: Always
          securityContext:
            privileged: true
            capabilities:
              add:
                - SYS_PTRACE
  • kubectl exec into the pod (it must run on the same k8s worker node as the target container; pin it with a nodeSelector if needed), find the PID of the target process, and then execute:
# to inject code by string; 12345 is the PID of the target process as seen from the host
# thanks to hostPID, `ps aux | grep python` in this pod lists processes from every container on the node
debug-toolkit inject-string 12345 "f = open('test', 'w'); f.write('hello world'); f.close()"
# you can find the created file `test` under `/proc/12345/cwd`
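If `ps` is not available in the image, the same lookup can be done by scanning `/proc` directly. This is a sketch assuming that a substring match on the command line is good enough to identify the target; `find_python_pids` is a hypothetical helper name.

```python
import os


def find_python_pids(pattern="python"):
    """Return (pid, command line) pairs whose command line contains `pattern`,
    similar to `ps aux | grep python`."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited meanwhile or we lack permission; skip it
            continue
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((int(entry), cmdline))
    return matches
```

With hostPID enabled, the PIDs returned are host PIDs, which is what gdb and debug-toolkit expect here.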

How - only with gdb!

  • we can actually achieve the same with nothing but gdb, though it is rather inconvenient
  • create a file named inject_code with the content below, where {python_code} is the Python snippet to inject. Remember to escape backslashes, double quotes and newlines in the snippet first, e.g. with python_code.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
set trace-commands on
set logging on
set scheduler-locking off
call ((int (*)())PyGILState_Ensure)()
call ((int (*)(const char *))PyRun_SimpleString)("{python_code}")
call ((void (*) (int) )PyGILState_Release)($1)

then run:

gdb -p <pid> --batch --command=inject_code
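The manual gdb steps can be wrapped in a small script. This sketch reuses the gdb commands and the escaping rule given above; `render_gdb_script` and `inject` are hypothetical names, and it assumes gdb is on PATH and that you are allowed to ptrace the target process.

```python
import os
import subprocess
import tempfile

# Same gdb commands as the inject_code file above
GDB_TEMPLATE = """set trace-commands on
set logging on
set scheduler-locking off
call ((int (*)())PyGILState_Ensure)()
call ((int (*)(const char *))PyRun_SimpleString)("{python_code}")
call ((void (*) (int) )PyGILState_Release)($1)
"""


def render_gdb_script(python_code):
    """Escape the snippet so it survives as a C string literal inside gdb."""
    # Escape backslashes first so the later escapes are not doubled
    escaped = python_code.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return GDB_TEMPLATE.replace("{python_code}", escaped)


def inject(pid, python_code, timeout=60):
    """Write a temporary command file and run `gdb -p <pid> --batch --command=<file>`."""
    with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as f:
        f.write(render_gdb_script(python_code))
        script_path = f.name
    try:
        return subprocess.run(
            ["gdb", "-p", str(pid), "--batch", f"--command={script_path}"],
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        os.unlink(script_path)
```

For example, `inject(12345, open('heap_dump.py').read())` would run a local script inside process 12345; check the returned `stdout` and `stderr` for gdb errors.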

See also