"it's Python, you do anything and it allocates" How true is this? I modified CPython to print when it allocates an integer object Then added numbers in a for-loop 100k times My terminal got spammed with 101006 allocations Why? Let's explore the internals of CPython:
I was very surprised to learn that EVERY integer (not just really big ones) in Python is represented as a heap allocated `PyLongObject` Most interpreters use some variant of tagged pointers to avoid doing this, even Smalltalk in the 80s did this!
This is obviously bad for performance because allocations are slow, but also because the slow and unlikely path (using a very big integer requiring heap allocation) pessimizes the fast and likely path (using a regularly sized integer)
Anyway, there's a problem with the Python script I used: what if there are integer allocations in the internals of `print`? (although it can 100% avoid that) Let's remove the `print` call entirely:
That's a lot less allocations! Where did the other ~100k go? Let's look at the function that adds two integers, I've highlighted the codepath relevant to us:
Okay, so if the two integers are < 2^30 then it will unwrap their values and add them together. The `stwodigits` type is named that way because PyLongObject actually stores each int in base 2^30, essentially each digit is a 2^30 integer. Let's look at `_PyLong_FromSTwoDigits`:
I added comments for the three codepaths which depend on the value of `x`. Interestingly there is an optimization where the PyLongObjects for small ints are actually not heap allocated but instead are a static array and it just returns a pointer to that:
Okay let's look at the function used for medium ints, which are going to be the range of ints we use in our script anyway:
There's some interesting things here: 1. It does *not* use the `long_alloc` function from earlier 2. It attempts to reuse allocations from a freelist if possible 3. If it can't reuse memory it will allocate a new object
A "freelist" is a common technique used in memory allocation strategies to reuse previous allocations which have been freed previously. It lets you reuse memory instead of allocating new memory if possible. If we look at `long_dealloc` it tries to push the object to the freelist
I deleted the first printf I added to `long_alloc` in the first tweet, and added printfs to show when _PyLong_FromMedium allocates or reuses memory. Let's see what happens:
So it looks like we're reusing a lot of PyLongObjects! Obviously, there is still a lot of overhead to going through all this allocation when theoretically all it should take the CPU to add two integers is a single ADD instruction which can be executed in a cycle (or less!)
In addition I'm surprised there's kind of basic optimizations missing which have been around for decades.
But this is a fairly good example of how being able to control memory allocation lets you make specialized memory allocators which can avoid some of the overhead of allocation. Which is why Zig is one of my favorite languages
Anyway I go into more depth on this in my most recent blog post, taking a closer look at the allocator used in CPython to allocate objects and more: https://zackoverflow.dev/writi...
@zack_overflow Rust really does change how you think about every line of code
@zack_overflow Reminded me of this:
@zack_overflow Coming to realize I know nothing about cpython, ty for nerdsnipe
@zack_overflow Cpython, the brains behind python. Glad you did the tear down very well
@zack_overflow Nice one. I did something like this before (except I placed breakpoints on allocation before my F8 key wore out), hence why I could say with confidence
@zack_overflow @zeldapoem thank you for reconfirming my belief that everyone saying CPython is just as fast as C for computational physics is wrong
@zack_overflow let me guess it's <256 numbers being actual memory objects and large ones being actually created
@zack_overflow I wonder how it would look with PyPy
@zack_overflow Ruby encodes integer in its object id, then does oid >> 1 to get the value. I always thought it's awesome. Except for the rare case when you need full 64bit range, but find out that you have only 63bits and your code falls back to BigInt and is therefore slow. Happened once to me
@zack_overflow This is actually very cool. Bookmarking
@zack_overflow Me reading anything about python:
@zack_overflow neat, thank you!
@zack_overflow Curious, I don’t understand this too much but could you try with list comprehensions and the map function? Those two are common methods to speeding up these exact snippet of code
@zack_overflow Wow.. I am so aesthetic hungry.. first thought which into my mind after this post is which theme is this?? I am too dumb for this field











