Does anyone know why the biased reference counting approach described in https:/...

colesbury · on March 11, 2024

We could implement ownership transfer in CPython in the future, but it's a bit trickier. In Rust, "move" to transfer ownership is part of the language, but there isn't an equivalent in C or Python, so it's difficult to determine when to transfer ownership and which thread should be the new owner. We could use heuristics: we might give up or transfer ownership when putting an object in a queue.SimpleQueue, but even there it's hard to know ahead of time which thread will "get" the enqueued object.

I think the performance benefit would also be small. Many objects are only accessed by a single thread, some objects are accessed by many threads, but few objects are exclusively accessed by one thread and then exclusively accessed by a different thread.

vlovich123 · on March 11, 2024

I think you would do it on first access - “if new thread, increment atomic & exchange for a new object reference that has the local thread id affinity”. That way you don’t care about whether an object actually has thread affinity or not and you solve the “accessed by many threads” piece. But thanks for answering - I figured complexity was the reason a simpler choice was made to start with.

orf · on March 11, 2024

But this would now make the reference count increment require a conditional? It’s a very hot path, and this would cause a slowdown for single-threaded Python code.

vlovich123 · on March 11, 2024

It's already taking a conditional. Take a look at the PEP:

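    // Excerpt from Py_INCREF; new_local was computed earlier as op->ob_ref_local + 1.
    if (op->ob_tid == _Py_ThreadId())
      op->ob_ref_local = new_local;   // owning thread: plain, non-atomic store
    else
      atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);   // any other thread: atomic add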

So you either get a correctly predicted branch or an atomic operation, and the atomic will dominate the cost of the branch anyway. All I'm suggesting is that in the else branch, where you already do the atomic add, you also create a new `PyObject` instance whose `ob_tid` equals `_Py_ThreadId()`. This presumes that Py_INCREF's return type changes from void to `PyObject*` and that the new pointer propagates outward, so that further references on that thread use the new affinity (the branch then always takes the non-atomic add rather than the atomic one). It's easier said than done, and there may be technical reasons why it's difficult or not possible, but it's worth exploring eventually so that access to a single object from multiple threads doesn't degrade to constant atomic reference counting.

https://peps.python.org/pep-0703/
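
To make the idea concrete, here is a rough, single-threaded sketch in C11. It is not CPython code: `obj_t`, `handle_t`, `obj_incref`, `handle_acquire` and `handle_incref` are hypothetical names, the thread id is passed in as a parameter instead of coming from `_Py_ThreadId()`, and the per-thread handle is only one possible reading of "a new object reference with local thread affinity". The first half mirrors the biased increment from the PEP; the second half shows a non-owning thread paying for one atomic add up front and counting locally after that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SHARED_SHIFT 2  /* low two bits of ref_shared hold flags, as in PEP 703 */

/* Simplified stand-in for the object header (hypothetical layout). */
typedef struct {
    uintptr_t tid;               /* id of the owning thread */
    uint32_t ref_local;          /* modified only by the owner, no atomics */
    _Atomic intptr_t ref_shared; /* modified by every other thread */
} obj_t;

/* Biased increment; `self` stands in for _Py_ThreadId(). */
static void obj_incref(obj_t *op, uintptr_t self)
{
    uint32_t new_local = op->ref_local + 1;
    if (new_local == 0)
        return;                         /* immortal object: count never changes */
    if (op->tid == self)
        op->ref_local = new_local;      /* fast path: plain store */
    else
        atomic_fetch_add(&op->ref_shared, (intptr_t)1 << SHARED_SHIFT);
}

/* Hypothetical per-thread handle: holds ONE shared reference on the target
   and counts further references from its own thread without atomics. */
typedef struct {
    obj_t *target;
    uintptr_t tid;
    uint32_t ref_local;
} handle_t;

static handle_t handle_acquire(obj_t *op, uintptr_t self)
{
    obj_incref(op, self);               /* one atomic add if self is not the owner */
    handle_t h = { op, self, 1 };
    return h;
}

static void handle_incref(handle_t *h, uintptr_t self)
{
    assert(h->tid == self);             /* a handle is only valid on its own thread */
    h->ref_local++;                     /* no atomic operation */
}
```

The sketch leaves out everything that makes this hard in practice: decrementing and freeing, object identity (`is` comparisons would see different pointers), and where the handle itself lives.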