
Does anyone know why the biased reference counting approach described in https://peps.python.org/pep-0703/ gives each object a single owning thread, so that every increment/decrement from any other thread has to be atomic? What I’ve seen other implementations do (e.g. various Rust crates implementing biased reference counting) is increment atomically only when the object moves to a new thread; that thread then does non-atomic increments/decrements until its count hits 0 again, at which point it does a single atomic decrement. Is it because this is being retrofitted into an existing system where you have a single PyObject and can’t swap it for a pointer to a new thread-local object?
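
A minimal C11 sketch of the scheme described above, assuming the usual "handle per thread" design: one atomic increment when an object enters a thread, plain increments/decrements inside that thread, and one atomic decrement when that thread's count returns to 0. The names `Shared`, `Local`, `shared_new`, `local_acquire`, `local_incref` and `local_decref` are hypothetical and do not come from PEP 703 or any particular Rust crate; the key assumption is that a `Local` handle is only ever used by the thread that created it.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object: the atomic field counts thread-local handles,
   not individual references. */
typedef struct {
    atomic_long handles;
    int payload;
} Shared;

/* Hypothetical per-thread handle: only the thread that created it may use it. */
typedef struct {
    Shared *obj;
    long local;            /* plain, non-atomic reference count */
} Local;

static Shared *shared_new(int payload) {
    Shared *obj = malloc(sizeof *obj);
    if (obj == NULL) abort();
    atomic_init(&obj->handles, 0);
    obj->payload = payload;
    return obj;
}

/* Object enters a thread: one atomic increment, then a fresh local handle. */
static Local *local_acquire(Shared *obj) {
    atomic_fetch_add_explicit(&obj->handles, 1, memory_order_relaxed);
    Local *h = malloc(sizeof *h);
    if (h == NULL) abort();
    h->obj = obj;
    h->local = 1;
    return h;
}

/* Inside the owning thread: no atomics at all. */
static void local_incref(Local *h) { h->local++; }

/* Inside the owning thread: plain decrement. Only when this thread's count
   reaches 0 is the shared counter decremented atomically.
   Returns 1 if that dropped the last handle and freed the shared object. */
static int local_decref(Local *h) {
    if (--h->local > 0) return 0;
    Shared *obj = h->obj;
    free(h);
    /* acq_rel: the thread that frees the object sees every other thread's writes */
    if (atomic_fetch_sub_explicit(&obj->handles, 1, memory_order_acq_rel) == 1) {
        free(obj);
        return 1;
    }
    return 0;
}
```

Because the atomic counter tracks threads rather than references, a thread that takes many references to the same object pays for only two atomic operations. The hard part in a C API, where there is no language-level move, is deciding when something like `local_acquire` should be called.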


We could implement ownership transfer in CPython in the future, but it's a bit trickier. In Rust, "move" to transfer ownership is part of the language, but there isn't an equivalent in C or Python, so it's difficult to determine when to transfer ownership and which thread should be the new owner. We could use heuristics: we might give up or transfer ownership when putting an object in a queue.SimpleQueue, but even there it's hard to know ahead of time which thread will "get" the enqueued object.

I think the performance benefit would also be small. Many objects are only accessed by a single thread, some objects are accessed by many threads, but few objects are exclusively accessed by one thread and then exclusively accessed by a different thread.


I think you would do it on first access: “if this is a new thread, increment atomically and exchange for a new object reference whose affinity is the local thread ID”. That way you don’t care whether an object actually has thread affinity or not, and you solve the “accessed by many threads” case. But thanks for answering; I figured complexity was the reason a simpler choice was made to start with.


But this would now make the reference count increment require a conditional? It’s a very hot path, and this would cause a slowdown for single-threaded Python code.


It's already taking a conditional. Take a look at the PEP:

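    uint32_t new_local = op->ob_ref_local + 1;
    if (op->ob_tid == _Py_ThreadId())
      op->ob_ref_local = new_local;   // owning thread: plain, non-atomic store
    else                              // any other thread: atomic add to the shared count
      atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);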

So you're either getting a correct branch prediction or an atomic operation, and the atomic operation will dominate the cost of the branch anyway. All I'm suggesting is that in the else branch, where you're doing the atomic add, you also create a new PyObject instance whose `ob_tid` equals `_Py_ThreadId()`. This presumes that Py_INCREF changes its return type from void to `PyObject*`, and that this propagates outward so that further references on that thread use the new affinity (i.e. the branch always goes to the non-atomic add instead of the atomic one). It's easier said than done, and there may be technical reasons why it's difficult or impossible, but it's worth exploring eventually so that an object accessed by multiple threads doesn't degrade to atomic reference-count updates on every access.
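
For comparison, a self-contained model of the quoted PEP 703 fast path. Assumptions: the thread ID is passed in explicitly instead of calling `_Py_ThreadId()`, the PEP's immortal-object check is omitted, `SHARED_SHIFT` is set to 2, and `Obj`, `obj_init`, `obj_incref` and the `shared_increfs` counter are hypothetical names added purely for illustration. The struct only mimics the three PEP 703 header fields and is not CPython's real `PyObject` layout.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SHARED_SHIFT 2   /* assumed: low bits of the shared field reserved for flags */

/* Simplified stand-in for the three PEP 703 header fields (not CPython's layout). */
typedef struct {
    uintptr_t ob_tid;               /* owning thread id */
    uint32_t ob_ref_local;          /* only the owning thread touches this */
    atomic_intptr_t ob_ref_shared;  /* every other thread updates this atomically */
} Obj;

static long shared_increfs = 0;     /* illustration only: counts atomic increments */

static void obj_init(Obj *op, uintptr_t owner) {
    op->ob_tid = owner;
    op->ob_ref_local = 1;
    atomic_init(&op->ob_ref_shared, 0);
}

/* Same branch as the quoted fast path; the caller passes its thread id
   (hypothetical stand-in for _Py_ThreadId()). Immortal-object check omitted. */
static void obj_incref(Obj *op, uintptr_t tid) {
    if (op->ob_tid == tid) {
        op->ob_ref_local += 1;      /* owner: plain store, no atomics */
    } else {
        atomic_fetch_add(&op->ob_ref_shared, (intptr_t)1 << SHARED_SHIFT);
        shared_increfs++;           /* non-owner: atomic on every single call */
    }
}
```

With an owner of thread 1, three increfs from thread 1 only touch `ob_ref_local`, while five increfs from thread 2 are five separate atomic read-modify-writes on `ob_ref_shared`, no matter how often thread 2 repeats the pattern. That repeated cost for non-owner threads is what the proposal above aims to remove.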

https://peps.python.org/pep-0703/



