aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/InternalDocs
diff options
context:
space:
mode:
Diffstat (limited to 'InternalDocs')
-rw-r--r--InternalDocs/README.md3
-rw-r--r--InternalDocs/asyncio.md126
-rw-r--r--InternalDocs/qsbr.md129
3 files changed, 253 insertions, 5 deletions
diff --git a/InternalDocs/README.md b/InternalDocs/README.md
index c20aa015c5b..6b1d9198264 100644
--- a/InternalDocs/README.md
+++ b/InternalDocs/README.md
@@ -42,8 +42,9 @@ Program Execution
- [Exception Handling](exception_handling.md)
+- [Quiescent-State Based Reclamation (QSBR)](qsbr.md)
Modules
---
-- [asyncio](asyncio.md) \ No newline at end of file
+- [asyncio](asyncio.md)
diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md
index b60fe70478a..22159852ca5 100644
--- a/InternalDocs/asyncio.md
+++ b/InternalDocs/asyncio.md
@@ -2,10 +2,12 @@ asyncio
=======
-This document describes the working and implementation details of C
-implementation of the
+This document describes the working and implementation details of the
[`asyncio`](https://docs.python.org/3/library/asyncio.html) module.
+**The following section describes the implementation details of the C implementation**.
+
+# Task management
## Pre-Python 3.14 implementation
@@ -158,7 +160,8 @@ flowchart TD
subgraph two["Thread deallocating"]
A1{"thread's task list empty? <br> llist_empty(tstate->asyncio_tasks_head)"}
A1 --> |true| B1["deallocate thread<br>free_threadstate(tstate)"]
- A1 --> |false| C1["add tasks to interpreter's task list<br> llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"]
+ A1 --> |false| C1["add tasks to interpreter's task list<br> llist_concat(&tstate->interp->asyncio_tasks_head,
+ &tstate->asyncio_tasks_head)"]
C1 --> B1
end
@@ -205,6 +208,121 @@ In free-threading, it avoids contention on a global dictionary as
threads can access the current task of thier running loop without any
locking.
+---
+
+**The following section describes the implementation details of the Python implementation**.
+
+# async generators
+
+This section describes the implementation details of async generators in `asyncio`.
+
+Since async generators are meant to be used from coroutines,
+their finalization (execution of finally blocks) needs
+to be done while the loop is running.
+Most async generators are closed automatically
+when they are fully iterated over and exhausted; however,
+if the async generator is not fully iterated over,
+it may not be closed properly, leading to the `finally` blocks not being executed.
+
+Consider the following code:
+```py
+import asyncio
+
+async def agen():
+ try:
+ yield 1
+ finally:
+ await asyncio.sleep(1)
+ print("finally executed")
+
+
+async def main():
+ async for i in agen():
+ break
+
+loop = asyncio.EventLoop()
+loop.run_until_complete(main())
+```
+
+The above code will not print "finally executed", because the
+async generator `agen` is not fully iterated over
+and it is not closed manually by awaiting `agen.aclose()`.
+
+To solve this, `asyncio` uses the `sys.set_asyncgen_hooks` function to
+set hooks for finalizing async generators as described in
+[PEP 525](https://peps.python.org/pep-0525/).
+
+- **firstiter hook**: When the async generator is iterated over for the first time,
+the *firstiter hook* is called. The async generator is added to `loop._asyncgens` WeakSet
+and the event loop tracks all active async generators.
+
+- **finalizer hook**: When the async generator is about to be finalized,
+the *finalizer hook* is called. The event loop removes the async generator
+from `loop._asyncgens` WeakSet, and schedules the finalization of the async
+generator by creating a task calling `agen.aclose()`. This ensures that the
+finally block is executed while the event loop is running. When the loop is
+shutting down, the loop checks if there are active async generators and if so,
+it similarly schedules the finalization of all active async generators by calling
+`agen.aclose()` on each of them and waits for them to complete before shutting
+down the loop.
+
+This ensures that the async generator's `finally` blocks are executed even
+if the generator is not explicitly closed.
+
+Consider the following example:
+
+```python
+import asyncio
+
+async def agen():
+ try:
+ yield 1
+ yield 2
+ finally:
+ print("executing finally block")
+
+async def main():
+ async for item in agen():
+ print(item)
+ break # not fully iterated
+
+asyncio.run(main())
+```
+
+```mermaid
+flowchart TD
+ subgraph one["Loop running"]
+ A["asyncio.run(main())"] --> B
+ B["set async generator hooks <br> sys.set_asyncgen_hooks()"] --> C
+ C["async for item in agen"] --> F
+ F{"first iteration?"} --> |true|D
+ F{"first iteration?"} --> |false|H
+ D["calls firstiter hook<br>loop._asyncgen_firstiter_hook(agen)"] --> E
+ E["add agen to WeakSet<br> loop._asyncgens.add(agen)"] --> H
+ H["item = await agen.\_\_anext\_\_()"] --> J
+ J{"StopAsyncIteration?"} --> |true|M
+ J{"StopAsyncIteration?"} --> |false|I
+ I["print(item)"] --> S
+ S{"continue iterating?"} --> |true|C
+ S{"continue iterating?"} --> |false|M
+ M{"agen is no longer referenced?"} --> |true|N
+ M{"agen is no longer referenced?"} --> |false|two
+ N["finalize agen<br>_PyGen_Finalize(agen)"] --> O
+ O["calls finalizer hook<br>loop._asyncgen_finalizer_hook(agen)"] --> P
+ P["remove agen from WeakSet<br>loop._asyncgens.discard(agen)"] --> Q
+ Q["schedule task to close it<br>self.create_task(agen.aclose())"] --> R
+ R["print('executing finally block')"] --> E1
+
+ end
+
+ subgraph two["Loop shutting down"]
+ A1{"check for alive async generators?"} --> |true|B1
+ B1["close all async generators <br> await asyncio.gather\(*\[ag.aclose\(\) for ag in loop._asyncgens\]"] --> R
+ A1{"check for alive async generators?"} --> |false|E1
+ E1["loop.close()"]
+ end
+
+```
[^1]: https://github.com/python/cpython/issues/123089
-[^2]: https://github.com/python/cpython/issues/80788 \ No newline at end of file
+[^2]: https://github.com/python/cpython/issues/80788
diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
new file mode 100644
index 00000000000..1c4a79a7b44
--- /dev/null
+++ b/InternalDocs/qsbr.md
@@ -0,0 +1,129 @@
+# Quiescent-State Based Reclamation (QSBR)
+
+## Introduction
+
+When implementing lock-free data structures, a key challenge is determining
+when it is safe to free memory that has been logically removed from a
+structure. Freeing memory too early can lead to use-after-free bugs if another
+thread is still accessing it. Freeing it too late results in excessive memory
+consumption.
+
+Safe memory reclamation (SMR) schemes address this by delaying the free
+operation until all concurrent read accesses are guaranteed to have completed.
+Quiescent-State Based Reclamation (QSBR) is a SMR scheme used in Python's
+free-threaded build to manage the lifecycle of shared memory.
+
+QSBR requires threads to periodically report that they are in a quiescent
+state. A thread is in a quiescent state if it holds no references to shared
+objects that might be reclaimed. Think of it as a checkpoint where a thread
+signals, "I am not in the middle of any operation that relies on a shared
+resource." In Python, the eval_breaker provides a natural and convenient place
+for threads to report this state.
+
+
+## Use in Free-Threaded Python
+
+While CPython's memory management is dominated by reference counting and a
+tracing garbage collector, these mechanisms are not suitable for all data
+structures. For example, the backing array of a list object is not individually
+reference-counted but may have a shorter lifetime than the `PyListObject` that
+contains it. We could delay reclamation until the next GC run, but we want
+reclamation to be prompt and to run the GC less frequently in the free-threaded
+build, as it requires pausing all threads.
+
+Many operations in the free-threaded build are protected by locks. However, for
+performance-critical code, we want to allow reads to happen concurrently with
+updates. For instance, we want to avoid locking during most list read accesses.
+If a list is resized while another thread is reading it, QSBR provides the
+mechanism to determine when it is safe to free the list's old backing array.
+
+Specific use cases for QSBR include:
+
+* Dictionary keys (`PyDictKeysObject`) and list arrays (`_PyListArray`): When a
+dictionary or list that may be shared between threads is resized, we use QSBR
+to delay freeing the old keys or array until it's safe. For dicts and lists
+that are not shared, their storage can be freed immediately upon resize.
+
+* Mimalloc `mi_page_t`: Non-locking dictionary and list accesses require
+cooperation from the memory allocator. If an object is freed and its memory is
+reused, we must ensure the new object's reference count field is at the same
+memory location. In practice, this means when a mimalloc page (`mi_page_t`)
+becomes empty, we don't immediately allow it to be reused for allocations of a
+different size class. QSBR is used to determine when it's safe to repurpose the
+page or return its memory to the OS.
+
+
+## Implementation Details
+
+
+### Core Implementation
+
+The proposal to add QSBR to Python is contained in
+[Github issue 115103](https://github.com/python/cpython/issues/115103).
+Many details of that proposal have been copied here, so they can be kept
+up-to-date with the actual implementation.
+
+Python's QSBR implementation is based on FreeBSD's "Global Unbounded
+Sequences." [^1][^2][^3]. It relies on a few key counters:
+
+* Global Write Sequence (`wr_seq`): A per-interpreter counter, `wr_seq`, is started
+at 1 and incremented by 2 each time it is advanced. This ensures its value is
+always odd, which can be used to distinguish it from other state values. When
+an object needs to be reclaimed, `wr_seq` is advanced, and the object is tagged
+with this new sequence number.
+
+* Per-Thread Read Sequence: Each thread has a local read sequence counter. When
+a thread reaches a quiescent state (e.g., at the eval_breaker), it copies the
+current global `wr_seq` to its local counter.
+
+* Global Read Sequence (`rd_seq`): This per-interpreter value stores the minimum
+of all per-thread read sequence counters (excluding detached threads). It is
+updated by a "polling" operation.
+
+To free an object, the following steps are taken:
+
+1. Advance the global `wr_seq`.
+
+2. Add the object's pointer to a deferred-free list, tagging it with the new
+ `wr_seq` value as its qsbr_goal.
+
+Periodically, a polling mechanism processes this deferred-free list:
+
+1. The minimum read sequence value across all active threads is calculated and
+ stored as the global `rd_seq`.
+
+2. For each item on the deferred-free list, if its qsbr_goal is less than or
+ equal to the new `rd_seq`, its memory is freed, and it is removed from the:
+ list. Otherwise, it remains on the list for a future attempt.
+
+
+### Deferred Advance Optimization
+
+To reduce memory contention from frequent updates to the global `wr_seq`, its
+advancement is sometimes deferred. Instead of incrementing `wr_seq` on every
+reclamation request, each thread tracks its number of deferrals locally. Once
+the deferral count reaches a limit (QSBR_DEFERRED_LIMIT, currently 10), the
+thread advances the global `wr_seq` and resets its local count.
+
+When an object is added to the deferred-free list, its qsbr_goal is set to
+`wr_seq` + 2. By setting the goal to the next sequence value, we ensure it's safe
+to defer the global counter advancement. This optimization improves runtime
+speed but may increase peak memory usage by slightly delaying when memory can
+be reclaimed.
+
+
+## Limitations
+
+Determining the `rd_seq` requires scanning over all thread states. This operation
+could become a bottleneck in applications with a very large number of threads
+(e.g., >1,000). Future work may address this with more advanced mechanisms,
+such as a tree-based structure or incremental scanning. For now, the
+implementation prioritizes simplicity, with plans for refinement if
+multi-threaded benchmarks reveal performance issues.
+
+
+## References
+
+[^1]: https://youtu.be/ZXUIFj4nRjk?t=694
+[^2]: https://people.kernel.org/joelfernandes/gus-vs-rcu
+[^3]: http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44