The reference count fields, such as `ob_tid` and `ob_ref_shared`, may be
accessed concurrently in the free threading build by a `_Py_TryXGetRef`
or similar operation. The PyObject header fields will be initialized by
`_PyObject_Init`, so only call `memset()` to zero-initialize the remainder
of the allocation.
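
A minimal sketch of the pattern (helper name hypothetical; the real change lives in CPython's allocator):

```c
#include <string.h>

/* Zero only the memory that follows the PyObject header. The header
 * itself is filled in by _PyObject_Init(), and the refcount words
 * (ob_tid, ob_ref_shared) may be read concurrently, so memset() must
 * not sweep over them. */
static void
zero_object_tail(PyObject *op, size_t total_size)
{
    memset((char *)op + sizeof(PyObject), 0,
           total_size - sizeof(PyObject));
}
```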
The `gc_get_refs` assertion needs to happen after we check the alive and
unreachable bits. Otherwise, `ob_tid` may still store the actual thread id
instead of the computed `gc_refs`, which can trigger the assertion if
`ob_tid` looks like a negative value.
Also fix a few type warnings on 32-bit systems.
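
A hedged sketch of the corrected ordering (helper names illustrative):

```c
/* Inside a GC visitor: consult gc_get_refs() only after ruling out the
 * alive and unreachable states; until then, ob_tid may still hold a
 * thread id that happens to look like a negative refcount. */
if (gc_is_alive(op) || gc_is_unreachable(op)) {
    return true;
}
assert(gc_get_refs(op) >= 0);   /* ob_tid now holds the computed gc_refs */
```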
(gh-129563)
For the free-threaded version of the cyclic GC, restructure the "mark alive" phase to use software prefetch instructions. This gives a speedup in most cases when the number of objects is large enough. The prefetching is enabled conditionally based on the number of long-lived objects the GC finds.
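
The core idea in sketch form, assuming a flat buffer of pending objects and using GCC/Clang's `__builtin_prefetch` as a stand-in for whatever intrinsic the real code uses:

```c
#define PREFETCH_DISTANCE 8   /* illustrative tuning constant */

/* Mark loop with software prefetch: request an object's cache line
 * several iterations before it is dereferenced, hiding memory latency
 * when the live heap is large. */
static void
mark_alive_prefetching(PyObject **pending, Py_ssize_t n)
{
    for (Py_ssize_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(pending[i + PREFETCH_DISTANCE], 0, 3);
        }
        mark_alive(pending[i]);   /* hypothetical per-object routine */
    }
}
```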
Replace PyErr_WriteUnraisable() with PyErr_FormatUnraisable().
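
The shape of the replacement; `PyErr_FormatUnraisable()` lets the caller say what was happening when the exception was swallowed (the message text here is illustrative):

```c
/* Before: the report carries no context. */
PyErr_WriteUnraisable(callback);

/* After: the report explains where the exception was ignored. */
PyErr_FormatUnraisable("Exception ignored while calling GC callback %R",
                       callback);
```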
Replace "on verb+ing" with "while verb+ing".
The stack pointers in interpreter frames are nearly always valid now, so
use them when visiting each thread's frame. For now, don't collect
objects with deferred references in the rare case that we see a frame
with a NULL stack pointer.
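
A sketch of the visiting logic under the assumptions above (field and helper names approximate the internal API):

```c
/* Visit the live portion of one interpreter frame's evaluation stack. */
_PyStackRef *top = frame->stackpointer;
if (top == NULL) {
    /* Rare: the stack pointer is invalid; skip collecting deferred
     * references for this frame rather than reading garbage. */
    return;
}
for (_PyStackRef *ref = frame->localsplus; ref < top; ref++) {
    visit_stackref(*ref);   /* hypothetical visitor */
}
```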
This is a precursor to the actual fix for gh-114940, where we will change these macros to use the new lock. This change is almost entirely mechanical; the exceptions are the loops in codeobject.c and ceval.c, which now hold the "head" lock. Note that almost all of the uses of _Py_FOR_EACH_TSTATE_UNLOCKED() here will change to _Py_FOR_EACH_TSTATE_BEGIN() once we add the new per-interpreter lock.
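
A sketch of the mechanical change (macro usage is approximate; the body is a placeholder):

```c
/* Today: iterate the interpreter's thread states without a dedicated lock. */
_Py_FOR_EACH_TSTATE_UNLOCKED(interp, tstate) {
    visit_thread(tstate);   /* placeholder per-thread work */
}

/* After the per-interpreter lock lands, most call sites become: */
_Py_FOR_EACH_TSTATE_BEGIN(interp, tstate) {
    visit_thread(tstate);
}
_Py_FOR_EACH_TSTATE_END(interp);
```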
Delay free a dictionary when replacing it (#122489)
collection (GH-126502)" (#126983)
collection (GH-126502)
* Mark almost all reachable objects before doing collection phase
* Add stats for objects marked
* Visit new frames before each increment
* Remove lazy dict tracking
* Update docs
* Clearer calculation of work to do.
Also, _PyGC_Freeze() no longer freezes unreachable objects.
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
`BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode, which is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
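
A hedged sketch of the per-thread lookup this layout enables (struct details and the index helper are illustrative):

```c
/* Find this thread's bytecode copy; fall back to the "main" copy at
 * index 0 if this thread has not yet specialized this code object. */
_PyCodeArray *tlbc = code->co_tlbc;
Py_ssize_t idx = tlbc_index_for_thread(tstate);   /* reserved at creation */
_Py_CODEUNIT *bytecode;
if (idx < tlbc->size && tlbc->entries[idx] != NULL) {
    bytecode = (_Py_CODEUNIT *)tlbc->entries[idx];
}
else {
    bytecode = (_Py_CODEUNIT *)tlbc->entries[0];  /* shared main copy */
}
```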
This fixes a crash when `gc.get_objects()` or `gc.get_referrers()` is
called during a GC in the free threading build.
Switch to `_PyObjectStack` to avoid corrupting the `struct worklist`
linked list maintained by the GC. Also, don't return objects that are frozen
(`gc.freeze()`) or in the process of being collected to more closely match
the behavior of the default build.
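
The gist as a sketch: accumulate results on a temporary `_PyObjectStack` so the GC's intrusive worklist links are never reused (error paths elided):

```c
_PyObjectStack results = {0};

/* Gather candidates without touching the worklist linkage. */
if (_PyObjectStack_Push(&results, op) < 0) {
    goto error;   /* out of memory */
}

/* Later: drain the stack into the Python list we hand back. */
PyObject *obj;
while ((obj = _PyObjectStack_Pop(&results)) != NULL) {
    if (PyList_Append(list, obj) < 0) {
        goto error;
    }
}
```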
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
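
Not CPython's actual API, just the shape of the technique: each thread counts locally in private memory, and the scattered counts are summed when it matters:

```c
/* Hypothetical per-thread refcount table. Increments and decrements on
 * the function->code edge touch only thread-private slots, so creating
 * nested functions concurrently does not contend on ob_ref_shared. */
typedef struct {
    Py_ssize_t counts[64];   /* one slot per tracked object (toy size) */
} PerThreadRefs;

static inline void
thread_incref(PerThreadRefs *local, int slot)
{
    local->counts[slot]++;   /* no atomics: the slot is thread-private */
}

static inline void
thread_decref(PerThreadRefs *local, int slot)
{
    local->counts[slot]--;   /* may go negative; only the sum matters */
}
```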
(#124459)
This fixes a crash when running the PyO3 test suite on the free-threaded
build. The `qsbr` field is initialized after the `PyThreadState` is
added to the interpreter's linked list, so it might still be NULL.
Instead, we "steal" the queue of to-be-freed memory blocks. The queue is
always initialized (possibly empty) and is protected by the
stop-the-world pause.
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
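
The tagged-pointer idea in miniature (not the exact `_PyStackRef` encoding):

```c
#include <stdint.h>

/* A stack reference is a pointer plus a low tag bit. "Deferred" means
 * the frame does not own a strong reference, so pushing the function
 * never touches its refcount. */
typedef uintptr_t StackRef;
#define TAG_DEFERRED ((uintptr_t)1)

static inline StackRef
stackref_deferred(PyObject *op)
{
    return (uintptr_t)op | TAG_DEFERRED;        /* no incref */
}

static inline PyObject *
stackref_object(StackRef ref)
{
    return (PyObject *)(ref & ~TAG_DEFERRED);   /* strip the tag */
}
```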
(#124069)
If the generator is already cleared, then most fields in the
generator's frame, other than f_executable, are no longer valid. The
invalid fields may contain dangling pointers and should not be used.
(#123924)
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
The free-threaded GC now visits interpreter stacks to keep objects
that use deferred reference counting alive.
Interpreter frames are zero initialized in the free-threaded GC so
that the GC doesn't see garbage data. This is a temporary measure
until stack spilling around escaping calls is implemented.
Co-authored-by: Ken Jin <kenjin@python.org>
We were not properly accounting for interpreter memory leaks at
shutdown and had three sources of leaks:
* Objects that use deferred reference counting and are reachable via
  static types outlive the final GC. We now disable deferred reference
  counting on all objects if we are calling the GC due to interpreter
  shutdown.
* `_PyMem_FreeDelayed` did not properly check for interpreter shutdown,
  so some memory blocks were enqueued to be freed but never actually
  freed.
* `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
The free-threaded build partially stores heap type reference counts in
a distributed manner, in per-thread arrays. This avoids reference count
contention when creating or destroying instances.
Co-authored-by: Ken Jin <kenjin@python.org>
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of the memory block.
If configured with freelists disabled, these operations are essentially
no-ops.
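
The linking scheme in isolation (a generic sketch, not the exact CPython structs):

```c
/* Intrusive freelist: a free block's first word stores the pointer to
 * the next free block, so no side allocation is needed for the links. */
struct freelist {
    void *head;
};

static void
freelist_push(struct freelist *fl, void *block)
{
    *(void **)block = fl->head;
    fl->head = block;
}

static void *
freelist_pop(struct freelist *fl)
{
    void *block = fl->head;
    if (block != NULL) {
        fl->head = *(void **)block;   /* follow the first-word link */
    }
    return block;
}
```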
We should maintain the invariant that a zero `ob_tid` implies the
refcount fields are merged.
* Move the assignment in `_Py_MergeZeroLocalRefcount` to immediately
before the refcount merge.
* Update `_PyTrash_thread_destroy_chain` to set `ob_ref_shared` to
`_Py_REF_MERGED` when setting `ob_tid` to zero.
Also check this invariant with assertions in the GC in debug builds.
That uncovered a bug when running out of memory during GC.
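
The invariant as a debug assertion, roughly as the GC might check it (macro names assumed from the internal headers):

```c
/* Zero ob_tid must imply merged refcount fields; the low bits of
 * ob_ref_shared carry the state flags. */
assert(op->ob_tid != 0 ||
       (op->ob_ref_shared & _Py_REF_SHARED_FLAG_MASK) == _Py_REF_MERGED);
```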
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.
Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).
The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
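
The guard in sketch form (the flag field is hypothetical; the atomics are CPython's internal wrappers):

```c
/* Binding side: publish readiness only after the mimalloc heaps exist. */
tstate_mimalloc_bind(tstate);
_Py_atomic_store_int(&tstate->heaps_initialized, 1);   /* hypothetical field */

/* gc_visit_heaps() side: skip threads that are not fully initialized. */
if (!_Py_atomic_load_int(&tstate->heaps_initialized)) {
    continue;   /* within the loop over the interpreter's threads */
}
```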
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.
Only call `gc_restore_tid()` from stop-the-world contexts.
`worklist_pop()` can be called while other threads are running, so use a
relaxed atomic to modify `ob_tid`.
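
What "use a relaxed atomic" means concretely (a sketch):

```c
/* worklist_pop() can run while other threads execute, so ob_tid must
 * not be written with a plain store. Relaxed ordering is enough here:
 * we need atomicity, not cross-thread ordering guarantees. */
_Py_atomic_store_uintptr_relaxed(&op->ob_tid, 0);
```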
Use the new public Raw functions:
* Replace _PyTime_PerfCounterUnchecked() with PyTime_PerfCounterRaw()
* Replace _PyTime_TimeUnchecked() with PyTime_TimeRaw()
* Replace _PyTime_MonotonicUnchecked() with PyTime_MonotonicRaw()
Remove the internal functions:
* _PyTime_PerfCounterUnchecked()
* _PyTime_TimeUnchecked()
* _PyTime_MonotonicUnchecked()
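
A small usage example of the public Raw API; the Raw variants never set a Python exception, which is what makes them safe in low-level code (`do_work()` is a placeholder):

```c
PyTime_t t0, t1;
(void)PyTime_PerfCounterRaw(&t0);   /* cannot raise; ignore rare failure */
do_work();
(void)PyTime_PerfCounterRaw(&t1);
double seconds = PyTime_AsSecondsDouble(t1 - t0);
```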
Deferred reference counting is not fully implemented yet. As a temporary
measure, we immortalize objects that would use deferred reference
counting to avoid multi-threaded scaling bottlenecks.
This is only performed in the free-threaded build once the first
non-main thread is started. Additionally, some tests, including refleak
tests, suppress this behavior.
This marks objects as using deferred reference counting via the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
This keeps track of the per-thread total of reference count operations
in PyThreadState in the free-threaded build. The count is merged into
the interpreter's total when the thread exits.
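
The bookkeeping in sketch form (field names hypothetical):

```c
/* Fast path during execution: bump the thread-local tally only. */
tstate->reftotal++;

/* At thread exit: fold the tally into the interpreter-wide total,
 * under whatever protects the interpreter's counter. */
interp->object_state.reftotal += tstate->reftotal;
tstate->reftotal = 0;
```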
objects. (GH-116115)
The free-threaded GC sometimes sees objects with zero refcount. This can
happen due to the delay in merging biased reference counting fields,
and, in the future, due to deferred reference counting. We should not
untrack these objects or they will never be collected.
This fixes the refleaks in the free-threaded build.
size. (GH-117120)
(#116663)
This isn't strictly necessary because the implementation of `gc_should_collect`
already checks `gcstate->enabled` in the free-threaded build, but it seems
like a good idea until the common pieces of gc.c and gc_free_threading.c are
refactored out.
Free objects with qsbr if shared
Replace private _PyTime functions with public PyTime functions.
random_seed_time_pid() now reports errors to its caller.
Rename functions:
* _PyTime_GetSystemClock() => _PyTime_TimeUnchecked()
* _PyTime_GetPerfCounter() => _PyTime_PerfCounterUnchecked()
* _PyTime_GetMonotonicClock() => _PyTime_MonotonicUnchecked()
* _PyTime_GetSystemClockWithInfo() => _PyTime_TimeWithInfo()
* _PyTime_GetMonotonicClockWithInfo() => _PyTime_MonotonicWithInfo()
Changes:
* Remove "typedef PyTime_t PyTime_t;" which was
"typedef PyTime_t _PyTime_t;" before a previous rename.
* Update comments of "Unchecked" functions.
* Remove invalid PyTime_Time() comment.
The <pycore_time.h> include is no longer needed to get the PyTime_t
type in internal header files; that type is now provided by the
<Python.h> include. Add <pycore_time.h> includes to C files instead.
Run command:
sed -i -e 's!\<_PyTime_t\>!PyTime_t!g' $(find -name "*.c" -o -name "*.h")
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.
The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
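
A sketch of why per-thread breakers help: to interrupt one thread, only that thread's breaker is poked (the bit name follows the internal convention; treat as illustrative):

```c
/* Deliver an async exception to a single thread; the other threads'
 * eval breakers are untouched, so they keep running undisturbed. */
_Py_set_eval_breaker_bit(target_tstate, _PY_ASYNC_EXCEPTION_BIT);
```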