| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
module (GH-133287)
* Track the current executor, not the previous one, on the thread-state.
* Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
|
| |
|
| |
|
| |
|
|
|
| |
Co-authored-by: Zanie Blue <contact@zanie.dev>
|
| |
|
|
|
|
|
|
|
|
| |
* Handle escapes in DECREF_INPUTS
* Mark a few more functions as escaping
* Replace DECREF_INPUTS with PyStackRef_CLOSE where possible
|
| |
|
| |
|
|
|
|
|
| |
Co-authored-by: Garrett Gu <garrettgu777@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
|
|
|
| |
Expand out SETLOCAL so that code generator can see the decref. Mark Py_CLEAR as escaping
|
|
|
|
| |
interpreter. (GH-129525)
|
| |
|
| |
|
|
|
| |
Add new frame owner type for interpreter entry frames
|
| |
|
|
|
|
| |
(GH-128121)
|
|
|
| |
Change `PyMutex_LockFast` to take `PyMutex` as argument.
|
|
|
|
|
|
|
|
|
| |
The specialization only depends on the type, so no special thread-safety
considerations there.
STORE_SUBSCR_LIST_INT needs to lock the list before modifying it.
`_PyDict_SetItem_Take2` already internally locks the dictionary using a
critical section.
|
|
|
|
|
| |
(GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
|
|
|
|
|
|
|
|
|
| |
`BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
|
|
|
|
| |
PyStackRefs. (GH-125439)
|
|
|
|
|
|
|
| |
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
|
|
|
|
|
| |
* Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it
* Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.
|
|
|
|
|
| |
* Factor some instructions into micro-ops to isolate CHECK_EVAL_BREAKER for escape analysis
* Eliminate CHECK_EVAL_BREAKER macro
|
| |
|
|
|
|
|
|
|
| |
(#122190)
The adaptive counter doesn't do anything currently in the free-threaded
build and TSan reports a data race due to concurrent modifications to
the counter.
|
|
|
|
| |
* Add support for 'prev_instr' to code generator and refactor some INSTRUMENTED instructions
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
|
| |
|
|
|
|
|
|
| |
* Check tracing in RESUME_CHECK
* Only change to RESUME_CHECK if not tracing
|
|
|
| |
Small TSAN fixups for instrumentation
|
|
|
|
| |
(GH-118279)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``),
shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The
API used for adaptive specialization counters is changed but the behavior is
(supposed to be) identical.
The behavior of the Tier 2 counters is changed:
- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
|
| |
|
|
|
|
|
| |
Splits the "cold" path, deopts and exits, from the "hot" path, reducing the size of most jitted instructions, at the cost of slower exits.
|
|
|
|
|
|
|
| |
(GH-114986)" (GH-116178)
Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)"
This reverts commit 0a61e237009bf6b833e13ac635299ee063377699.
|
|
|
|
|
| |
builds (#116013)
For now, disable all specialization when the GIL might be disabled.
|
| |
|
| |
|
|
|
|
|
| |
This replaces the old `TIER_{ONE,TWO}_ONLY` macros. Note that `specialized` implies `tier1`.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.
The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
|
|
|
|
| |
(GH-114142)
|
|
|
|
|
|
| |
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and
quiescent state based reclamation (QSBR). The API provides a mechanism
for callers to detect when it is safe to free memory that may be
concurrently accessed by readers.
|
| |
|
| |
|
|
|
| |
This fixes a recently introduced bug where the deferred count is being unnecessarily decremented to counteract an increment elsewhere that is no longer happening. This caused the values to flip around to "very large" 64-bit numbers.
|
|
|
|
|
|
| |
* Include destination T1 opcode in Error debug message
* Include destination T1 opcode in DEOPT debug message
* Remove obsolete comment from remove_unneeded_uops
* Change lltrace_instruction() to print caller's opcode/oparg
|