path: root/Python/codecs.c
Commit message [Author, Age]
* gh-133036: Deprecate codecs.open (#133038) [Inada Naoki, 4 days]
  Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
  Co-authored-by: Victor Stinner <vstinner@python.org>
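For readers migrating away from `codecs.open`, the built-in `open()` with an explicit `encoding` is the usual replacement. The sketch below is illustrative only: the helper names `write_text`, `read_text` and `roundtrip_demo` and the temporary file are assumptions, not part of the commit. `newline=""` is passed to approximate `codecs.open`, which works on a binary-mode file and performs no newline translation.

```python
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    # Built-in open() with an explicit encoding, used where
    # codecs.open(path, "w", encoding) might have been used before.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


def read_text(path, encoding="utf-8"):
    # newline="" disables newline translation on reading as well.
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def roundtrip_demo():
    # Hypothetical demo: write a small non-ASCII string and read it back.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_text(path, "naïve café\n")
        return read_text(path)
    finally:
        os.remove(path)
```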
* gh-131238: Remove more includes from pycore_interp.h (#131480) [Victor Stinner, 2025-03-19]
* gh-131238: Add explicit includes to pycore headers (#131257) [Victor Stinner, 2025-03-17]
* gh-130790: Remove references about unicode's readiness from comments (#130801) [Sergey Miryanov, 2025-03-03]
* gh-129173: refactor `PyCodec_BackslashReplaceErrors` into separate functions (#129895) [Bénédikt Tran, 2025-03-03]
  The logic of `PyCodec_BackslashReplaceErrors` is now split into separate functions, each of which handles a specific exception type.
* gh-129173: simplify `PyCodec_XMLCharRefReplaceErrors` logic (#129894) [Bénédikt Tran, 2025-03-03]
  Writing the decimal representation of a Unicode codepoint only requires knowing the number of digits.
  Co-authored-by: Petr Viktorin <encukou@gmail.com>
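The idea in this commit message can be sketched in Python: once the number of decimal digits is known, the `&#NNN;` reference can be filled into a fixed-size buffer from right to left. This is a hedged illustration, not the C code; the names `decimal_digits` and `xml_charref` are hypothetical.

```python
def decimal_digits(codepoint):
    # Count the decimal digits of a non-negative integer.
    digits = 1
    while codepoint >= 10:
        codepoint //= 10
        digits += 1
    return digits


def xml_charref(ch):
    # Build "&#NNN;" for one character by pre-sizing the buffer
    # and writing the digits from least to most significant.
    cp = ord(ch)
    n = decimal_digits(cp)
    buf = ["&", "#"] + ["0"] * n + [";"]
    for i in range(n):
        buf[2 + n - 1 - i] = chr(ord("0") + cp % 10)
        cp //= 10
    return "".join(buf)
```

The result can be compared with the built-in handler, for example `"é".encode("ascii", "xmlcharrefreplace")`.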
* gh-129173: refactor `PyCodec_ReplaceErrors` into separate functions (#129893) [Bénédikt Tran, 2025-02-25]
  The logic of `PyCodec_ReplaceErrors` is now split into separate functions, each of which handles a specific exception type.
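The "one function per exception type" structure can be mirrored in a Python-level error handler. The sketch below is an assumption-laden illustration: the function names and the handler name `demo_replace` are hypothetical, and the replacement values follow the behavior of the built-in `replace` handler as understood here (`?` when encoding, U+FFFD when decoding or translating).

```python
import codecs


def _replace_encode(exc):
    # One "?" per unencodable character.
    return "?" * (exc.end - exc.start), exc.end


def _replace_decode(exc):
    # A single U+FFFD for the undecodable byte range.
    return "\ufffd", exc.end


def _replace_translate(exc):
    # One U+FFFD per untranslatable character.
    return "\ufffd" * (exc.end - exc.start), exc.end


def demo_replace_errors(exc):
    # Dispatch to a dedicated helper for each exception type.
    if isinstance(exc, UnicodeEncodeError):
        return _replace_encode(exc)
    if isinstance(exc, UnicodeDecodeError):
        return _replace_decode(exc)
    if isinstance(exc, UnicodeTranslateError):
        return _replace_translate(exc)
    raise TypeError(f"don't know how to handle {type(exc).__name__}")


# "demo_replace" is a hypothetical handler name used only for this sketch.
codecs.register_error("demo_replace", demo_replace_errors)
```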
* gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_SurrogateEscapeErrors` (GH-129175) [Bénédikt Tran, 2025-02-20]
* gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_SurrogatePassErrors` (GH-129134) [Bénédikt Tran, 2025-02-14]
* gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_NameReplaceErrors` (GH-129135) [Bénédikt Tran, 2025-02-08]
* gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_IgnoreErrors` (#129174) [Bénédikt Tran, 2025-01-24]
  We also clean up `PyCodec_StrictErrors` and the error message rendered when an object of incorrect type is passed to codec error handlers.
* gh-126004: Fix positions handling in `codecs.backslashreplace_errors` (#127676) [Bénédikt Tran, 2025-01-23]
  This fixes how `PyCodec_BackslashReplaceErrors` handles the `start` and `end` attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
* gh-126004: Fix positions handling in `codecs.replace_errors` (#127674) [Bénédikt Tran, 2025-01-23]
  This fixes how `PyCodec_ReplaceErrors` handles the `start` and `end` attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
* gh-126004: Fix positions handling in `codecs.xmlcharrefreplace_errors` (#127675) [Bénédikt Tran, 2025-01-23]
  This fixes how `PyCodec_XMLCharRefReplaceErrors` handles the `start` and `end` attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
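The `start` and `end` attributes mentioned in these three commits are visible from Python. The sketch below shows how a handler reads them; the names `error_span`, `bracket_errors` and `demo_bracket` are hypothetical and exist only for illustration.

```python
import codecs


def error_span(text, encoding="ascii"):
    # Return the (start, end) range reported by a failed encode, or None.
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, exc.end
    return None


def bracket_errors(exc):
    # Hypothetical handler: replace object[start:end] with "[U+XXXX]" markers
    # and resume encoding at exc.end.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(f"[U+{ord(ch):04X}]" for ch in bad)
    return replacement, exc.end


codecs.register_error("demo_bracket", bracket_errors)
```

A handler returns the replacement together with the position at which processing should resume, which is why correct handling of `start` and `end` matters.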
* gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194) [Victor Stinner, 2024-10-09]
  Replace PyUnicode_New(0, 0), PyUnicode_FromString("") and PyUnicode_FromStringAndSize("", 0) with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
* gh-124665: Add `_PyCodec_UnregisterError` and `_codecs._unregister_error` (#124677) [Bénédikt Tran, 2024-09-29]
* gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) [Petr Viktorin, 2024-06-21]
  * Add an InternalDocs file describing how interning should work and how to use it.
  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`
  * Switch uses of `PyUnicode_InternInPlace` to those.
  * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI.
  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
  * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons
    Now, when you intern a singleton, that exact singleton will be interned.
  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
  * Intern `_Py_STR` singletons at startup.
  * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
  * Beef up the tests. Cover internal details (marked with `@cpython_only`).
  * Add lots of assertions
  Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
* gh-116738: Make `_codecs` module thread-safe (#117530) [Brett Simmers, 2024-05-02]
  The module itself is a thin wrapper around calls to functions in `Python/codecs.c`, so that's where the meaningful changes happened:
  - Move codecs-related state that lives on `PyInterpreterState` to a struct declared in `pycore_codecs.h`.
  - In free-threaded builds, add a mutex to `codecs_state` to synchronize operations on `search_path`. Because `search_path_mutex` is used as a normal mutex and not a critical section, we must be extremely careful with operations called while holding it.
  - The codec registry is explicitly initialized as part of `_PyUnicode_InitEncodings` to simplify thread-safety.
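The caution about what runs while the mutex is held can be illustrated with a small Python analogue. This is not the C implementation: the class `SearchPath` and its methods are hypothetical, and the sketch only shows one common way to honor such a constraint, namely holding a plain lock for short list operations and calling user-supplied search functions after releasing it.

```python
import threading


class SearchPath:
    """Hypothetical lock-guarded list of codec search functions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._functions = []

    def register(self, func):
        if not callable(func):
            raise TypeError("argument must be callable")
        with self._lock:
            self._functions.append(func)

    def unregister(self, func):
        with self._lock:
            # Identity comparison only, so no user-defined __eq__ runs
            # while the lock is held.
            self._functions = [f for f in self._functions if f is not func]

    def lookup(self, name):
        # Take a snapshot under the lock, then call user code without it.
        with self._lock:
            snapshot = list(self._functions)
        for func in snapshot:
            result = func(name)
            if result is not None:
                return result
        raise LookupError(f"unknown encoding: {name}")
```

Because a search function may itself register or look up codecs, calling it outside the lock avoids self-deadlock on a non-reentrant mutex.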
* gh-111972: Make Unicode name C API capsule initialization thread-safe (#112249) [Kirill Podoprigora, 2023-11-30]
* gh-111789: Use PyDict_GetItemRef() in Python/codecs.c (gh-112082) [Serhiy Storchaka, 2023-11-27]
* gh-108765: Python.h no longer includes <ctype.h> (#108831) [Victor Stinner, 2023-09-03]
  Remove <ctype.h> in C files which don't use it; only sre.c and _decimal.c still use it.
  Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:
  * Code added by commit b5047fd01948ab108edcc1b3c2c901d915814cfd in 2004 for MacOSX and FreeBSD.
  * Test removed by commit 52ddaefb6bab1a74ecffe8519c02735794ebfbe1 in 2007, since Python str type now uses locale independent functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode database.
  Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new functions: sre_isalnum(), sre_tolower(), sre_toupper().
  Remove unused includes:
  * _localemodule.c: remove <stdio.h>.
  * getargs.c: remove <float.h>.
  * dynload_win.c: remove <direct.h>, it no longer calls _getcwd() since commit fb1f68ed7cc1536482d1debd70a53c5442135fe2 (in 2001).
* gh-108308: Replace _PyDict_GetItemStringWithError() (#108372) [Victor Stinner, 2023-08-23]
  Replace _PyDict_GetItemStringWithError() calls with PyDict_GetItemStringRef() which returns a strong reference to the item.
  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-106320: Remove _PyDict_GetItemStringWithError() function (#108313) [Victor Stinner, 2023-08-22]
  Remove private _PyDict_GetItemStringWithError() function of the public C API: the new PyDict_GetItemStringRef() can be used instead.
  * Move private _PyDict_GetItemStringWithError() to the internal C API.
  * _testcapi get_code_extra_index() uses PyDict_GetItemStringRef(). Avoid using private functions in _testcapi which tests the public C API.
* gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) [Serhiy Storchaka, 2023-07-12]
* gh-106320: Use _PyInterpreterState_GET() (#106336) [Victor Stinner, 2023-07-02]
  Replace PyInterpreterState_Get() with inlined _PyInterpreterState_GET().
* gh-77757: replace exception wrapping by PEP-678 notes in typeobject's __set_name__ (#103402) [Irit Katriel, 2023-04-11]
* gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) [Irit Katriel, 2023-03-21]
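The difference between wrapping an exception and attaching a PEP 678 note can be sketched at the Python level. The helper name `encode_with_note` and the note text are illustrative assumptions; `BaseException.add_note()` is only available on Python 3.11 and later, so the sketch guards for it.

```python
def encode_with_note(text, encoding):
    # Instead of raising a new exception chained to the original one,
    # annotate the original exception in place and re-raise it.
    try:
        return text.encode(encoding)
    except Exception as exc:
        if hasattr(exc, "add_note"):  # Python 3.11+
            exc.add_note(f"encoding with {encoding!r} codec failed")
        raise
```

The caller still sees the original exception type, so existing `except UnicodeEncodeError` clauses keep working, and the extra context appears in the traceback as a note.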
* gh-99300: Replace Py_INCREF() with Py_NewRef() (#99530) [Victor Stinner, 2022-11-16]
  Replace Py_INCREF() and Py_XINCREF() using a cast with Py_NewRef() and Py_XNewRef().
* gh-99300: Use Py_NewRef() in Python/ directory (#99302) [Victor Stinner, 2022-11-10]
  Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in C files of the Python/ directory.
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928) [Eric Snow, 2022-02-08]
  We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.

  The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

  https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

  The core of the change is in:
  * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
  * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
  * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
  * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

  I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.

  The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

  The following are not changed (yet):
  * stop using _Py_IDENTIFIER() in the stdlib modules
  * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI is using this (private) API
  * (maybe) intern the strings during runtime init

  https://bugs.python.org/issue46541
* bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with PyImport_ImportModule (GH-30046) [Kumar Aditya, 2021-12-12]
* bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895) [Victor Stinner, 2021-10-12]
  * Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
  * _ssl, _sqlite and _testcapi extensions now call the public PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
  * _lsprof extension is now built with Py_BUILD_CORE_MODULE macro defined to get access to internal _PyObject_CallNoArgs().
* bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891) [Victor Stinner, 2021-10-12]
  Fix typo in the private _PyObject_CallNoArg() function name: rename it to _PyObject_CallNoArgs() to be consistent with the public function PyObject_CallNoArgs().
* bpo-42157: unicodedata avoids references to UCD_Type (GH-22990) [Victor Stinner, 2020-10-26]
  * UCD_Check() uses PyModule_Check()
  * Simplify the internal _PyUnicode_Name_CAPI structure:
    * Remove size and state members
    * Remove state and self parameters of getcode() and getname() functions
  * Remove global_module_state
* bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713) [Victor Stinner, 2020-10-26]
  The private _PyUnicode_Name_CAPI structure of the PyCapsule API unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the structure gets a new state member which must be passed to the getcode() and getname() functions.
  * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
  * unicodedata module is now built with Py_BUILD_CORE_MODULE.
  * unicodedata: move hashAPI variable into unicodedata_module_state.
* bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513) [Hai Shi, 2020-10-16]
  * Move the codecs' (un)register operation to testcases.
  * Remove _codecs._forget_codec() and _PyCodec_Forget()
* bpo-41842: Add codecs.unregister() function (GH-22360) [Hai Shi, 2020-09-28]
  Add codecs.unregister() and PyCodec_Unregister() functions to unregister a codec search function.
* bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) [Victor Stinner, 2020-04-14]
* bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509) [Victor Stinner, 2020-04-14]
  Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET() for consistency with _PyThreadState_GET() and to have a shorter name (helps to fit into 80 columns).
  Also add "assert(tstate != NULL);" to the function.
* bpo-40268: Include explicitly pycore_interp.h (GH-19505) [Victor Stinner, 2020-04-14]
  pycore_pystate.h no longer includes pycore_interp.h: it's now included explicitly in files accessing PyInterpreterState.
* bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) [Serhiy Storchaka, 2020-04-11]
* bpo-39947: Use _PyInterpreterState_GET_UNSAFE() (GH-18978) [Victor Stinner, 2020-03-13]
  Replace _PyInterpreterState_Get() function call with _PyInterpreterState_GET_UNSAFE() macro which is more efficient but doesn't check if tstate or interp is NULL.
  _Py_GetConfigsAsDict() now uses _PyThreadState_GET().
* closes bpo-39630: Update pointers to string literals to be const char *. (GH-18510) [Andy Lester, 2020-02-13]
* bpo-39245: Switch to public API for Vectorcall (GH-18460) [Petr Viktorin, 2020-02-11]
  The bulk of this patch was generated automatically with:

      for name in \
          PyObject_Vectorcall \
          Py_TPFLAGS_HAVE_VECTORCALL \
          PyObject_VectorcallMethod \
          PyVectorcall_Function \
          PyObject_CallOneArg \
          PyObject_CallMethodNoArgs \
          PyObject_CallMethodOneArg \
      ;
      do
          echo $name
          git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
      done

      old=_PyObject_FastCallDict
      new=PyObject_VectorcallDict
      git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

  and then cleaned up:
  - Revert changes in docs & news
  - Revert changes to backcompat defines in headers
  - Nudge misaligned comments
* bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391) [Victor Stinner, 2020-02-07]
  Replace direct access to PyObject.ob_type with Py_TYPE().
* bpo-38631: Avoid Py_FatalError() in _PyCodecRegistry_Init() (GH-18217) [Victor Stinner, 2020-01-27]
  _PyCodecRegistry_Init() now reports exceptions to the caller, rather than calling Py_FatalError().
* bpo-37751: Fix codecs.lookup() normalization (GH-15092) [Jordon Xu, 2019-08-21]
  Fix codecs.lookup() to normalize the encoding name the same way as encodings.normalize_encoding(), except that codecs.lookup() also converts the name to lower case.
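The rule stated in this commit message can be approximated in pure Python by composing `encodings.normalize_encoding()` with lower-casing. The helper name `approx_lookup_name` is hypothetical, and the sketch only models the described behavior; it does not call into the C normalization code.

```python
import codecs
import encodings


def approx_lookup_name(name):
    # Approximate the name that codecs.lookup() hands to search functions:
    # encodings.normalize_encoding() followed by lower-casing.
    return encodings.normalize_encoding(name).lower()


# Differently spelled names resolve to the same codec.
same_codec = codecs.lookup("UTF-8").name == codecs.lookup("utf_8").name
```

For example, `approx_lookup_name("UTF-8")` yields `"utf_8"`.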
* bpo-29548: no longer use PyEval_Call* functions (GH-14683) [Jeroen Demeyer, 2019-07-12]
* bpo-37483: fix reference leak in _PyCodec_Lookup (GH-14600) [Jeroen Demeyer, 2019-07-05]
* bpo-37483: add _PyObject_CallOneArg() function (#14558) [Jeroen Demeyer, 2019-07-04]