Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r-- Doc/library/codecs.rst 98
1 files changed, 92 insertions, 6 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 14f6547e4e0..c5dae7c8e8f 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -53,6 +53,14 @@ any codec:
:exc:`UnicodeDecodeError`). Refer to :ref:`codec-base-classes` for more
information on codec error handling.
+.. function:: charmap_build(string)
+
+ Return a mapping suitable for encoding with a custom single-byte encoding.
+ Given a :class:`str` *string* of up to 256 characters representing a
+ decoding table, return either a compact internal mapping object
+ ``EncodingMap`` or a :class:`dictionary <dict>` mapping character ordinals
+ to byte values. Raise a :exc:`TypeError` on invalid input.
+
The full details for each codec can also be looked up directly:
.. function:: lookup(encoding, /)
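For illustration only (this sketch is not part of the patch): the mapping returned by ``charmap_build`` can be passed to ``codecs.charmap_encode``, while the decoding table itself can be passed to ``codecs.charmap_decode``. The table below, a Latin-1 layout with byte ``0x80`` remapped to the euro sign, is a made-up example and does not correspond to any shipped codec.

```python
import codecs

# Hypothetical decoding table: byte value i decodes to table[i].
# It starts as the Latin-1 identity layout, then byte 0x80 is remapped.
table = [chr(i) for i in range(256)]
table[0x80] = "\u20ac"  # euro sign
decoding_table = "".join(table)

# Build the reverse (character ordinal -> byte value) mapping.
encoding_table = codecs.charmap_build(decoding_table)

# Encode with the built mapping, decode with the original table.
encoded, consumed = codecs.charmap_encode("\u20ac5", "strict", encoding_table)
decoded, length = codecs.charmap_decode(encoded, "strict", decoding_table)
```

Whether the result is an ``EncodingMap`` or a plain dictionary depends on the table; ``charmap_encode`` accepts both.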
@@ -208,7 +216,7 @@ wider range of codecs when working with binary files:
.. versionchanged:: 3.11
The ``'U'`` mode has been removed.
- .. deprecated:: next
+ .. deprecated:: 3.14
:func:`codecs.open` has been superseded by :func:`open`.
@@ -235,8 +243,8 @@ wider range of codecs when working with binary files:
.. function:: iterencode(iterator, encoding, errors='strict', **kwargs)
Uses an incremental encoder to iteratively encode the input provided by
- *iterator*. This function is a :term:`generator`.
- The *errors* argument (as well as any
+ *iterator*. *iterator* must yield :class:`str` objects.
+ This function is a :term:`generator`. The *errors* argument (as well as any
other keyword argument) is passed through to the incremental encoder.
This function requires that the codec accept text :class:`str` objects
@@ -247,8 +255,8 @@ wider range of codecs when working with binary files:
.. function:: iterdecode(iterator, encoding, errors='strict', **kwargs)
Uses an incremental decoder to iteratively decode the input provided by
- *iterator*. This function is a :term:`generator`.
- The *errors* argument (as well as any
+ *iterator*. *iterator* must yield :class:`bytes` objects.
+ This function is a :term:`generator`. The *errors* argument (as well as any
other keyword argument) is passed through to the incremental decoder.
This function requires that the codec accept :class:`bytes` objects
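For illustration only (not part of the patch), a minimal sketch of both generators. The chunk contents are arbitrary sample data. The second half splits a two-byte UTF-8 sequence across chunks to show that the incremental decoder buffers incomplete input.

```python
import codecs

# iterencode: the iterator yields str objects, the generator yields bytes.
text_chunks = ["Zi", "to ", "caf\u00e9"]
encoded_chunks = list(codecs.iterencode(text_chunks, "utf-8"))

# iterdecode: the iterator yields bytes objects, the generator yields str.
# The encoded form of the last character (0xC3 0xA9) is split across chunks.
data = "caf\u00e9".encode("utf-8")
byte_chunks = [data[:4], data[4:]]
decoded_text = "".join(codecs.iterdecode(byte_chunks, "utf-8"))
```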
@@ -257,6 +265,20 @@ wider range of codecs when working with binary files:
:func:`iterencode`.
+.. function:: readbuffer_encode(buffer, errors=None, /)
+
+ Return a :class:`tuple` containing the raw bytes of *buffer* and their
+ length in bytes. *buffer* may be a
+ :ref:`buffer-compatible object <bufferobjects>` or a :class:`str`, which is
+ encoded to UTF-8 before processing.
+
+ The *errors* argument is ignored.
+
+ .. code-block:: pycon
+
+ >>> codecs.readbuffer_encode(b"Zito")
+ (b'Zito', 4)
+
+
The module also provides the following constants which are useful for reading
and writing to platform dependent files:
@@ -1373,7 +1395,11 @@ encodings.
| | | It is used in the Python |
| | | pickle protocol. |
+--------------------+---------+---------------------------+
-| undefined | | Raise an exception for |
+| undefined | | This codec should only |
+| | | be used for testing |
+| | | purposes. |
+| | | |
+| | | Raise an exception for |
| | | all conversions, even |
| | | empty strings. The error |
| | | handler is ignored. |
@@ -1476,6 +1502,66 @@ mapping. It is not supported by :meth:`str.encode` (which only produces
Restoration of the ``rot13`` alias.
+:mod:`encodings` --- Encodings package
+--------------------------------------
+
+.. module:: encodings
+ :synopsis: Encodings package
+
+This module implements the following functions:
+
+.. function:: normalize_encoding(encoding)
+
+ Normalize encoding name *encoding*.
+
+ Normalization works as follows: each run of non-alphanumeric characters,
+ except the dot used for Python package names, is collapsed and replaced with
+ a single underscore; leading and trailing underscores are removed.
+ For example, the run ``' -;#'`` between two alphanumeric characters becomes
+ ``'_'``.
+
+ Note that *encoding* should be ASCII only.
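For illustration only (not part of the patch), a short sketch of the normalization described above. The input names are arbitrary examples.

```python
import encodings

# A single hyphen becomes an underscore.
normalized = encodings.normalize_encoding("utf-8")

# A run of several non-alphanumeric characters collapses to one underscore.
collapsed = encodings.normalize_encoding("latin -;# 1")

# Leading and trailing separators do not produce underscores.
trimmed = encodings.normalize_encoding("-utf-8-")
```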
+
+
+.. note::
+ The following functions should not be used directly, except for testing
+ purposes; :func:`codecs.lookup` should be used instead.
+
+
+.. function:: search_function(encoding)
+
+ Search for the codec module corresponding to the given encoding name
+ *encoding*.
+
+ This function first normalizes the *encoding* using
+ :func:`normalize_encoding`, then looks for a corresponding alias.
+ It attempts to import a codec module from the encodings package using either
+ the alias or the normalized name. If the module is found and defines a valid
+ ``getregentry()`` function that returns a :class:`codecs.CodecInfo` object,
+ the codec is cached and returned.
+
+ If the codec module defines a ``getaliases()`` function, any returned aliases
+ are registered for future use.
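For illustration only (not part of the patch), a sketch of calling ``search_function`` directly, which the note above reserves for testing. The unknown encoding name is a made-up placeholder, assumed not to match any installed codec module.

```python
import codecs
import encodings

# A known encoding yields a CodecInfo object.
info = encodings.search_function("utf-8")
encoded, consumed = info.encode("abc")

# An unknown name (hypothetical placeholder) yields None instead of raising.
missing = encodings.search_function("no-such-codec-xyz")
```

In regular code, ``codecs.lookup("utf-8")`` is the supported entry point; it raises :exc:`LookupError` for unknown names.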
+
+
+.. function:: win32_code_page_search_function(encoding)
+
+ Search for a Windows code page encoding *encoding* of the form ``cpXXXX``.
+
+ If the code page is valid and supported, return a :class:`codecs.CodecInfo`
+ object for it.
+
+ .. availability:: Windows.
+
+ .. versionadded:: 3.14
+
+
+This module implements the following exception:
+
+.. exception:: CodecRegistryError
+
+ Raised when a codec is invalid or incompatible.
+
+
:mod:`encodings.idna` --- Internationalized Domain Names in Applications
------------------------------------------------------------------------