:mod:`!token` --- Constants used with Python parse trees
========================================================

.. module:: token
   :synopsis: Constants representing terminal nodes of the parse tree.

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/token.py`

--------------

This module provides constants which represent the numeric values of leaf nodes
of the parse tree (terminal tokens).  Refer to the file :file:`Grammar/Tokens`
in the Python distribution for the definitions of the names in the context of
the language grammar.  The specific numeric values which the names map to may
change between Python versions.

The module also provides a mapping from numeric codes to names and some
functions.  The functions mirror definitions in the Python C header files.

Note that a token's value may depend on tokenizer options. For example, a
``"+"`` token may be reported as either :data:`PLUS` or :data:`OP`, and a
``"match"`` token as either :data:`NAME` or :data:`SOFT_KEYWORD`.


.. data:: tok_name

   Dictionary mapping the numeric values of the constants defined in this module
   back to their name strings, so that parse trees can be displayed in a more
   human-readable form.
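
For example (a sketch assuming the token stream comes from
:func:`tokenize.generate_tokens`), :data:`tok_name` can turn the numeric
``type`` of each token into a readable label. Comparing against named
constants rather than raw integers also keeps such code independent of the
version-specific numeric values:

```python
import io
import token
import tokenize

# Tokenize a small snippet; each TokenInfo carries a numeric .type.
source = "x = 1\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

# Use tok_name to replace the numeric codes with their names.
readable = [(token.tok_name[tok.type], tok.string) for tok in tokens]
print(readable)
```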


.. function:: ISTERMINAL(x)

   Return ``True`` for terminal token values.


.. function:: ISNONTERMINAL(x)

   Return ``True`` for non-terminal token values.


.. function:: ISEOF(x)

   Return ``True`` if *x* is the marker indicating the end of input.
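
A short sketch of the three predicates applied to token constants from this
module (the choice of :data:`NAME` and :data:`NEWLINE` is arbitrary):

```python
import token

# Token constants such as NAME are terminal (leaf) values.
print(token.ISTERMINAL(token.NAME))     # True
print(token.ISNONTERMINAL(token.NAME))  # False

# Only ENDMARKER is treated as the end-of-input marker.
print(token.ISEOF(token.ENDMARKER))     # True
print(token.ISEOF(token.NEWLINE))       # False
```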


The token constants are:

.. data:: NAME

   Token value that indicates an :ref:`identifier <identifiers>`.
   Note that keywords are also initially tokenized as ``NAME`` tokens.
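
To illustrate, in the following sketch both the keywords and the identifier
come back as :data:`NAME` tokens; telling them apart is left to a later step,
here :func:`keyword.iskeyword`:

```python
import io
import keyword
import token
import tokenize

source = "if ready: pass\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

# Keywords and identifiers share the NAME token type.
names = [tok.string for tok in tokens if tok.type == token.NAME]
print(names)  # ['if', 'ready', 'pass']

# Distinguish keywords from identifiers after tokenizing.
print([name for name in names if keyword.iskeyword(name)])  # ['if', 'pass']
```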

.. data:: NUMBER

   Token value that indicates a :ref:`numeric literal <numbers>`.

.. data:: STRING

   Token value that indicates a :ref:`string or bytes literal <strings>`,
   excluding :ref:`formatted string literals <f-strings>`.
   The token string is not interpreted:
   it includes the surrounding quotation marks and the prefix (if given);
   backslashes are included literally, without processing escape sequences.
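
A minimal sketch of what "not interpreted" means in practice: the
:data:`STRING` token for a bytes literal keeps its prefix, quotes and
backslash, and evaluating it (here with :func:`ast.literal_eval`, one
possible choice) is a separate step:

```python
import ast
import io
import token
import tokenize

# The source text contains a bytes literal with an escape sequence:
#     s = b"line\n"
source = 's = b"line\\n"\n'
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
string_tok = next(tok for tok in tokens if tok.type == token.STRING)

# The token string is the literal exactly as written in the source.
print(string_tok.string)       # b"line\n"
print(len(string_tok.string))  # 9

# Evaluating the literal turns the escape into a real newline byte.
print(ast.literal_eval(string_tok.string))  # b'line\n'
```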

.. data:: OP

   A generic token value that indicates an
   :ref:`operator <operators>` or :ref:`delimiter <delimiters>`.

   .. impl-detail::

      This value is only reported by the :mod:`tokenize` module.
      Internally, the tokenizer uses
      :ref:`exact token types <token_operators_delimiters>` instead.

.. data:: COMMENT

   Token value used to indicate a comment.
   The parser ignores :data:`!COMMENT` tokens.

.. data:: NEWLINE

   Token value that indicates the end of a :ref:`logical line <logical-lines>`.

.. data:: NL

   Token value used to indicate a non-terminating newline.
   :data:`!NL` tokens are generated when a logical line of code is continued
   over multiple physical lines. The parser ignores :data:`!NL` tokens.
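
The difference between :data:`COMMENT`, :data:`NL` and :data:`NEWLINE` can be
seen by tokenizing one logical line that spans two physical lines. This is a
sketch using :func:`tokenize.generate_tokens`; the snippet itself is
arbitrary:

```python
import io
import token
import tokenize

# One logical line continued inside parentheses, with a trailing comment.
source = "total = (1 +  # first part\n         2)\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Keep only the comment and newline-related tokens, in order.
kinds = [token.tok_name[tok.type] for tok in tokens
         if tok.type in (token.COMMENT, token.NL, token.NEWLINE)]
print(kinds)  # ['COMMENT', 'NL', 'NEWLINE']
```

A parser-style consumer would skip the ``COMMENT`` and ``NL`` tokens and treat
only ``NEWLINE`` as the end of the statement.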

.. data:: INDENT

   Token value used at the beginning of a :ref:`logical line <logical-lines>`
   to indicate the start of an :ref:`indented block <indentation>`.

.. data:: DEDENT

   Token value used at the beginning of a :ref:`logical line <logical-lines>`
   to indicate the end of an :ref:`indented block <indentation>`.
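
For example, in this sketch an :data:`INDENT` token opens the body of the
``if`` statement, and a :data:`DEDENT` token is emitted just before the first
token of the next unindented line:

```python
import io
import token
import tokenize

source = "if ready:\n    x = 1\ny = 2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
names = [token.tok_name[tok.type] for tok in tokens]

# Block structure is carried by INDENT/DEDENT, not by whitespace tokens.
block_markers = [name for name in names if name in ("INDENT", "DEDENT")]
print(block_markers)  # ['INDENT', 'DEDENT']

# The DEDENT comes right before the first token of the unindented line.
dedent_index = names.index("DEDENT")
print(tokens[dedent_index + 1].string)  # y
```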

.. data:: FSTRING_START

   Token value used to indicate the beginning of an
   :ref:`f-string literal <f-strings>`.

   .. impl-detail::

      The token string includes the prefix and the opening quote(s), but none
      of the contents of the literal.

.. data:: FSTRING_MIDDLE

   Token value used for literal text inside an :ref:`f-string literal <f-strings>`,
   including format specifications.

   .. impl-detail::

      Replacement fields (that is, the non-literal parts of f-strings) use
      the same tokens as other expressions, and are delimited by
      :data:`LBRACE`, :data:`RBRACE`, :data:`EXCLAMATION` and :data:`COLON`
      tokens.

.. data:: FSTRING_END

   Token value used to indicate the end of an :ref:`f-string <f-strings>`.

   .. impl-detail::

      The token string contains the closing quote(s).
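
The following sketch shows how a simple f-string is split into
:data:`FSTRING_START`, :data:`FSTRING_MIDDLE` and :data:`FSTRING_END` tokens,
with the replacement field tokenized as an ordinary expression. It assumes
Python 3.12 or later (:pep:`701`); older versions report the whole f-string
as a single :data:`STRING` token, so the example does nothing there:

```python
import io
import sys
import token
import tokenize

source = 'f"a{b}c"\n'
fstring_kinds = []

# FSTRING_* tokens are only produced by Python 3.12+ (PEP 701).
if sys.version_info >= (3, 12):
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # exact_type shows LBRACE/RBRACE for the replacement field delimiters.
        print(token.tok_name[tok.exact_type], repr(tok.string))
        if token.tok_name[tok.type].startswith("FSTRING"):
            fstring_kinds.append(token.tok_name[tok.type])
```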

.. data:: TSTRING_START

   Token value used to indicate the beginning of a template string literal.

   .. impl-detail::

      The token string includes the prefix and the opening quote(s), but none
      of the contents of the literal.

   .. versionadded:: next

.. data:: TSTRING_MIDDLE

   Token value used for literal text inside a template string literal,
   including format specifications.

   .. impl-detail::

      Replacement fields (that is, the non-literal parts of t-strings) use
      the same tokens as other expressions, and are delimited by
      :data:`LBRACE`, :data:`RBRACE`, :data:`EXCLAMATION` and :data:`COLON`
      tokens.

   .. versionadded:: next

.. data:: TSTRING_END

   Token value used to indicate the end of a template string literal.

   .. impl-detail::

      The token string contains the closing quote(s).

   .. versionadded:: next

.. data:: ENDMARKER

   Token value that indicates the end of input.
   Used in :ref:`top-level grammar rules <top-level>`.

.. data:: ENCODING

   Token value that indicates the encoding used to decode the source bytes
   into text. The first token returned by :func:`tokenize.tokenize` will
   always be an ``ENCODING`` token.

   .. impl-detail::

      This token type isn't used by the C tokenizer but is needed for
      the :mod:`tokenize` module.
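
As a sketch, tokenizing bytes with :func:`tokenize.tokenize` yields an
:data:`ENCODING` token first and an :data:`ENDMARKER` token last, whereas
:func:`tokenize.generate_tokens`, which reads already-decoded text, has no
encoding to report:

```python
import io
import token
import tokenize

source = b"x = 1\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

first, last = tokens[0], tokens[-1]
print(token.tok_name[first.type], first.string)  # ENCODING utf-8
print(token.tok_name[last.type])                 # ENDMARKER

# generate_tokens() reads str lines, so no ENCODING token is produced.
text_tokens = tokenize.generate_tokens(io.StringIO(source.decode()).readline)
print(any(tok.type == token.ENCODING for tok in text_tokens))  # False
```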


The following token types are not produced by the :mod:`tokenize` module,
and are defined for special uses in the tokenizer or parser:

.. data:: TYPE_IGNORE

   Token value indicating that a ``type: ignore`` comment was recognized.
   Such tokens are produced instead of regular :data:`COMMENT` tokens only
   with the :data:`~ast.PyCF_TYPE_COMMENTS` flag.

.. data:: TYPE_COMMENT

   Token value indicating that a type comment was recognized.
   Such tokens are produced instead of regular :data:`COMMENT` tokens only
   with the :data:`~ast.PyCF_TYPE_COMMENTS` flag.
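
These tokens are not visible through :mod:`tokenize`, but their effect can be
observed via :func:`ast.parse`, whose ``type_comments=True`` argument is
documented as equivalent to passing :data:`~ast.PyCF_TYPE_COMMENTS`. A minimal
sketch:

```python
import ast

source = "x = []  # type: list[int]\ny = f()  # type: ignore\n"

# Without the flag, type comments are treated as ordinary comments.
plain = ast.parse(source)
print(plain.body[0].type_comment)  # None

# With type_comments=True, they are recorded on the AST.
typed = ast.parse(source, type_comments=True)
print(typed.body[0].type_comment)  # list[int]
print(len(typed.type_ignores))     # 1
```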

.. data:: SOFT_KEYWORD

   Token value indicating a :ref:`soft keyword <soft-keywords>`.

   The tokenizer never produces this value.
   To check for a soft keyword, pass a :data:`NAME` token's string to
   :func:`keyword.issoftkeyword`.

.. data:: ERRORTOKEN

   Token value used to indicate wrong input.

   The :mod:`tokenize` module generally indicates errors by
   raising exceptions instead of emitting this token.
   It can also emit tokens such as :data:`OP` or :data:`NAME` with strings that
   are later rejected by the parser.
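
As a sketch of the exception-based behavior, tokenizing source with an
unclosed parenthesis raises an error rather than completing normally. The
example catches both :exc:`tokenize.TokenError` and :exc:`SyntaxError`
because the exact exception class for a given input is an implementation
detail:

```python
import io
import tokenize

# The opening parenthesis is never closed.
source = "items = (1, 2,\n"

try:
    list(tokenize.generate_tokens(io.StringIO(source).readline))
except (tokenize.TokenError, SyntaxError) as exc:
    # tokenize signals the malformed input by raising.
    print(type(exc).__name__, exc)
    raised = True
else:
    raised = False
```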


.. _token_operators_delimiters:

The remaining tokens represent specific :ref:`operators <operators>` and
:ref:`delimiters <delimiters>`.
(The :mod:`tokenize` module reports these as :data:`OP`; see ``exact_type``
in the :mod:`tokenize` documentation for details.)

.. include:: token-list.inc


The following non-token constants are provided:

.. data:: N_TOKENS

   The number of token types defined in this module.

.. NT_OFFSET is deliberately undocumented; if you need it you should be
   reading the source

.. data:: EXACT_TOKEN_TYPES

   A dictionary mapping the string representation of a token to its numeric code.

   .. versionadded:: 3.8
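
For example (a minimal sketch), :data:`EXACT_TOKEN_TYPES` maps an operator's
text to its specific token constant, while strings that are not operators or
delimiters, such as keywords, are absent from the mapping:

```python
import token

# Look up the specific token code for an operator string.
code = token.EXACT_TOKEN_TYPES["+="]
print(token.tok_name[code])     # PLUSEQUAL
print(code == token.PLUSEQUAL)  # True

# Keywords are not operators or delimiters, so they aren't in the mapping.
print("if" in token.EXACT_TOKEN_TYPES)  # False
```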


.. versionchanged:: 3.5
   Added :data:`!AWAIT` and :data:`!ASYNC` tokens.

.. versionchanged:: 3.7
   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` tokens.

.. versionchanged:: 3.7
   Removed :data:`!AWAIT` and :data:`!ASYNC` tokens. "async" and "await" are
   now tokenized as :data:`NAME` tokens.

.. versionchanged:: 3.8
   Added :data:`TYPE_COMMENT`, :data:`TYPE_IGNORE` and :data:`COLONEQUAL`.
   Added :data:`!AWAIT` and :data:`!ASYNC` tokens back (they're needed
   to support parsing older Python versions for :func:`ast.parse` with
   ``feature_version`` set to 6 or lower).

.. versionchanged:: 3.12
   Added :data:`EXCLAMATION`.

.. versionchanged:: 3.13
   Removed :data:`!AWAIT` and :data:`!ASYNC` tokens again.