path: root/Doc/library/html.parser.rst
Diffstat (limited to 'Doc/library/html.parser.rst')
-rw-r--r-- Doc/library/html.parser.rst | 51
1 files changed, 37 insertions, 14 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 6d433b5a04f..dd67fc34e85 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -43,7 +43,9 @@ Example HTML Parser Application
As a basic example, below is a simple HTML parser that uses the
:class:`HTMLParser` class to print out start tags, end tags, and data
-as they are encountered::
+as they are encountered:
+
+.. testcode::
from html.parser import HTMLParser
@@ -63,7 +65,7 @@ as they are encountered::
The output will then be:
-.. code-block:: none
+.. testoutput::
Encountered a start tag: html
Encountered a start tag: head
@@ -230,7 +232,9 @@ Examples
--------
The following class implements a parser that will be used to illustrate more
-examples::
+examples:
+
+.. testcode::
from html.parser import HTMLParser
from html.entities import name2codepoint
@@ -266,13 +270,17 @@ examples::
parser = MyHTMLParser()
-Parsing a doctype::
+Parsing a doctype:
+
+.. doctest::
>>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
... '"http://www.w3.org/TR/html4/strict.dtd">')
Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"
-Parsing an element with a few attributes and a title::
+Parsing an element with a few attributes and a title:
+
+.. doctest::
>>> parser.feed('<img src="python-logo.png" alt="The Python logo">')
Start tag: img
@@ -285,7 +293,9 @@ Parsing an element with a few attributes and a title::
End tag : h1
The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+further parsing:
+
+.. doctest::
>>> parser.feed('<style type="text/css">#python { color: green }</style>')
Start tag: style
@@ -300,16 +310,25 @@ further parsing::
Data : alert("<strong>hello!</strong>");
End tag : script
-Parsing comments::
+Parsing comments:
+
+.. doctest::
- >>> parser.feed('<!-- a comment -->'
+ >>> parser.feed('<!--a comment-->'
... '<!--[if IE 9]>IE-specific content<![endif]-->')
- Comment : a comment
+ Comment : a comment
Comment : [if IE 9]>IE-specific content<![endif]
Parsing named and numeric character references and converting them to the
-correct char (note: these 3 references are all equivalent to ``'>'``)::
+correct char (note: these 3 references are all equivalent to ``'>'``):
+.. doctest::
+
+ >>> parser = MyHTMLParser()
+ >>> parser.feed('&gt;&#62;&#x3E;')
+ Data : >>>
+
+ >>> parser = MyHTMLParser(convert_charrefs=False)
>>> parser.feed('&gt;&#62;&#x3E;')
Named ent: >
Num ent : >
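The two parser configurations shown in this hunk can be compared side by side. The following is a minimal sketch, not part of the patch: ``EventRecorder`` and ``parse`` are hypothetical names introduced here, and the recorded tuples only illustrate which handler receives the references in each mode.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records handler calls instead of printing them."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def parse(markup, *, convert_charrefs):
    """Feed the markup to a fresh parser and return the recorded events."""
    parser = EventRecorder(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()  # flush any text the parser is still buffering
    return parser.events


converted = parse("&gt;&#62;&#x3E;", convert_charrefs=True)
raw = parse("&gt;&#62;&#x3E;", convert_charrefs=False)

# With conversion enabled the references reach handle_data() as text.
text = "".join(value for kind, value in converted if kind == "data")
print(text)
# With conversion disabled each reference reaches its own handler.
print(raw)
```

Creating a fresh parser for each run, as the patched doctest does, keeps the two modes from sharing buffered state.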
@@ -317,18 +336,22 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+(unless *convert_charrefs* is set to ``True``):
- >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
+.. doctest::
+
+ >>> for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']:
...     parser.feed(chunk)
...
Start tag: span
Data : buff
Data : ered
- Data : text
+ Data : text
End tag : span
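Because text can arrive in several ``handle_data`` calls when the input is fed in chunks, a handler that needs the complete text has to accumulate the fragments itself. The sketch below is one way to do that; ``ChunkCollector`` is a hypothetical class introduced for illustration and is not part of the patch. It makes no assumption about how many fragments arrive, only that they join back into the original text.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Hypothetical helper: gathers text fragments and tag events."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        # May be called once per chunk, so store the pieces and join later.
        self.fragments.append(data)


collector = ChunkCollector()
for chunk in ["<sp", "an>buff", "ered", " text</s", "pan>"]:
    collector.feed(chunk)
collector.close()  # flush anything still buffered

full_text = "".join(collector.fragments)
print(collector.tags)
print(full_text)
```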
-Parsing invalid HTML (e.g. unquoted attributes) also works::
+Parsing invalid HTML (e.g. unquoted attributes) also works:
+
+.. doctest::
>>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>')
Start tag: p