diff options
Diffstat (limited to 'Doc/library/html.parser.rst')
-rw-r--r-- | Doc/library/html.parser.rst | 51 |
1 files changed, 37 insertions, 14 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst index 6d433b5a04f..dd67fc34e85 100644 --- a/Doc/library/html.parser.rst +++ b/Doc/library/html.parser.rst @@ -43,7 +43,9 @@ Example HTML Parser Application As a basic example, below is a simple HTML parser that uses the :class:`HTMLParser` class to print out start tags, end tags, and data -as they are encountered:: +as they are encountered: + +.. testcode:: from html.parser import HTMLParser @@ -63,7 +65,7 @@ as they are encountered:: The output will then be: -.. code-block:: none +.. testoutput:: Encountered a start tag: html Encountered a start tag: head @@ -230,7 +232,9 @@ Examples -------- The following class implements a parser that will be used to illustrate more -examples:: +examples: + +.. testcode:: from html.parser import HTMLParser from html.entities import name2codepoint @@ -266,13 +270,17 @@ examples:: parser = MyHTMLParser() -Parsing a doctype:: +Parsing a doctype: + +.. doctest:: >>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ' ... '"http://www.w3.org/TR/html4/strict.dtd">') Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd" -Parsing an element with a few attributes and a title:: +Parsing an element with a few attributes and a title: + +.. doctest:: >>> parser.feed('<img src="python-logo.png" alt="The Python logo">') Start tag: img @@ -285,7 +293,9 @@ Parsing an element with a few attributes and a title:: End tag : h1 The content of ``script`` and ``style`` elements is returned as is, without -further parsing:: +further parsing: + +.. doctest:: >>> parser.feed('<style type="text/css">#python { color: green }</style>') Start tag: style @@ -300,16 +310,25 @@ further parsing:: Data : alert("<strong>hello!</strong>"); End tag : script -Parsing comments:: +Parsing comments: + +.. doctest:: - >>> parser.feed('<!-- a comment -->' + >>> parser.feed('<!--a comment-->' ... '<!--[if IE 9]>IE-specific content<![endif]-->') - Comment : a comment + Comment : a comment Comment : [if IE 9]>IE-specific content<![endif] Parsing named and numeric character references and converting them to the -correct char (note: these 3 references are all equivalent to ``'>'``):: +correct char (note: these 3 references are all equivalent to ``'>'``): +.. doctest:: + + >>> parser = MyHTMLParser() + >>> parser.feed('>>>') + Data : >>> + + >>> parser = MyHTMLParser(convert_charrefs=False) >>> parser.feed('>>>') Named ent: > Num ent : > @@ -317,18 +336,22 @@ correct char (note: these 3 references are all equivalent to ``'>'``):: Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but :meth:`~HTMLParser.handle_data` might be called more than once -(unless *convert_charrefs* is set to ``True``):: +(unless *convert_charrefs* is set to ``True``): - >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']: +.. doctest:: + + >>> for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']: ... parser.feed(chunk) ... Start tag: span Data : buff Data : ered - Data : text + Data : text End tag : span -Parsing invalid HTML (e.g. unquoted attributes) also works:: +Parsing invalid HTML (e.g. unquoted attributes) also works: + +.. doctest:: >>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>') Start tag: p |