| Commit message (Collapse) | Author | Age |
|
(GH-22658)
When calling .close() the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag.
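A minimal sketch of the behaviour described above, assuming a small hypothetical subclass (`DataCollector`) that records whatever `handle_data` receives. Whether the unclosed script content is reported after `close()` depends on whether the running interpreter includes this change, so no output is claimed here.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = DataCollector()
# The <script> element is never closed in this input.
collector.feed("<script>var x = 1;")
before_close = "".join(collector.chunks)

# close() signals end of input; with this change the pending
# script content is expected to be flushed to handle_data.
collector.close()
after_close = "".join(collector.chunks)

print(repr(before_close), repr(after_close))
```

Comparing `before_close` with `after_close` shows what, if anything, `close()` flushed on the interpreter at hand.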
|
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
in attribute values (GH-95215)
According to the HTML5 spec, named character references in attribute values
should only be processed if they are not followed by an ASCII alphanumeric
or an equals sign.
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
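The rule can be illustrated with a standalone sketch. This is not the code from the commit: `unescape_attr_value` and `_NAMED_REF` are hypothetical names, only named references are handled (numeric ones are left alone), and the lookup table is the standard `html.entities.html5` mapping.

```python
import re
from html.entities import html5

# Candidate named reference: "&", a name, and an optional trailing ";".
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;?)")


def unescape_attr_value(value):
    """Resolve named references in an attribute value (illustrative only)."""

    def replace(match):
        name = match.group(1)
        if name.endswith(";"):
            # Properly terminated reference: convert it if it is known.
            return html5.get(name, match.group(0))
        # No semicolon. The greedy pattern has already consumed every
        # ASCII alphanumeric, so a name such as "copyright" is simply not
        # found in the table; only a following "=" needs an explicit check.
        if value[match.end():match.end() + 1] == "=":
            return match.group(0)
        return html5.get(name, match.group(0))

    return _NAMED_REF.sub(replace, value)


print(unescape_attr_value("?a=1&copy=2"))     # query string left untouched
print(unescape_attr_value("Tom &amp; Jerry"))  # terminated reference converted
```

The first call is the typical case the rule protects: a URL query parameter that happens to start with an entity name.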
|
* gh-95813: Improve HTMLParser from the view of inheritance
* gh-95813: Add unittest
* Address code review
|
Support for HtmlParserError was removed back in 2014 with commit
73a4359eb0eb624c588c5d52083ea4944f9787ea, however this small block was
missed.
|
Fix typos in the Lib directory as identified by codespell.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
|
* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch
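A short sketch of what the change above means for callers, assuming an interpreter that includes it. `AttrCollector` is a hypothetical helper that records start tags; the input is an unquoted attribute value containing a comma.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


collector = AttrCollector()
# The comma is part of the unquoted value, not a separator between attributes.
collector.feed("<div class=bar,baz=asd>")
collector.close()

print(collector.seen)
```

With the fix in place the tag should carry a single `class` attribute whose value runs up to the closing `>`.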
|
It is deprecated since Python 3.4.
|
(#2099)
elem is the result of .lower() 6 lines above the handle_endtag call.
Patch by Motoki Naruse
|
* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a
The docstring was correct. I read the patch in the opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.
|
an 'r', like a rawstring. (#1759)
|
And most of the tools.
Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
|
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
|
convert_charrefs is True.
|
HTMLParser to True. Patch by Berker Peksag.
|
the HTMLParserError exception have been removed.
|
True, automatically converts all character references.
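A minimal sketch contrasting the two settings of `convert_charrefs`. `EventCollector` and `collect` are hypothetical helpers; the handler names are the standard `HTMLParser` callbacks.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records text and character-reference events."""

    def __init__(self, *, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.data = []
        self.entityrefs = []
        self.charrefs = []

    def handle_data(self, data):
        self.data.append(data)

    def handle_entityref(self, name):
        self.entityrefs.append(name)

    def handle_charref(self, name):
        self.charrefs.append(name)


def collect(markup, convert_charrefs):
    parser = EventCollector(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser


markup = "<p>caf&eacute; &amp; &#62;</p>"
converted = collect(markup, convert_charrefs=True)
raw = collect(markup, convert_charrefs=False)

# With conversion on, references arrive as ordinary text in handle_data.
print("".join(converted.data))
# With conversion off, they are reported through the dedicated handlers.
print(raw.entityrefs, raw.charrefs)
```

Subclasses that override `handle_entityref` or `handle_charref` only see those calls when `convert_charrefs` is false.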
|
HTML5 standard.
|
strict argument of HTMLParser or the HTMLParser.error method are used.
|
Barlow.
|
deprecated now that the parser is able to parse invalid markup.
|
HTMLParser.
|
``<script>...</script>`` and ``<style>...</style>``.
|
when strict=False.
|
than 128 entities. Patch by Peter Otten.
|
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87542 | senthil.kumaran | 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) | 3 lines
Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
........
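For illustration only: the `unescape()` method named in this entry belongs to the `html.parser` module of that era. In current Python the equivalent functionality is exposed as `html.unescape()`, which is what this sketch uses; it leaves a malformed reference in place instead of failing.

```python
import html

# A well-formed numeric reference is converted.
well_formed = html.unescape("a &#62; b")

# "&#bad;" is not a valid numeric reference; it is passed through unchanged.
malformed = html.unescape("a &#bad; b")

print(well_formed)
print(malformed)
```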
|
svn+ssh://pythondev@svn.python.org/python/branches/py3k
................
r81504 | victor.stinner | 2010-05-24 23:46:25 +0200 (lun., 24 mai 2010) | 13 lines
Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
................
|
incorrect syntax
|
The motivation for adding this option is that the functionality it
provides used to be provided by sgmllib in Python2, and was used by,
for example, BeautifulSoup. Without this option, the Python3 version
of BeautifulSoup and the many programs that use it are crippled.
The original patch was by 'kxroberto'. I modified it heavily but kept his
heuristics and test. I also added additional heuristics to fix #975556,
#1046092, and part of #6191. This patch should be completely backward
compatible: the behavior with the default strict=True is unchanged.
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
|
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
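A small sketch of the difference described above, using only the standard `re` module. The sample string is an assumption chosen so that one word contains a non-ASCII letter.

```python
import re

text = "caf\u00e9 123"  # "café 123"

# str patterns use Unicode matching by default, so \w accepts "é".
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the match stops before "é".
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```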
|