I hit a bug or issue with libxml2, Python 3, and feedparser:
Traceback (most recent call last):
  File "/home/livibetter/bin/example.py", line 70, in <module>
    main()
  File "/home/livibetter/bin/example.py", line 30, in main
    fd = fp.parse('http://example.com/feed/')
  File "/usr/lib64/python3.3/site-packages/feedparser.py", line 3987, in parse
    saxparser.parse(source)
  File "/usr/lib64/python3.3/site-packages/drv_libxml2.py", line 190, in parse
    _d(reader.LocalName()))
  File "/usr/lib64/python3.3/site-packages/drv_libxml2.py", line 70, in _d
    return _decoder(s)[0]
  File "/usr/lib64/python3.3/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
Note: some text in the traceback has been edited.
According to the feedparser 5.1.3 source code, feedparser by default prefers a SAX driver called drv_libxml2 for parsing, and that driver is the one causing the issue:
# List of preferred XML parsers, by SAX driver name. These will be tried first,
# but if they're not installed, Python will keep searching through its own list
# of pre-installed parsers until it finds one that supports everything we need.
PREFERRED_XML_PARSERS = ["drv_libxml2"]
The driver from libxml2-2.9.1 doesn’t seem to work with Python 3. To work around it, simply remove that driver from the list:
import feedparser as fp

fp.PREFERRED_XML_PARSERS.remove('drv_libxml2')

# or, more drastic, replace it with an empty list
fp.PREFERRED_XML_PARSERS = []

# or empty the list in place, if you seriously insist
del fp.PREFERRED_XML_PARSERS[:]
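One catch with list.remove() is that it raises ValueError when the name is not in the list, for example if the workaround runs twice or a later feedparser release changes the default. A minimal sketch of a tolerant alternative follows; drop_driver is a hypothetical helper, and a plain list stands in for fp.PREFERRED_XML_PARSERS so the snippet runs without feedparser installed.

```python
def drop_driver(parsers, name="drv_libxml2"):
    """Remove every occurrence of `name` from `parsers`, in place.

    Unlike list.remove(), this does not raise ValueError when `name`
    is absent, so it is safe to call more than once.
    """
    parsers[:] = [p for p in parsers if p != name]


# Stand-in for fp.PREFERRED_XML_PARSERS; with feedparser installed you
# would call drop_driver(fp.PREFERRED_XML_PARSERS) instead.
preferred = ["drv_libxml2"]
drop_driver(preferred)
drop_driver(preferred)  # second call is a no-op rather than an error
print(preferred)
```

Like the del variant, the slice assignment mutates the existing list object, so any code already holding a reference to it sees the change.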
The error is gone. I don’t know which parser is actually used afterwards, or whether there is a performance drop, but as long as the script runs fine, I am okay with that.
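As for which parser ends up being used: assuming feedparser hands PREFERRED_XML_PARSERS to the standard library's xml.sax.make_parser (the saxparser.parse(source) frame in the traceback points that way), the choice can be checked with a short sketch. make_parser tries the given driver names first, skips any whose module is not installed, and then falls back to Python's own default list, which on CPython is normally just the bundled expat driver, xml.sax.expatreader. The helper name chosen_sax_driver is hypothetical.

```python
import xml.sax


def chosen_sax_driver(preferred):
    """Return the module name of the SAX driver make_parser() picks.

    make_parser() tries each name in `preferred` first, skipping names
    whose module cannot be found, then falls back to Python's default
    driver list (overridable via the PY_SAX_PARSER environment variable).
    """
    parser = xml.sax.make_parser(preferred)
    return type(parser).__module__


# With the preferred list emptied, as in the workaround above.
print(chosen_sax_driver([]))

# A driver name that is not installed is skipped in the same way.
print(chosen_sax_driver(["no_such_sax_driver"]))
```

On a stock CPython install both calls should report xml.sax.expatreader, i.e. the expat-based parser that ships with Python.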