Turn HTML into equivalent Markdown-structured text.

|Build Status| |Coverage Status| |Downloads| |Version| |Wheel?| |Format| |License|

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: ``html2text [(filename|url) [encoding]]``

+-----------------------+-------------------------------------+
| Option                | Description                         |
+=======================+=====================================+
| ``--version``         | Show program's version number and   |
|                       | exit                                |
+-----------------------+-------------------------------------+
| ``-h``, ``--help``    | Show this help message and exit     |
+-----------------------+-------------------------------------+
| ``--ignore-links``    | Don't include any formatting for    |
|                       | links                               |
+-----------------------+-------------------------------------+
| ``--escape-all``      | Escape all special characters.      |
|                       | Output is less readable, but        |
|                       | avoids corner case formatting       |
|                       | issues.                             |
+-----------------------+-------------------------------------+
| ``--reference-links`` | Use reference links instead of      |
|                       | inline links to create Markdown     |
+-----------------------+-------------------------------------+
| ``--mark-code``       | Mark preformatted and code blocks   |
|                       | with [code]...[/code]               |
+-----------------------+-------------------------------------+

For a complete list of options see the
`docs <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__.

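
When the command line is driven from a script, the usage pattern and flags listed in the options table can be assembled programmatically. The following is a minimal sketch using only the standard library. The helper ``build_html2text_command`` and its keyword names are hypothetical (they are not part of html2text), and it covers only the four flags shown in the table.

```python
import shlex

# Flags copied from the options table; the keyword names on the left are
# hypothetical and exist only for this sketch.
KNOWN_FLAGS = {
    "ignore_links": "--ignore-links",
    "escape_all": "--escape-all",
    "reference_links": "--reference-links",
    "mark_code": "--mark-code",
}


def build_html2text_command(source, encoding=None, **flags):
    """Return an argv list following `html2text [(filename|url) [encoding]]`."""
    argv = ["html2text"]
    for name, enabled in flags.items():
        if name not in KNOWN_FLAGS:
            raise ValueError(f"unknown option: {name}")
        if enabled:
            argv.append(KNOWN_FLAGS[name])
    # Positional arguments: the file name or URL, then the optional encoding.
    argv.append(source)
    if encoding is not None:
        argv.append(encoding)
    return argv


command = build_html2text_command(
    "page.html", "utf-8", ignore_links=True, mark_code=True
)
print(shlex.join(command))
```

Once html2text is installed, the resulting list can be passed to ``subprocess.run(command, check=True)``.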
Or you can use it from within Python:

::

    >>> import html2text
    >>>
    >>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
    **Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

::

    >>> import html2text
    >>>
    >>> h = html2text.HTML2Text()
    >>> # Ignore converting links from HTML
    >>> h.ignore_links = True
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    Hello, world!
    >>> # Don't ignore links anymore, I like links
    >>> h.ignore_links = False
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    Hello, [world](http://earth.google.com/)!

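
To make the effect of ``ignore_links`` concrete, here is a toy converter built on the standard library's ``html.parser``. It is an illustrative sketch only, not html2text's implementation: the names ``TinyMarkdownConverter`` and ``convert`` are hypothetical, and only ``<strong>``, ``<em>``, and ``<a href>`` are handled.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Toy HTML-to-Markdown converter for <strong>, <em>, and <a href>."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.parts = []  # output fragments, joined at the end
        self.hrefs = []  # stack of href values for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.parts.append("**")
        elif tag == "em":
            self.parts.append("_")
        elif tag == "a":
            href = dict(attrs).get("href")
            self.hrefs.append(href)
            # Emit link markup only when links are kept and a target exists.
            if href and not self.ignore_links:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "strong":
            self.parts.append("**")
        elif tag == "em":
            self.parts.append("_")
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href and not self.ignore_links:
                self.parts.append(f"]({href})")

    def handle_data(self, data):
        # Text content passes through unchanged.
        self.parts.append(data)


def convert(html, ignore_links=False):
    parser = TinyMarkdownConverter(ignore_links=ignore_links)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


sample = "<p>Hello, <a href='http://earth.google.com/'>world</a>!"
print(convert(sample, ignore_links=True))  # Hello, world!
print(convert(sample))  # Hello, [world](http://earth.google.com/)!
```

The flag only decides whether the link markup is emitted around the anchor text. The text itself is kept either way, which matches the behaviour shown in the session above.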
Originally written by Aaron Swartz. This code is distributed under the GPLv3.

``html2text`` is available on PyPI: https://pypi.python.org/pypi/html2text

::

    $ pip install html2text

To run the unit tests under coverage:

::

    PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v

To see the coverage results:

::

    coverage combine
    coverage html

Then open the ``./htmlcov/index.html`` file in your browser.

Documentation lives
`here <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__.

.. |Build Status| image:: https://secure.travis-ci.org/Alir3z4/html2text.png
   :target: http://travis-ci.org/Alir3z4/html2text
.. |Coverage Status| image:: https://coveralls.io/repos/Alir3z4/html2text/badge.png
   :target: https://coveralls.io/r/Alir3z4/html2text
.. |Downloads| image:: http://badge.kloud51.com/pypi/d/html2text.png
   :target: https://pypi.python.org/pypi/html2text/
.. |Version| image:: http://badge.kloud51.com/pypi/v/html2text.png
   :target: https://pypi.python.org/pypi/html2text/
.. |Wheel?| image:: http://badge.kloud51.com/pypi/wheel/html2text.png
   :target: https://pypi.python.org/pypi/html2text/
.. |Format| image:: http://badge.kloud51.com/pypi/format/html2text.png
   :target: https://pypi.python.org/pypi/html2text/
.. |License| image:: http://badge.kloud51.com/pypi/license/html2text.png
   :target: https://pypi.python.org/pypi/html2text/