CMD + K

parsel

Community

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Installation

To install this package, run one of the following:

Conda
$conda install rolando::parsel
Pip
$pip install -i https://pypi.anaconda.org/rolando/simple parsel

Usage Tracking

1.1.0
1.0.1
2 / 8 versions selected
Downloads (Last 6 months): 0

Description

===============================

Parsel

.. image:: https://img.shields.io/travis/scrapy/parsel.svg :target: https://travis-ci.org/scrapy/parsel

.. image:: https://img.shields.io/pypi/v/parsel.svg :target: https://pypi.python.org/pypi/parsel

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg :target: http://codecov.io/github/scrapy/parsel?branch=master :alt: Coverage report

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

  • Free software: BSD license
  • Documentation: https://parsel.readthedocs.org.

Features

  • Extract text using CSS or XPath selectors
  • Regular expression helper methods

Example::

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re('\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
        print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org

History

1.1.0 (2016-11-22) ~~~~~~~~~~~~~~~~~~

  • Change default HTML parser to lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>_, which makes easier to use some HTML specific features
  • Add css2xpath function to translate CSS to XPath
  • Add support for ad-hoc namespaces declarations
  • Add support for XPath variables
  • Documentation improvements and updates

1.0.3 (2016-07-29) ~~~~~~~~~~~~~~~~~~

  • Add BSD-3-Clause license file
  • Re-enable PyPy tests
  • Integrate py.test runs with setuptools (needed for Debian packaging)
  • Changelog is now called NEWS

1.0.2 (2016-04-26) ~~~~~~~~~~~~~~~~~~

  • Fix bug in exception handling causing original traceback to be lost
  • Added docstrings and other doc fixes

1.0.1 (2015-08-24) ~~~~~~~~~~~~~~~~~~

  • Updated PyPI classifiers
  • Added docstrings for csstranslator module and other doc fixes

1.0.0 (2015-08-22) ~~~~~~~~~~~~~~~~~~

  • Documentation fixes

0.9.6 (2015-08-14) ~~~~~~~~~~~~~~~~~~

  • Updated documentation
  • Extended test coverage

0.9.5 (2015-08-11) ~~~~~~~~~~~~~~~~~~

  • Support for extending SelectorList

0.9.4 (2015-08-10) ~~~~~~~~~~~~~~~~~~

  • Try workaround for travis-ci/dpl#253

0.9.3 (2015-08-07) ~~~~~~~~~~~~~~~~~~

  • Add base_url argument

0.9.2 (2015-08-07) ~~~~~~~~~~~~~~~~~~

  • Rename module unified -> selector and promoted root attribute
  • Add createrootnode function

0.9.1 (2015-08-04) ~~~~~~~~~~~~~~~~~~

  • Setup Sphinx build and docs structure
  • Build universal wheels
  • Rename some leftovers from package extraction

0.9.0 (2015-07-30) ~~~~~~~~~~~~~~~~~~

  • First release on PyPI.

About

Summary

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Last Updated

Feb 8, 2017 at 03:28

License

BSD License

Total Downloads

260