Package Name | Access | Summary | Updated |
---|---|---|---|
simhash | public | Near-Duplicate Detection with Simhash | 2025-03-25 |
pydepta | public | A Python implementation of DEPTA | 2025-03-25 |
slybot | public | Slybot crawler | 2025-03-25 |
scrapely | public | A pure-Python HTML screen-scraping library | 2025-03-25 |
talons | public | Hooks for Falcon | 2025-03-25 |
python-mimeparse | public | A module that provides basic functions for parsing mime-type names and matching them against a list of media-ranges. | 2025-03-25 |
frontera | public | A flexible frontier for web crawlers | 2025-03-25 |
websocket-client | public | WebSocket client for Python. hybi13 is supported. | 2025-03-25 |
docker-py | public | Python client for Docker. | 2025-03-25 |
shub-image | public | Scrapinghub release tool | 2025-03-25 |
cssselect | public | cssselect parses CSS3 Selectors and translates them to XPath 1.0 | 2025-03-25 |
flatson | public | Tool to flatten stream of JSON-like objects, configured via schema | 2025-03-25 |
scrapylib | public | Scrapy helper functions and processors | 2025-03-25 |
shub | public | Scrapinghub Command Line Client | 2025-03-25 |
retrying | public | Retrying | 2025-03-25 |
hubstorage | public | Client interface for Scrapinghub HubStorage | 2025-03-25 |
scrapinghub | public | No Summary | 2025-03-25 |
pydispatcher | public | Multi-producer-multi-consumer signal dispatching mechanism | 2025-03-25 |
parsel | public | Parsel is a library to extract data from HTML and XML using XPath and CSS selectors | 2025-03-25 |
service_identity | public | Service identity verification for pyOpenSSL. | 2025-03-25 |
pyasn1-modules | public | A collection of ASN.1-based protocol modules. | 2025-03-25 |
characteristic | public | Python attributes without boilerplate. | 2025-03-25 |
scrapy | public | A high-level Web Crawling and Web Scraping framework | 2025-03-25 |