simhash | public | Near-Duplicate Detection with Simhash | 2025-03-25
pydepta | public | A Python implementation of DEPTA | 2025-03-25
slybot | public | Slybot crawler | 2025-03-25
scrapely | public | A pure-Python HTML screen-scraping library | 2025-03-25
talons | public | Hooks for Falcon | 2025-03-25
python-mimeparse | public | A module that provides basic functions for parsing mime-type names and matching them against a list of media-ranges. | 2025-03-25
frontera | public | A flexible frontier for web crawlers | 2025-03-25
websocket-client | public | WebSocket client for Python. hybi13 is supported. | 2025-03-25
docker-py | public | Python client for Docker. | 2025-03-25
shub-image | public | Scrapinghub release tool | 2025-03-25
cssselect | public | cssselect parses CSS3 Selectors and translates them to XPath 1.0 | 2025-03-25
flatson | public | Tool to flatten a stream of JSON-like objects, configured via schema | 2025-03-25
scrapylib | public | Scrapy helper functions and processors | 2025-03-25
shub | public | Scrapinghub Command Line Client | 2025-03-25
retrying | public | Retrying | 2025-03-25
hubstorage | public | Client interface for Scrapinghub HubStorage | 2025-03-25
scrapinghub | public | No Summary | 2025-03-25
pydispatcher | public | Multi-producer-multi-consumer signal dispatching mechanism | 2025-03-25
parsel | public | Parsel is a library to extract data from HTML and XML using XPath and CSS selectors | 2025-03-25
service_identity | public | Service identity verification for pyOpenSSL. | 2025-03-25
pyasn1-modules | public | A collection of ASN.1-based protocol modules. | 2025-03-25
characteristic | public | Python attributes without boilerplate. | 2025-03-25
scrapy | public | A high-level Web Crawling and Web Scraping framework | 2025-03-25