opendataloader-pdf
PDF Parser for AI-ready data. Extract Markdown, JSON, and HTML from any PDF.
OpenDataLoader PDF is an open-source PDF parser that extracts structured Markdown, JSON (with bounding boxes), and HTML from any PDF. It features deterministic local extraction with correct reading order, table detection, heading hierarchy, and built-in AI safety filters. Hybrid mode adds OCR, complex table extraction, formula extraction, and chart descriptions.
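As a rough illustration of why JSON output with bounding boxes is useful downstream, the sketch below builds a heading outline from a small list of extracted elements. The sample data and the field names (`type`, `level`, `page`, `bbox`, `content`) are illustrative assumptions, not the library's documented schema, and the helpers `heading_outline` and `bbox_area` are hypothetical; check the project documentation for the actual output format.

```python
import json

# Hypothetical sample of parser output. The field names below are
# assumptions for illustration, not OpenDataLoader PDF's documented schema.
SAMPLE = """
[
  {"type": "heading", "level": 1, "page": 1,
   "bbox": [72, 700, 540, 730], "content": "Introduction"},
  {"type": "paragraph", "page": 1,
   "bbox": [72, 600, 540, 690], "content": "Body text."},
  {"type": "heading", "level": 2, "page": 2,
   "bbox": [72, 650, 540, 675], "content": "Methods"}
]
"""


def heading_outline(elements):
    """Return (level, page, text) tuples for heading elements, in input order."""
    return [
        (e.get("level", 1), e["page"], e["content"])
        for e in elements
        if e.get("type") == "heading"
    ]


def bbox_area(bbox):
    """Area of a box given as [left, bottom, right, top] (assumed ordering)."""
    left, bottom, right, top = bbox
    return abs(right - left) * abs(top - bottom)


elements = json.loads(SAMPLE)
outline = heading_outline(elements)

# Print an indented outline, one heading per line.
for level, page, text in outline:
    print("  " * (level - 1) + f"{text} (page {page})")
```

Because each element carries its page and coordinates, the same structure could be used to cite the exact source region of an extracted passage.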
Summary
PDF Parser for AI-ready data. Extract Markdown, JSON, and HTML from any PDF.
Last Updated
May 6, 2026 at 16:26
License
Apache-2.0
Supported Platforms
GitHub Repository
https://github.com/opendataloader-project/opendataloader-pdf
Documentation
https://opendataloader.org/docs/quick-start-python