PDF Parser for AI-ready data. Extract Markdown, JSON, and HTML from any PDF.
OpenDataLoader PDF is an open-source PDF parser that extracts structured Markdown, JSON (with bounding boxes), and HTML from any PDF. It features deterministic local extraction with correct reading order, table detection, heading hierarchy, and built-in AI safety filters. Hybrid mode adds OCR, complex table extraction, formula extraction, and chart descriptions.