unpdf-rs
High-performance PDF content extraction to Markdown, text, and JSON
unpdf is a high-performance Rust library and CLI tool for extracting content from PDF documents to structured Markdown, plain text, and JSON. It supports PDF 1.0-2.0, including compressed object streams, table detection, image extraction, CJK text, and multiple text cleanup presets for LLM training data preparation.
Last Updated
Apr 15, 2026 at 03:48
License
MIT
GitHub Repository
https://github.com/iyulab/unpdf
Documentation
https://docs.rs/unpdf