High-performance PDF content extraction to Markdown, text, and JSON
unpdf is a high-performance Rust library and CLI tool for extracting content from PDF documents to structured Markdown, plain text, and JSON. It supports PDF 1.0-2.0, including compressed object streams, table detection, image extraction, CJK text, and multiple text cleanup presets for LLM training data preparation.