fastpdf2png: PDF to PNG at 1,500 pages/s with SIMD and PDFium

Source: DEV Community
What My Project Does

I was working on a document extraction pipeline and got frustrated with how slow PDF-to-PNG conversion was. PyMuPDF, MuPDF, ImageMagick: none of them were fast enough when you're processing thousands of documents. So I wrote fastpdf2png.

It uses PDFium (the PDF engine from Chrome) under the hood, plus a custom PNG encoder that uses SIMD instructions and a patched compression library. It also detects when a page is grayscale and automatically outputs an 8-bit PNG for it.

    pip install fastpdf2png

    import fastpdf2png
    images = fastpdf2png.to_images("doc.pdf", dpi=150, workers=4)

Target Audience

Anyone dealing with PDFs at scale: data pipelines, ML preprocessing, document management, that kind of thing.

Comparison

I benchmarked everything I could find at 150 DPI in a single process. fastpdf2png converts 323 pages/s, MuPDF 37, PyMuPDF 30, and ImageMagick 2.9. With 8 workers, fastpdf2png reaches about 1,500 pages/s. Output files also end up smaller thanks to the grayscale detection.

https://github.com
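For a sense of what `dpi=150` means in pixels: PDF page sizes are given in points (1/72 inch), so rendering at a given DPI scales each dimension by dpi / 72. The sketch below just does that arithmetic. `render_size` is a hypothetical helper, not part of fastpdf2png, and rounding to the nearest pixel is an assumption (renderers differ on this).

```python
POINTS_PER_INCH = 72  # PDF user space defaults to 1/72 inch per unit


def render_size(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Hypothetical helper: pixel size of a page rendered at `dpi`.

    Rounding to the nearest pixel is an assumption; renderers differ.
    """
    # Multiply before dividing so exact cases (e.g. US Letter) stay exact.
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


w, h = render_size(612, 792, dpi=150)  # US Letter: 8.5 x 11 in = 612 x 792 pt
print(w, h)                           # 1275 1650
print(w * h * 3, "bytes as RGB")      # 6311250 bytes as RGB
print(w * h, "bytes as 8-bit gray")   # 2103750 bytes as 8-bit gray
```

At 150 DPI, a single Letter page is about 6.3 MB of raw RGB pixels before compression, which is why the encoder and the grayscale path matter so much at hundreds of pages per second.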
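One simple way to picture the grayscale detection: a rendered RGB page is grayscale if every pixel has R == G == B, in which case one 8-bit channel holds all the information. The post doesn't describe how fastpdf2png actually performs the check (e.g. with SIMD, an early exit, or a tolerance), so this is only a minimal sketch that assumes a tightly packed 8-bit RGB buffer; `is_grayscale` and `to_gray` are hypothetical names.

```python
def is_grayscale(rgb: bytes) -> bool:
    """Return True if every pixel in a packed 8-bit RGB buffer has R == G == B."""
    if len(rgb) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    # Strided slices extract each channel as a bytes object; comparing whole
    # channels avoids a Python-level loop over pixels.
    r, g, b = rgb[0::3], rgb[1::3], rgb[2::3]
    return r == g == b


def to_gray(rgb: bytes) -> bytes:
    """Collapse a grayscale RGB buffer to one byte per pixel (keeps R)."""
    return rgb[0::3]


gray_page = bytes([10, 10, 10, 200, 200, 200])
color_page = bytes([10, 10, 10, 255, 0, 0])
print(is_grayscale(gray_page))   # True
print(is_grayscale(color_page))  # False
print(list(to_gray(gray_page)))  # [10, 200]
```

A production version would more likely compare channels in vectorized chunks and stop at the first colored pixel; the slice comparison here is just the simplest correct form of the test.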
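To show why 8-bit grayscale output produces smaller files, here is a minimal PNG writer built on Python's standard-library `zlib`. It is not fastpdf2png's encoder (the post says that one uses SIMD and a patched compression library). It only illustrates the PNG layout and the fact that grayscale (color type 0) stores one byte per pixel, while RGB (color type 2) stores three. `encode_png` is a hypothetical name, and every scanline uses filter type 0 for simplicity.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    # PNG chunk layout: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png(pixels: bytes, width: int, height: int, gray: bool) -> bytes:
    """Encode 8-bit pixels as PNG: gray=True -> color type 0, else RGB (type 2)."""
    channels = 1 if gray else 3
    stride = width * channels
    if len(pixels) != stride * height:
        raise ValueError("pixel buffer does not match width/height")
    # Prefix each scanline with filter type 0 ("None"); real encoders choose
    # per-row filters to improve compression.
    raw = b"".join(b"\x00" + pixels[y * stride:(y + 1) * stride]
                   for y in range(height))
    # IHDR: width, height, bit depth 8, color type, compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0 if gray else 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 6))
            + _chunk(b"IEND", b""))


# The same horizontal gradient, stored once as gray and once as RGB.
width, height = 256, 64
gray_pixels = bytes(x for _ in range(height) for x in range(width))
rgb_pixels = bytes(v for p in gray_pixels for v in (p, p, p))

gray_png = encode_png(gray_pixels, width, height, gray=True)
rgb_png = encode_png(rgb_pixels, width, height, gray=False)
print(len(gray_png), len(rgb_png))
```

Both files decode to the same image, but the grayscale one starts from a third of the raw data, so its compressed size is smaller as well.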
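The benchmark numbers can also be expressed as relative speedups plus a rough parallel-scaling figure. The rates below are the ones reported in the post; everything else is plain arithmetic. The "efficiency" value assumes the 8-worker result can be compared directly with the single-process 323 pages/s on the same machine, which the post implies but doesn't state outright.

```python
# Pages per second at 150 DPI, single process, as reported in the post.
single_process = {
    "fastpdf2png": 323,
    "MuPDF": 37,
    "PyMuPDF": 30,
    "ImageMagick": 2.9,
}
eight_workers = 1500  # "about 1,500 pages/s" with 8 workers

base = single_process["fastpdf2png"]
for tool, rate in single_process.items():
    if tool != "fastpdf2png":
        print(f"fastpdf2png vs {tool}: {base / rate:.1f}x")
# fastpdf2png vs MuPDF: 8.7x
# fastpdf2png vs PyMuPDF: 10.8x
# fastpdf2png vs ImageMagick: 111.4x

scaling = eight_workers / base  # speedup from 1 to 8 workers
efficiency = scaling / 8        # fraction of ideal linear scaling
print(f"8-worker speedup: {scaling:.2f}x, efficiency: {efficiency:.0%}")
# 8-worker speedup: 4.64x, efficiency: 58%
```

By these figures, 8 workers deliver roughly 4.6x the single-process throughput, about 58% of linear scaling. The post doesn't say what limits scaling beyond that.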