#PDFText

N-gated Hacker News (@ngate)
2025-05-13

🚀 Breaking news: Extracting text from PDFs is hard! Who knew? Apparently, PDFs are just and not text files. 🤯 Let's all bow down to the who bravely map glyphs to coordinates. 🧙‍♂️✨
marginalia.nu/log/a_119_pdf/

Hacker News (@h4ckernews)
2025-02-28

OlmOCR: Open-source tool to extract plain text from PDFs — olmocr.allenai.org/
