#pdftohtml

2025-02-04

Just published the preliminary tool #pdf4anki on #codeberg

codeberg.org/barefootstache/pd

It mainly documents the process and is a semi-automated tool for getting PDFs into #anki.

In the current version one will still need to modify the pattern constant in the clean-html.js file to align with the PDF in use.
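Here is a minimal JavaScript sketch of what such a per-PDF pattern might look like. The `PATTERN` regex and the `extractCards` helper are hypothetical. They assume the PDF text is laid out as "Question: ... Answer: ..." pairs, and the actual constant in clean-html.js may look quite different.

```javascript
// Hypothetical per-PDF pattern: captures "Question: ... Answer: ..." pairs.
// The lookahead ends each answer at the next "Question:" or at the end of input.
// Adjust this regex to match the structure of the PDF you are converting.
const PATTERN = /Question:\s*([\s\S]*?)\s*Answer:\s*([\s\S]*?)(?=\s*Question:|$)/g;

// Turn every match into a { front, back } card.
// matchAll requires the global flag and works on an internal copy of the regex,
// so repeated calls are not affected by lastIndex state.
function extractCards(text, pattern = PATTERN) {
  const cards = [];
  for (const match of text.matchAll(pattern)) {
    cards.push({ front: match[1].trim(), back: match[2].trim() });
  }
  return cards;
}

const sample = "Question: What is 2+2? Answer: 4 Question: Capital of France? Answer: Paris";
console.log(extractCards(sample));
```

When the PDF changes, only `PATTERN` needs to change. Keeping it as a single constant makes that adjustment cheap.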

#javascript #opensource #jq #pdftohtml #nodejs

2025-02-03

After struggling to get #python #PyMuPDF to work and being close to the deadline, I switched to a combination of other commands.

First, I used the #linux #pdftohtml command, which is much faster than PyMuPDF and packages its output much like a saved website.
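Here is a minimal sketch of calling pdftohtml from Node, assuming poppler's `pdftohtml` is on the `PATH`. The post does not say which options were used. The `-i` (ignore images) and `-noframes` (one HTML file instead of a frameset) flags are assumptions that suit text-only flashcards, and dropping `-i` keeps the images. The `pdftohtmlArgs` and `convertPdf` helpers are hypothetical.

```javascript
const { execFileSync } = require("node:child_process");

// Build the argument list for poppler's pdftohtml.
//   -i         ignore images (text-only output)
//   -noframes  write all pages into one HTML file instead of a frameset
// The positional arguments are the input PDF and the output base name.
function pdftohtmlArgs(pdfPath, outBase, { ignoreImages = true } = {}) {
  const args = [];
  if (ignoreImages) args.push("-i");
  args.push("-noframes", pdfPath, outBase);
  return args;
}

// Run the conversion synchronously. Throws if pdftohtml is missing
// or exits with a non-zero status.
function convertPdf(pdfPath, outBase, options) {
  execFileSync("pdftohtml", pdftohtmlArgs(pdfPath, outBase, options), { stdio: "inherit" });
}
```

For example, `convertPdf("deck.pdf", "deck")` converts deck.pdf with deck as the output base name (both file names are hypothetical). Passing an argument array to `execFileSync` avoids shell quoting problems with file names that contain spaces.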

Next, I used #NeoVim and #RegEx to format the #HTML file so that it could be processed quickly with #NodeJs #cheerio, and eventually converted to #json and saved in #sqlite.
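The NeoVim/regex and cheerio steps could be approximated in plain Node as shown here. This sketch rests on several assumptions:

- The regex cleanup turns each `<br/>`-separated line into its own `<p>`.
- Consecutive paragraphs form the front and back of one card.
- A small regex stands in for cheerio, so the example needs no npm packages.

The `cleanHtml`, `paragraphs`, `toCards` and `toSql` names are hypothetical. The SQL output is meant to be piped into the `sqlite3` command-line shell and does not reflect how pdf4anki actually stores its data.

```javascript
// Stand-in for the NeoVim/regex pass: replace &#160; with spaces, split on <br/>,
// strip leftover tags, and wrap each non-empty line in its own <p>.
function cleanHtml(html) {
  return html
    .replace(/&#160;|&nbsp;/g, " ")
    .split(/<br\s*\/?>/i)
    .map((line) => line.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .map((line) => `<p>${line}</p>`)
    .join("\n");
}

// Tiny stand-in for cheerio's $("p"): collect the text of every <p> element.
function paragraphs(html) {
  const out = [];
  for (const m of html.matchAll(/<p[^>]*>([\s\S]*?)<\/p>/gi)) {
    const text = m[1].replace(/<[^>]+>/g, "").trim();
    if (text) out.push(text);
  }
  return out;
}

// Pair consecutive paragraphs as front/back (hypothetical layout).
// A trailing unpaired paragraph is dropped.
function toCards(paras) {
  const cards = [];
  for (let i = 0; i + 1 < paras.length; i += 2) {
    cards.push({ front: paras[i], back: paras[i + 1] });
  }
  return cards;
}

// Emit SQL for the sqlite3 shell. Single quotes are escaped by doubling them.
function toSql(cards, table = "cards") {
  const q = (s) => "'" + s.replace(/'/g, "''") + "'";
  const rows = cards.map(
    (c) => `INSERT INTO ${table} (front, back) VALUES (${q(c.front)}, ${q(c.back)});`
  );
  return [`CREATE TABLE IF NOT EXISTS ${table} (front TEXT, back TEXT);`, ...rows].join("\n");
}

const html = "What is 2+2?<br/>4<br/>Author&#160;of Hamlet?<br/>Shakespeare<br/>";
const cards = toCards(paragraphs(cleanHtml(html)));
console.log(JSON.stringify(cards, null, 2)); // the JSON stage
console.log(toSql(cards));                   // e.g. node cards.js | sqlite3 anki.db
```

In the real pipeline, `cheerio.load(html)` followed by `$("p")` would replace the `paragraphs` regex. That regex breaks on nested or malformed markup.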

Is it elegant and automatic? No, though it works!

#JavaScript
