
Searching PDF files with #pdfgrep using all the processors/cores on the machine:
find /mnt/docs -name '*.pdf' | parallel -q pdfgrep -Ri 'search substring'

A slightly more detailed and more useful variant:
... | parallel -q -j15 pdfgrep -HRiF --color=always 'search substring'

-q: quote the command, so a search phrase containing spaces reaches pdfgrep as a single argument.
-j15: run at most 15 jobs at once, that is, use up to 15 cores/CPUs.
-H: print the name of the file in which each match was found.
-F: search for the literal substring rather than a regular expression.
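
To make the -F distinction concrete, here is a small sketch that uses plain grep on a throwaway text file (the file and its two lines are made up for the demo). pdfgrep is described as largely grep-compatible, so the same behaviour should be expected there, but this only demonstrates grep itself.

```shell
# Demo file: one line with a literal "a.b", one line that matches only as a regex.
tmp=$(mktemp)
printf 'version a.b\nversion axb\n' > "$tmp"

# Without -F the dot is a regex wildcard, so both lines match.
regex_hits=$(grep -c 'a.b' "$tmp")

# With -F the pattern is a fixed string, so only the literal "a.b" line matches.
fixed_hits=$(grep -cF 'a.b' "$tmp")

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -f "$tmp"
```

This prints `regex: 2, fixed: 1`.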

parallel here means GNU parallel,
pdfgrep is the one from https://pdfgrep.org

For comparison, an ordinary search through text files (one job per top-level entry of /mnt/docs):
parallel -q -j15 grep -HRiF --color=always 'search string' ::: /mnt/docs/*
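
A minimal sketch of the same idea wrapped in a function, assuming you want one file per job so the work spreads evenly. The name `search_tree` and the default of 4 jobs are hypothetical. The `xargs -P` fallback is not from the post; it is only there for machines without GNU parallel.

```shell
# search_tree DIR PHRASE [JOBS]
# Case-insensitive fixed-string search in every regular file under DIR.
# Function name and default job count are illustrative assumptions.
search_tree() {
    local dir="$1" phrase="$2" jobs="${3:-4}"
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # -0: read NUL-separated file names; -q: keep PHRASE as one argument.
        find "$dir" -type f -print0 |
            parallel -0 -q -j"$jobs" grep -HiF -- "$phrase"
    else
        # Fallback: xargs runs up to JOBS grep processes, 16 files each.
        find "$dir" -type f -print0 |
            xargs -0 -n 16 -P "$jobs" grep -HiF -- "$phrase"
    fi
}

# Usage:
#   search_tree /mnt/docs 'search string' 15
```

The exit status is non-zero whenever some file has no match, because grep returns 1 in that case, so it is not a reliable error signal here.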

#grep #pdfgrep #parallel #linux #shell #lang_ru @Russia @ru

In #linux you can search in #pdf files via #pdfgrep. This is useful for finding one phrase across multiple PDF files:
pdfgrep mahdi 1.pdf 2.pdf

#ripgrepall is a #grep for everything.

#rga performs regex searches on many different container filetypes, including #PDF, #zip, #tar, #SQLite, and #docx. rga uses #ripgrep for searching, which makes searches extremely fast, allowing it to outperform #pdfgrep in speed (some matches are missed, however). rga performs caching for faster repeat searches.
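
A basic invocation might look like the sketch below. The wrapper name `rga_docs` is hypothetical; it assumes rga is installed and relies on rga passing ripgrep options such as -i through to ripgrep.

```shell
# rga_docs PHRASE DIR
# Search every file type rga supports (PDF, zip, docx, ...) under DIR.
# The function name is an illustrative assumption.
rga_docs() {
    local phrase="$1" dir="$2"
    # -i is a ripgrep option (case-insensitive) that rga forwards.
    rga -i "$phrase" "$dir"
}

# Usage:
#   rga_docs 'mahdi' /mnt/docs
```

A second run over the same files should be quicker thanks to the caching mentioned above.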

Website 🔗️: https://github.com/phiresky/ripgrep-all

#free #opensource #foss #fossmendations

#ripgrep is even faster when searching in #PDF files than #pdfgrep + #parallel
https://karl-voit.at/2021/01/11/pdfgrepp/
#publicvoit

#pdfgrep is a #PDF file searcher.

pdfgrep is a search program that looks for #regex patterns in the text portions of a PDF file. pdfgrep is largely compatible with GNU #grep, supporting recursive directory search, case-insensitive regex, colorized output, etc. pdfgrep can also search in password-protected PDF files.
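
As a rough sketch of those features combined: the wrapper below searches a directory recursively, ignores case, and optionally supplies a password. The name `pdf_search_dir` is hypothetical, and it assumes pdfgrep is installed; the flags used are -r, -i and --password.

```shell
# pdf_search_dir PHRASE DIR [PASSWORD]
# Recursive, case-insensitive PDF search; PASSWORD is only passed when given.
# The function name is an illustrative assumption.
pdf_search_dir() {
    local phrase="$1" dir="$2" password="${3:-}"
    if [ -n "$password" ]; then
        pdfgrep -r -i --password="$password" "$phrase" "$dir"
    else
        pdfgrep -r -i "$phrase" "$dir"
    fi
}

# Usage:
#   pdf_search_dir 'invoice' /mnt/docs
#   pdf_search_dir 'invoice' /mnt/docs 'my-pdf-password'
```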

Website 🔗️: https://pdfgrep.org/

apt 📦️: pdfgrep

#free #opensource #foss #fossmendations

