#PDFs

2025-10-31

@tallison ran Tika on my pile of PDFs, and now I can see the “pdfa:PDFVersion” field. In the first result file alone, 193 of 10000 files have PDF/A results. I have 17 more sets of results to go through, and I’m guessing the others will be similar. Fascinating! #pdfs #digipres

2025-09-30

Ran pdfcpu -relaxed on my pile of 175K+ #pdfs, and 1762 files had validation errors. Here’s a sample. Thanks to @mickylindlar for suggesting pdfcpu; now I just need to make sense of the results. #digipres #digitalpreservation

validation error (obj#:968): postScriptCalculatorFunctionStreamDict: unsupported in version 1.2
validation error (obj#:1): pdfcpu: validateIndRefArrayEntry: invalid type at index 0
validation error (obj#:90): pdfcpu: validateOutlineTree: empty outline item dict "Count" must be 0
validation error (obj#:9): dict=extGStateDict entry=HT (obj#9): unsupported in version 1.1
validation error (obj#:58): pdfcpu: validateIndRefArrayEntry: invalid type at index 0
validation error (obj#:21): dict=pagesDict entry=Tabs: unsupported in version 1.2
validation error (obj#:746): dict=fileSpecDict entry=Thumb: unsupported in version 1.6
validation error (obj#:452): dict=outlineItemDict required entry=Parent missing

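
One way to start making sense of a pile of these messages is to group them by kind. Below is a minimal Python sketch. It assumes the pdfcpu output was saved as plain text in the `validation error (obj#:N): message` shape shown in the sample. The bucket names produced by the hypothetical `categorize()` helper are an illustrative heuristic and are not categories that pdfcpu itself reports.

```python
import re
import sys
from collections import Counter

# One pdfcpu validation error, in the shape seen in the sample:
#   validation error (obj#:968): <message>
# The message runs until the next "validation error (obj#:" or the end of the
# text, so this also works when several errors are flattened onto one line.
ERROR_RE = re.compile(
    r"validation error \(obj#:(\d+)\):\s*(.*?)(?=\s*validation error \(obj#:|\Z)",
    re.DOTALL,
)


def parse_errors(text):
    """Return (object_number, message) pairs with whitespace normalized."""
    return [(int(obj), " ".join(msg.split())) for obj, msg in ERROR_RE.findall(text)]


def categorize(message):
    """Map a message to a coarse bucket (hypothetical heuristic)."""
    if "unsupported in version" in message:
        return "unsupported in version"
    # Messages like "pdfcpu: validateOutlineTree: ..." name the failing check.
    check = re.search(r"pdfcpu: (\w+):", message)
    if check:
        return check.group(1)
    if "missing" in message:
        return "required entry missing"
    return "other"


def tally(text):
    """Count parsed errors per bucket."""
    return Counter(categorize(msg) for _, msg in parse_errors(text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is hypothetical): python tally_pdfcpu.py pdfcpu_output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for bucket, n in tally(f.read()).most_common():
            print(f"{n:6d}  {bucket}")
```

For the eight-line sample, this puts four errors in the "unsupported in version" bucket, two in `validateIndRefArrayEntry`, and one each in `validateOutlineTree` and "required entry missing". Large buckets are the natural place to start reading.
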
2025-09-19

Just finished the run: nearly 9K #pdfs were reported as ‘Well formed, but not valid’ with text output, but with JSON output there were only 3. @dpc_chat #JHOVE #digipres #digitalpreservation #OpenPreservationFoundation

⚯ Michel de Cryptadamus ⚯ (cryptadamist@universeodon.com)
2025-09-18

Released v1.17.0 of The Pdfalyzer, the surprisingly popular tool for analyzing (possibly malicious) PDFs that I created after my own unpleasant experience. It now ships with two command-line tools for extracting stuff from PDF files:

1. extract_text_from_pdfs() - brute-force extraction of all text from a PDF, including #OCR of any embedded images

2. extract_pdf_pages() - rip a page range from a #PDF and write it to a new file

* Github: github.com/michelcrypt4d4mus/p
* Pypi: pypi.org/project/pdfalyzer/
* Homebrew: formulae.brew.sh/formula/pdfal
* Fun thread someone posted last week using Pdfalyzer to explain just how byzantine the PDF format is: x.com/VikParuchuri/status/1965

#pypi #python #pdf #pdfs #malware #Threatassessment #maldoc #malwareanalysis #homebrew #infosec #cybersecurity #yararule #PdfFies

Github repo screenshot
Soft & Apps (soft_apps)
2025-09-17

✅ Tired of heavy apps for editing ?

CanaryPDF: a FREE, secure toolkit that works in your .

Edit PDFs, extract images and tables. NO installation and NO sign-up.

Your files are NEVER uploaded to the internet.

➡️ softandapps.info/2025/09/17/ca

N-gated Hacker News (ngate)
2025-09-12

Oh boy, another package! 🤣 Get ready for a thrilling ride through "The Companion," where you'll find edge-of-your-seat excitement like... used with spotcolor! 🎨📄 Strap in for the ultimate in monotony! 😂
ctan.org/pkg/tlc3-examples

2025-09-09

Tip of the day: When doing research and taking notes, it is often helpful to link to specific parts of text in documents, especially #PDFs. #DEVONthink has two special copy and paste commands that make such linking very fast and effective. #notetaking #pkm #productivity #tipoftheday #workflow devontechnologies.com/blog/202

R.L. Dane :Debian: :OpenBSD: 🍵 :MiraLovesYou: (rl_dane@polymaths.social)
2025-09-08

I really wish there was a keyboard-driven #PDF viewer like #Zathura, #MuPDF, or #SioYek that let you fill out forms and annotate #PDFs.

That would be da bomb.

2025-09-08

@BertrandCaron your post is a good reminder: I need to run #JHOVE on my pile of 175K+ #PDFs. I’ve already run veraPDF, and it would be nice to compare the results.

Verfassungklage@troet.cafe
2025-09-07

#LinuxGuides:

#Büro-Software (office software) for #Linux #Mint - email, #Office, #PDFs & software - Bye Windows 10! Part 3/5

m.youtube.com/watch?v=Wi_Tw1p-

Kevin Karhan :verified: (kkarhan@infosec.space)
2025-09-04

@Krustinaut @stuttgart +1

IMHO, all information held by the state should be freely available unless there are sufficiently compelling reasons why it shouldn't be (e.g., I think it would be wrong to make emergency calls publicly available via a "#SunshineLaw")…

File upload vulnerabilities suck. Check those PDFs, folks. And check out Didier Stevens at SANS for the latest research.

Oh, and welcome back, Decipher!

decipher.sc/2025/08/04/new-pxa

#malware #pdfs

2025-08-21

I am still building up my Ko-Fi store, but you can now get all five volumes of Gemutations: Plague, SpiderWarrior, Michael: The Cause Vol 1, and a few Only Half Saga PDFs! Check it out! #spiderforest #webcomics #pdfs ko-fi.com/darwincomics/shop

Hacker News (h4ckernews)
2025-08-17
2025-08-10

CVE-2025-48708: #ghostscript can embed the plaintext #password in encrypted #PDFs 😶

openwall.com/lists/oss-securit

R.L. Dane :Debian: :OpenBSD: 🍵 :MiraLovesYou: (rl_dane@polymaths.social)
2025-08-09

New #blog post: Desperately Seeking Preview.app

https://rldane.space/desperately-seeking-previewapp.html

361 words

Kind of a follow-up to yesterday's blost, but also informative for those who work with PDFs in Linux.

Thanks to https://infinitemac.org for enabling me to get the screenshot of Preview.app on NeXTStep 1.0. So awesome!!!!

cc: my wonderful #chorus: @joel @dm @sotolf @thedoctor @pixx @orbitalmartian @adamsdesk @krafter @roguefoam @clayton @giantspacesquid @Twizzay @stfn

(I will happily add/remove you from the chorus upon request! :)

P.S., I really hope someone gets the 1980s movie reference. Even though I've never seen said movie. 😄

P.P.S., @scruss informs me that Firefox's built-in PDF viewer can do (almost) all of the annotation things I've been trying to do with multiple apps. I'm kinda shook!

#100DaysToOffload #50 #FIFTY! #HalfwayThere

#rlDaneWriting #blost #Macintosh #NeXT #NeXTStep #Retrocomputing (a little) #InfiniteMac #PreviewApp #PDF #PDFs #Linux

2025-08-08

Out of these 175K+ files, #veraPDF indicated that 85 were ‘invalid PDFs’. #JHOVE thinks that 31 of those 85 #pdfs are valid. Hmm… #digipres #digitalpreservation
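
Cross-checking the two tools is essentially set arithmetic. Of the 85 files veraPDF rejects, 31 are called valid by JHOVE. That leaves 54 that JHOVE does not report as valid. Below is a minimal Python sketch. It assumes each tool's verdicts have already been exported to a plain-text list with one file path per line. That export format, the file names, and the `load_paths()` helper are all hypothetical.

```python
import sys


def load_paths(list_file):
    """Read one file path per line, ignoring blank lines (hypothetical export format)."""
    with open(list_file, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def compare_verdicts(verapdf_invalid, jhove_valid):
    """Split veraPDF's invalid files by whether JHOVE reports them as valid.

    Returns (disputed, undisputed), both sorted:
      disputed   - veraPDF says invalid, JHOVE says valid
      undisputed - veraPDF says invalid, JHOVE does not say valid
    """
    invalid = set(verapdf_invalid)
    disputed = invalid & set(jhove_valid)
    undisputed = invalid - disputed
    return sorted(disputed), sorted(undisputed)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are hypothetical):
    #   python compare.py verapdf_invalid.txt jhove_valid.txt
    disputed, undisputed = compare_verdicts(load_paths(sys.argv[1]), load_paths(sys.argv[2]))
    print(f"{len(disputed)} disputed, {len(undisputed)} undisputed")
    for path in disputed:
        print("disputed:", path)
```

The disputed files are the ones worth opening by hand, since that is where the two tools' notions of validity diverge.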

2025-08-08

Tip of the day: Smart groups are a good way to view items matching specific criteria, like all the flagged #PDFs in your #DEVONthink database. Smart groups are easy enough to create from scratch. But if you find yourself running the same search over and over, you can turn that search into a smart group in a few clicks. #pkm #productivity #tipoftheday #workflow devontechnologies.com/blog/202

2025-08-06

Tip of the day: If you need to watermark or add identifying information to #PDFs or images, the Pro and Server editions of #DEVONthink can do this for you. It’s called imprinting, and we show you how to set it up and use it. #paperless #pdf #pkm #productivity #tipoftheday #workflow devontechnologies.com/blog/202

2025-08-05

Here are the results of running veraPDF with the PDF/UA-1 (ua1) profile against 175K PDFs from our docs repository. I also have a text file with all of the error text to go through. veraPDF produced a ~15 GB JSON file. Accessibility rules 7.1 and 7.2 were our worst. I need to rerun this against PDF/UA-2 (ua2). #pdfs #pdf #accessibility #digitalpreservation

Multi-color bar chart of tag counts by accessibility rule. The legend lists the tags and color.
