@daelba @KathyReid I also recommend #OCR4all. While e-Scriptorium runs on the kraken engine, OCR4all uses Calamari. Once you have training data in e-Scriptorium, you can potentially also use it to train models in OCR4all. Depending on your discipline, the existing models for e-Scriptorium may be 'better' than those for OCR4all or vice versa, but both tools are highly recommended.

Every now & then, I give #ChatGPT a scan of my handwriting to test its skills in working with #handwrittentexts. Initially, it responded that it could not process the scans or gave me entirely fictional output, but today it got almost everything right. These results are better than those I achieved with #HWR models in #Tesseract & #OCR4all without additional training. I also asked ChatGPT what it "thought" about my writing & it called it "consistently shaped & large with stylistic strokes."

[Image: Near-perfect transcription provided by ChatGPT]

[Image: Original handwritten page uploaded to ChatGPT]

Hi #histodons,
I need your expertise. We want to integrate an #opensource #ocr tool into our #useGalaxy Platform so you can better analyse your texts, etc.
I worked with #tesseract some years ago, and I heard about #ocr4all.
Do you have experience with either of these, or other recommendations?
We are also integrating #Transkribus via API but want another OCR-specific option.
Looking forward to your experiences!

@galaxyfreiburg
@NFDI4Memory

Re OCR/ATR, interestingly the #OCR4all paper also offers a very good overview of the different steps and workflows. It has a different purpose, but I think it can still be used in a class context.

Reul, Christian et al. 2019. “OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings.” Applied Sciences 9 (22): 4853. https://doi.org/10.3390/app9224853.

@tkinias@historians.social As far as I understand, you want to implement a PDF -> Text -> PDF workflow. Using plain text as the intermediate format is problematic, as you (may) lose a lot of layout information.

For high quality fulltext you may need a more sophisticated intermediate format like #PageXML or #AltoXML. But they also require a more sophisticated tool for editing like #OCR4All.
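To make the layout point concrete, here is a minimal Python sketch using only the standard library. It assumes PAGE XML with the 2019-07-15 namespace; the sample page, its coordinates and the helper names `lines_with_coords` and `plain_text` are hypothetical and for illustration only. PAGE XML stores a polygon for each text line alongside its transcription, so collapsing it to plain text keeps the strings but drops the geometry.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE XML files use the 2019-07-15 schema namespace; adjust if yours differ.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical two-line page in the style of PAGE XML output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_001.png" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 1900,100 1900,200 100,200"/>
        <TextEquiv><Unicode>First line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="100,250 1700,250 1700,350 100,350"/>
        <TextEquiv><Unicode>Second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def lines_with_coords(page_xml: str) -> list[tuple[str, str, str]]:
    """Return (line id, polygon points, text) for every TextLine on the page."""
    root = ET.fromstring(page_xml)
    result = []
    for line in root.iterfind(".//pc:TextLine", PAGE_NS):
        coords = line.find("pc:Coords", PAGE_NS)
        points = coords.get("points", "") if coords is not None else ""
        # Only the line's own TextEquiv, not the region-level one.
        text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=PAGE_NS)
        result.append((line.get("id"), points, text))
    return result


def plain_text(page_xml: str) -> str:
    """Flatten the page to plain text: line order survives, polygons do not."""
    return "\n".join(text for _, _, text in lines_with_coords(page_xml))


if __name__ == "__main__":
    for line_id, points, text in lines_with_coords(SAMPLE):
        print(f"{line_id}: {text!r} at {points}")
    # The plain-text version has no way to say where each line sat on the page.
    print(plain_text(SAMPLE))
```

Going back from the plain-text output to a PDF with a correctly placed text layer would require re-deriving exactly the coordinates that the PAGE XML version still carries.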

A colleague just asked me about a good, free OCR software for a historical book they are scanning. I was checking out #OCR4all to see if I could recommend it. First thing on the "Getting started" page: A Linux terminal command to start docker … 😵‍💫 I’m not criticizing the project, which I think does important work, but it’s a rather peculiar definition of "all" …

Hi everyone :)
I'm testing #ocr4all for handwriting recognition. ( #ocr #hwr #htr )
But I'm getting nowhere.
Maybe it's because of the models?! I only have the default ones, which are optimized for Old French… that doesn't help… 😅

Has anyone tried this and succeeded??

#question #RT appreciated 😌

@jomla @stabihh We have now set up #ocr4all on our DSRI (Data Science Research Environment), and the workflow as a whole seems very transparent to us. However, we failed at #Layouterkennung (layout recognition) on the very first document. So... "read the docs"!

„Many handwritten sources not digitized. But see Transkribus.“ 🙄 #escriptorium #ocrd #ocr4all #dhd2024

@jomla @stabihh Unfortunately, I missed the workshop. But I'm interested in inviting people with #OCR4all expertise to #Maastricht as speakers. Is anyone in the community interested? If so, feel free to send me a PM.

@jomla @stabihh Once again I don't see any replies to the post, and I hope I'm not asking twice: what were your experiences? I'm currently considering which #OCR infrastructure would be best for me and my faculty in the long term. I enjoy working with #Transkribus, but #OCR4all of course scores #OpenScience points. However, I still know too little about how it performs in practice for the #FrüheNeuzeit (early modern period) and would welcome an exchange.

Moving on at #GoobiTage2023 with #OCR: iterative or participatory? Sebastian Klaes from the GEI in Braunschweig and Christian Reul from Würzburg... Text recognition as an ongoing task and an iterative process, and how can users' needs be given more weight here? #OCR4all

The agenda for #GoobiTage2023 can be found here: https://www.intranda.com/general/vorlaeufige-tagesordnung-und-anmeldung-fuer-die-goobi-tage-2023/ Today's topics include #OCR with #OCR4all; yesterday there were user talks from Vienna (the Federal Chancellery and the Academy of Sciences) and from Luxembourg...

The project "From English in Hong Kong to Hong Kong English: A new diachronic approach to genre and varietal developments in (post)colonial contexts" (Carolin Biewer) has been approved by the DFG for a period of three years. Its aim is to use historical letters and newspapers from Hong Kong to study how these genres developed in their sociohistorical context. The @ZPD is supporting the project with #OCR4all and with the further preparation of the research data.

https://www.uni-wuerzburg.de/zpd/news/single/news/dfg-foerdert-projekt-zur-entwicklung-des-englischen-in-hong-kong/

#Day2 of #DH2023 pre-conference workshops. Today I am learning how to use #OCR4All. Hopefully, I can teach and tutor folks at the #UniversityOfOslo later. It could be especially useful for #MedievalManuscripts since we have a couple of projects that require good #OCR #HTR processing!

On his blog, Jonathan Green recommends #OCR4all for early printed books: http://researchfragments.blogspot.com/2023/04/ocr4all-is-good.html #ocr #digitalhumanities #dh (via https://archivalia.hypotheses.org/171036)

The digital edition project "Artusliteratur aus der Bibliothek des Duc de Nemours" (Arthurian literature from the library of the Duc de Nemours) has been approved by the DFG for three years (2023-2026)!

The ZPD will initially support this project with #OCR4all using high-quality HTR models, and will also help with the markup of the resulting TEI documents and with dedicated commentary and annotation tools. Finally, the digital edition will be presented in the ZPD viewer "Synopticon".

https://www.uni-wuerzburg.de/zpd/news/single/news/digitales-editionsprojekt-artusliteratur-aus-der-bibliothek-des-duc-de-nemours-bewilligt/

A first look at the UI of our Workflow Editor, which will allow users to craft simple and complex OCR workflows within OCR4all as well as in the stand-alone application NodeFlow. #OCRD #ocr #OCR4all #DigitalHumanities
