Lmst

DeepSeek OCR 2 cuts visual tokens by 80% and outperforms Gemini 3 Pro

How DeepSeek OCR 2 outperformed Gemini 3 Pro while cutting visual tokens by 80%. Semantics-based image reordering opens a new horizon for document AI.

Prince Canuma (@Prince_Canuma)

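mlx-vlm v0.3.11 has been released. It adds support for a new model, GLM-OCR (@Zai_org), and includes several bug fixes, among them a Qwen2-VL attention mask fix, dynamic n_kv_heads for SmolVLM, and a PaddleOCR processor fix, plus improved error logging when a model fails to load. It also welcomes new contributors (@hturbe, @mikolaj92).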

https://x.com/Prince_Canuma/status/2019142731365532025

#mlxvlm #vlm #ocr #opensource #qwen

Abhishek Yadav (@abhishek__AI)

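GLM-OCR is a lightweight model with about 0.9B (900 million) parameters, yet it is rated as delivering SOTA-level performance in document understanding. It is described as especially strong at table handling, with clean information extraction and robust formula recognition, and is presented as a 'fast and efficient' solution for document OCR and structured information extraction.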

https://x.com/abhishek__AI/status/2018764025778286970

#glmocr #ocr #documentunderstanding #nlp #computervision

cedric (@cedric_chee)

Claims that GLM-OCR is faster than Gemini 3 Flash Thinking (medium mode) and gives specific benchmarks. Examples include scribble→doctor notes at 13.5s and tables→water damage paper at 40.8s (about 20s when converting to Markdown), emphasizing GLM-OCR's speed.

https://x.com/cedric_chee/status/2018620579583255005

#glmocr #benchmark #gemini #ocr

cedric (@cedric_chee)

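GLM-OCR is introduced as an OCR solution (model) optimized for real-world use and high throughput, and the post announces that it is currently being tested. The tweet reads as a new OCR product/tool announcement that stresses high throughput and practical applicability.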

https://x.com/cedric_chee/status/2018525327346270539

#glmocr #ocr #computervision #ai

Novita AI (@novita_labs)

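Announces that Zai has released GLM-OCR and that it can be easily deployed on the Novita platform. It is described as a 0.9B-parameter multimodal OCR model suited to large-scale production OCR that ranked first on OmniDocBench V1.5 (94.62). Tables, formulas, code, seals, and complex layouts are listed as its strengths.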

https://x.com/novita_labs/status/2018565896013574225

#glmocr #zai #novita #omnidocbench #ocr

Z.ai (@Zai_org)

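GLM-OCR, a model specialized for document understanding, has been released. Designed with about 0.9B parameters, it is reported to achieve SOTA performance on complex document understanding benchmarks such as formula recognition, table recognition, and information extraction. The post emphasizes that it is a lightweight model optimized for real-world document processing.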

https://x.com/Zai_org/status/2018520052941656385

#glmocr #ocr #documentunderstanding #multimodal

Abhishek Yadav (@abhishek__AI)

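Announces that the GLM-OCR weights have been published on Hugging Face, with links to a demo (ocr.z.ai) and the API documentation. A release-announcement tweet that gathers the weights, demo, and developer guide so that the VLM-based GLM-OCR model can be tried out and applied.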

https://x.com/abhishek__AI/status/2018573644419719383

#huggingface #glmocr #ocr #zai

Abhishek Yadav (@abhishek__AI)

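Calls GLM-OCR very impressive. Despite having only 0.9B parameters, it reportedly shows SOTA-level performance in document understanding, with particular strengths in table handling, information extraction, and formula recognition. It is presented as a promising example of a lightweight, fast document AI model.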

https://x.com/abhishek__AI/status/2018573637515915490

#glmocr #ocr #documentunderstanding #sota #smallmodels

I have an interesting issue with the #tesseract #OCR command line tool on Ubuntu 24.04.

The tool detects text more reliably if I convert my JPG images to TIFF first.

Simply running ImageMagick's convert orig.jpg ocr.tiff improves the results reliably.

Anyone know why?
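For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the two-step workflow described above. It assumes the ImageMagick `convert` and `tesseract` binaries are on the PATH; the helper names `build_commands` and `ocr_via_tiff` are made up for illustration, and the sketch does not explain why the TIFF route gives better results.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(jpg_path: str, tiff_path: str) -> tuple[list[str], list[str]]:
    """Return the ImageMagick and Tesseract command lines for one image."""
    # Tesseract takes an output base name and appends ".txt" itself.
    out_base = Path(tiff_path).with_suffix("")
    convert_cmd = ["convert", jpg_path, tiff_path]
    ocr_cmd = ["tesseract", tiff_path, str(out_base)]
    return convert_cmd, ocr_cmd


def ocr_via_tiff(jpg_path: str, tiff_path: str) -> str:
    """Convert the JPG to TIFF, run Tesseract on the TIFF, return the text."""
    for cmd in build_commands(jpg_path, tiff_path):
        subprocess.run(cmd, check=True)
    return Path(tiff_path).with_suffix(".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only run when both tools and the sample file are actually present.
    tools_ok = shutil.which("convert") and shutil.which("tesseract")
    if tools_ok and Path("orig.jpg").exists():
        print(ocr_via_tiff("orig.jpg", "ocr.tiff"))
```

Running `tesseract orig.jpg direct` next to this and diffing `direct.txt` against `ocr.txt` would show exactly where the two inputs diverge.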

#OCR works 👍

Confirmation that the scan resolution was successfully increased to 288 DPI for improved text recognition. An automatic background process now reprocesses all documents, with no further action required.

#Công Nghệ #QuảnLýTàiChính #OCR
A privacy-focused receipt-scanning project has run into trouble with image OCR. Although it handles PDFs and bank statements well, the OCR misreads numbers (for example, $4.86 as 84.86) and store names. Help from devs or testers is needed! #PrivacyTech #Python #HóaĐơn #OpenSource

Author: Delicious_Garden5795
Source: Reddit/SideProject

https://www.reddit.com/r/SideProject/comments/1qsbnl9/i_built_a_privacy_focused_receipt_scanner_that/
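The "$4.86 read as 84.86" case suggests the dollar sign is being recognized as the digit 8. A minimal sketch of a post-processing heuristic for that single confusion follows; it is an assumption for illustration, not the project's code, and `dollar_sign_candidates` is a hypothetical helper name.

```python
import re

# Price-like token with no currency symbol: digits, a dot, two decimals.
BARE_AMOUNT = re.compile(r"^\d+\.\d{2}$")


def dollar_sign_candidates(token: str) -> list[str]:
    """Return plausible readings of an OCR token that may have lost its '$'.

    Heuristic (an assumption): if a bare amount starts with '8' and has at
    least one more digit before the decimal point, the leading '8' may be a
    misread '$', so the '$'-prefixed reading is offered as an alternative.
    """
    candidates = [token]
    looks_bare = BARE_AMOUNT.match(token) is not None
    if looks_bare and token.startswith("8") and token.index(".") >= 2:
        candidates.append("$" + token[1:])
    return candidates


print(dollar_sign_candidates("84.86"))
```

The function only proposes alternatives. Picking between them would need extra evidence, such as checking which reading makes the line items add up to the printed total.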

i force extracted the text from / applied color highlighting of interesting keywords to (a lot of) the latest batch of #epstein files. still having issues downloading some of the collections from the DOJ website but this has data sets 1, 3, 4, 5, 6, 7, and 12 so far. will add the rest as i get them.

fair warning that the generated HTML is a) pretty rough with #OCR artifacts / generally pretty messy and b) kind of huge (almost 100 MB) so your web browser might struggle a bit (especially if you're using a phone) but it’s here for those of you who might want to start doing a little ctrl-F action.

https://michelcrypt4d4mus.github.io/epstein_text_messages/doj_2026-01-30_files.html

#EpsteinFiles #uspol #JeffreyEpstein #Trump #corruption #LeonBlack #howardLutnick

EFTA02731636 (epsteinify) (Epsteiniieb) (DOJ)
From:
Sent:
To:
Cc:
Subject:
Wednesday, June 28, 2023 10:16 PM
RE: Leon Black/Additional HT Subject Referral -- Update
Thanks very much. I'm swamped tomorrow, but can we talk Friday? I'm open all afternoon.
From:
Sent: Wednesday, June 28, 2023 2:03 PM
To:
Cc:
Subject: FW: Leon Black/Additional HT Subject Referral -- Update
I spoke wither again and got a read-out of their interviews of the victim. Let's touch base when you have time.
Victim's identifiers:
Potential targets:
From:
Sent: Monday, June 12, 2023 1:53 PM
To:
Subject: Leon Black/Additional HT Subject Referral -- Update
Al,
I relayed the info to Jeanne Christensen per my conversation with (that we are deconflicting with and
would circle back with her). I'm attaching notes of today's conversation with counsel, notes of the initial referral call,
and the email correspondence I've had with counsel for your records. Let me know if there's anything else you'd like me
to do with this (i.e., if you'd like me to communicate anything else to Jeanne after we receive notes from re: the
minor victim).
Thanks,
Assistant United States Attorney
Southern District of New York
Phone:
Email

LMSYS Org (@lmsysorg)

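DeepSeek-OCR 2 introduces Visual Causal Flow and can now be run with SGLang. Using DeepEncoder V2, it reorders visual tokens based on image content instead of the usual fixed top-left to bottom-right scan and performs step-by-step visual processing, which is expected to improve the flexibility and accuracy of OCR and vision-language processing.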

https://x.com/lmsysorg/status/2017305889490014281

#deepseek #ocr #visionlanguage #sglang
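To make the difference between a fixed scan and a content-based order concrete, here is a toy Python sketch. It is not DeepSeek's method: the per-patch scores are made-up placeholders for whatever the encoder actually learns, and `raster_order` and `content_order` are hypothetical helper names.

```python
def raster_order(rows: int, cols: int) -> list[tuple[int, int]]:
    """Fixed top-left to bottom-right scan over a grid of image patches."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def content_order(scores: list[list[float]]) -> list[tuple[int, int]]:
    """Order patches by descending score; ties keep their raster order.

    The scores are placeholders (an assumption for this toy example).
    sorted() is stable, so equal scores stay in top-left to bottom-right order.
    """
    rows, cols = len(scores), len(scores[0])
    return sorted(raster_order(rows, cols), key=lambda rc: -scores[rc[0]][rc[1]])


# Hypothetical 2x2 grid: a higher score means "process this patch earlier".
scores = [[0.1, 0.9],
          [0.9, 0.5]]
print(raster_order(2, 2))     # fixed scan order
print(content_order(scores))  # order driven by the (made-up) scores
```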

Paperless-ngx

"Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper."

https://docs.paperless-ngx.com/#features

#documents #ocr #paperless #scanning #tools

Auction catalogues are important sources for historical collection research. The digital methods that can be used to examine these sources are discussed in the article by Maximilian Görmar, who focuses on the collector Johann Gottfried Lakemacher:

https://doi.org/10.17175/2026_001

#digitalHumanities #sammlungsforschung #ocr #textmining @hab_wf @MWWForschung @pmgoerma

PaddleOCR‑VL 1.5 has just been released, a major update with improved text and image recognition performance. This is good news for the AI and Computer Vision community! #PaddleOCR #OCR #ComputerVision #AI #NhậnDạngVănBản #CôngNghệ #AIcôngNghệ #VisionAI

https://www.reddit.com/r/LocalLLaMA/comments/1qr5hij/paddleocrvl_15/

RE: https://mastodon.social/@gutenberg_org/115983651663698761

The method has nothing to do with the marketing slop word "AI". It is #ML = #MachineLearning, which is often used in science. Better read the study.

#medieval #medievistodon #medievists #histodon #bookstodon #academicChatter #manuscripts #digitalization #OCR

RAVI KUMAR SAHU (@RAVIKUMARSAHU78)

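Baidu open-sourced PaddleOCR-VL-1.5 on January 29. A 0.9B-parameter multimodal OCR model, it ranked first worldwide on OmniDocBench v1.5 (94.5% accuracy), surpassing DeepSeek-OCR2; the announcement is notable for being open source and for its potential use in production environments.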

https://x.com/RAVIKUMARSAHU78/status/2017089520290975953

#baidu #paddleocr #ocr #opensource #benchmark

Aryan Rakib (@tec_aryan)

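Announcing progress in document AI, Baidu has open-sourced PaddleOCR-VL-1.5. The roughly 900-million-parameter model achieved first place worldwide on OmniDocBench V1.5 (94.5% accuracy), overtaking existing models and marking an important improvement in multimodal OCR for document recognition.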

https://x.com/tec_aryan/status/2017120751099527268

#paddleocr #ocr #baidu #opensource #documentai
