#mllm

mecambioaMac @mecambioamac@mstdn.social
2025-12-18

UniGen 1.5: Apple's AI model that can see, create, and edit images

#MLLM #ML #AI

mecambioamac.com/unigen-1-5-el

2025-12-17

Hello everyone! **Super-Bot**, a local alternative tool, has just launched: it writes and runs its own code (Python), recovers on its own when it crashes, and verifies results visually via screenshots. Tested successfully on a Ray Tracer and a Snake game. One-time purchase instead of a monthly subscription! What's your take on local vs. cloud AI?
#AgentStudio #SuperBot #AI #MLLM #ĐạiLýTựĐộng #CôngNghệViệt

reddit.com/r/LocalLLaMA/commen

2025-12-05

Not all senses are equally useful: how artificial intelligence combines information from different sources

A new study shows that multimodal AI models rely unevenly on visual and textual inputs, which can lead to errors when they process contradictory information.

habr.com/ru/articles/973866/

#kandinsky #gigachat #mllm #сезон_ии_в_разработке

Nick Espinosa @NickAEsp
2025-11-10

Why AI Sucks At Telling Time... and why this should concern us for autonomous vehicles and more.

youtu.be/t2Cn0zGRkME

Nick Espinosa @NickAEsp
2025-11-10

Daily podcast: Why AI Sucks At Telling Time... and why this should concern us for autonomous vehicles and more.

soundcloud.com/nickaesp/acr

2025-11-06

Quick test results show that using an Oculink eGPU has no negative impact on performance when running local LLM projects. In fact, pairing an RTX 3090 over Oculink with an RTX A6000 even delivered a notable performance gain.

#LocalLLaMA #eGPU #Oculink #AI #MLLM #gpu #vietnam
#trítuệnhântạo #máytính

reddit.com/r/LocalLLaMA/commen

2025-10-08

BDH (Baby Dragon Hatchling) has been ported to MLX for Apple Silicon! 🚀 Source code, documentation, and training scripts are ready. The model is compatible with M1/M2/M3. Weights will be uploaded to Hugging Face soon.

#LocalLLaMA #MachineLearning #AI #MLLM #HọcMáy #TríTuệNhânTạo

reddit.com/r/LocalLLaMA/commen

2025-10-07

Estimating OpenAI token costs: 1T tokens on GPT-5 could cost around $3 million. A large group of heavy users has consumed a combined $112.62 million worth of tokens, roughly 3% of OpenAI's 2024 revenue.
#OpenAI #GPT5 #AI #MLLM #trí_tuệ_nhân_tạo

reddit.com/r/LocalLLaMA/commen
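As a quick sanity check on the post's numbers (all inputs below are the post's claims, not official OpenAI pricing), the implied blended price per million tokens and the implied 2024 revenue can be back-computed:

```python
# All inputs below are the post's claims, not official figures.
claimed_cost_usd = 3_000_000            # ~$3M for 1T tokens (claimed)
claimed_tokens = 1_000_000_000_000      # 1 trillion tokens

# Implied blended price per 1M tokens
price_per_million = claimed_cost_usd / (claimed_tokens / 1_000_000)
print(f"${price_per_million:.2f} per 1M tokens")   # prints "$3.00 per 1M tokens"

# $112.62M is said to be ~3% of 2024 revenue
implied_revenue = 112.62e6 / 0.03
print(f"Implied 2024 revenue: ${implied_revenue / 1e9:.2f}B")  # ~$3.75B
```

The $3-per-million-tokens figure is a blended average across input and output tokens; actual per-token prices differ by direction and model tier.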

2025-09-24

At the @bifold.berlin conference "AI-based methods in the humanities", I have just attended a great talk by Seid Muhie Yimam of Hamburg University who confirmed my impression that there is a kind of momentum in this area at the moment. He mentioned many datasets, publications and shared tasks on African Languages. I will list them (bit by bit) in this thread.

2/x

#HistTag25 #MLLM #LLM #MultilingualDH

2025-09-17

In the section on #GlobalHistory from a global perspective, the discussion right now is about the limitations of LLMs for "low-resourced" languages. A lot is happening there, though: not at OpenAI, Google, Meta & Co., but elsewhere. I'll look for more links later; for now, this one will have to do:

doi.org/10.1038/s43588-025-008

#HistTag25 #MLLM #LLM

Harald KlinkeHxxxKxxx@det.social
2025-09-12

OmniEVA: Bridging the 2D–3D Gap in Embodied AI

New paper introduces OmniEVA, a versatile embodied planner that pushes the boundaries of multimodal large language models (MLLMs) for robotics and spatial reasoning.

Results: OmniEVA achieves state-of-the-art performance across 2D/3D reasoning benchmarks and outperforms existing models in object navigation tasks.

Paper: arxiv.org/pdf/2509.09332v1
Project: omnieva.github.io/

#EmbodiedAI #Robotics #LLM #MLLM #3DVision #AIResearch #AIPlanning

Image: an architecture diagram for a large language model, highlighting the Task-Adaptive Gated Router component, with connections between text and vision tokens, a ViT encoder, and 3D position encoding.

2025-07-18

And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI).

See you in Vienna! #ACL2025!

(4/4)

#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage #NLProc

clacke: exhausted pixie dream boy 🇸🇪🇭🇰💙💛clacke@libranet.de
2025-01-30

"OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From Us"

Headline of the week. 🥰

OpenAI shocked that an AI company would train on someone else's data without permission or compensation.

404media.co/openai-furious-dee… (no-charge subscription wall for full article)

#OpenAI #DeepSeek
#AI #LLM #MLLM
#GenAI #GenerativeAI

2024-12-19

Survey: Multimodal #LLMs like GPT-4V are redefining AI by excelling at tasks like image-based storytelling & OCR-free math reasoning, hinting at AGI potential. This paper reviews their progress, architectures, and challenges while exploring new horizons for research. 🚀 #AI #MLLM

王永帥🍥 @yongshuai1013
2024-11-11

PPLLaVA, an open-source multimodal video-understanding model, provides a unified framework that handles long- and short-video tasks equally well.
With a 1k context it outperforms LLaVA-Next-Video at an 8k context, with 8× higher throughput.

By tackling video redundancy, it achieves this unified handling and can process videos of any length, from a few seconds to several hours.

It supports video content understanding and analysis, scene description, video Q&A, and multi-turn conversational reasoning.

Project: github.com/farewellthree/PPLLaVA

Stefan Müller @stefanmuelller@climatejustice.social
2024-11-09

@peer Then the #LLM would be a Multimodal Large Language Model (#MLLM). That is exactly the point: if you sit in a room where Chinese radio is playing, you don't learn Chinese. An LLM does. It learns completely differently than we do; it learns only the distribution of pieces of language, while we learn with grounding. That will come to AI as well, but it isn't there yet, which is why LLMs are not yet proof that Chomsky was wrong. But 1) we knew that before LLMs anyway, and 2) LLMs make it plausible even to laypeople and hardcore Chomskyans (who simply hadn't read the literature before).

TheTransmitted @thetransmitted
2024-04-11

Apple researchers have presented Ferret-UI, a new AI model that significantly improves understanding of and interaction with mobile interfaces.

thetransmitted.com/ai/doslidny

Victoria Stuart 🇨🇦 🏳️‍⚧️persagen
2024-04-10

Great blog: stratechery.com/about/
... interesting article:

Gemini 1.5 and Google’s Nature
stratechery.com/2024/gemini-1-


(Google DeepMind; successor to Google Bard | see also: GPT-4, OpenAI)
(multimodal large language models)
(retrieval-augmented generation)

2024-03-31

Like, don’t get me wrong, I’d be excited if Apple came out with some LLM or MLLM stuff for iOS18, but I don’t want them to do it just because everyone else is doing it.

To be honest, the only thing I REALLY want is for Siri to not be so… dumb. 🤷🏼‍♂️

#Apple #iOS18 #Siri #LLM #MLLM
