#MLLMs

2025-06-13

MATP-BENCH: Can MLLM be a good automated theorem prover for multimodal problems? ~ Zhitao He et al. arxiv.org/abs/2506.06034 #AI #MLLMs #Math #ITP #IsabelleHOL #LeanProver #CoqProver #AIforMath

Alex Jimenez @AlexJimenez@mas.to
2024-12-28

Collective Monte Carlo Tree Search (CoMCTS): A New Learning-to-Reason Method for Multimodal Large Language Models

marktechpost.com/2024/12/27/co

#LLMs #MLLMs #AI

Harald Klinke @HxxxKxxx@det.social
2024-02-07

If you would like to learn more about how it works: Guiding Instruction-based Image Editing via Multimodal Large Language Models. Check out the code repository for the ICLR'24 Spotlight paper by Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, and Zhe Gan.
github.com/apple/ml-mgie
#ICLR24 #ImageEditing #MLLMs #AIResearch

A diagram illustrating the process of a multimodal large language model (MLLM) editing an image of a cabin in the woods to place it in a desert setting.
Gerego
2023-12-24

Apple has released Ferret, a new multimodal large language model (MLLM) that excels at both image understanding and language processing, with particularly strong results on understanding spatial references.

Paper: arxiv.org/abs/2310.07704
Github: github.com/apple/ml-ferret?tab

Source: threads.net/@luokai/post/C1OE1
