Paper 👉 https://internvl.github.io/blog/2025-04-11-InternVL-3.0/
Code 👉 https://github.com/OpenGVLab/InternVL
#MultimodalAI #MLLM #OpenSourceAI #VisionLanguage #AIResearch
#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️
After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀
Key Features of Qwen2-VL:
1. 🖼️ Image Understanding:
Qwen2-VL delivers strong performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA (see the usage sketch below the feature list).
2. 🎬 Video Analysis:
Qwen2-VL can analyze videos over 20 minutes long, and its online streaming support enables video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis
3. 🤖 Device Integration:
The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱
4. 🌍 Multilingual Capabilities:
In addition to English and Chinese, Qwen2-VL understands text in images across multiple languages, including most European languages, Japanese, Korean, Arabic, and Vietnamese. #MultilingualAI
This release advances #ArtificialIntelligence by combining visual perception with language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.
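A minimal sketch of how you might query Qwen2-VL about a single image through the Hugging Face transformers integration. The checkpoint name "Qwen/Qwen2-VL-7B-Instruct", the question, and the image URL are assumptions here; check the official model card for the exact quick-start.

```python
# Hedged sketch: image question answering with Qwen2-VL via transformers.
# Checkpoint name and image URL are placeholders/assumptions.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint name
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One image plus one question, in the chat format the processor expects.
image = Image.open(
    requests.get("https://example.com/document.png", stream=True).raw  # placeholder URL
)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does this document say?"},
        ],
    }
]
prompt = processor.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate an answer and decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Per the model card, the same chat format also accepts multiple images and video inputs, which is how the video QA and dialog use cases above are driven.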
#iNaturalist has released a new #VisionLanguage #AI tool that lets you search the huge iNaturalist photo pool with natural-language queries like "bird eating fruit" https://www.inaturalist.org/blog/95911
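The post doesn't say how the search works internally; a common way to build this kind of natural-language photo search is a CLIP-style joint text-image embedding, where the query and every photo are embedded in the same space and ranked by similarity. A rough, purely illustrative sketch (model choice and photo filenames are placeholders, not iNaturalist's actual stack):

```python
# Illustrative CLIP-style text-to-image retrieval; not iNaturalist's implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder photo files standing in for a real photo pool.
photos = [Image.open(p) for p in ["photo1.jpg", "photo2.jpg", "photo3.jpg"]]
query = "bird eating fruit"

inputs = processor(text=[query], images=photos, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text[0][i] = similarity of the query to photo i; higher = better match.
scores = outputs.logits_per_text[0]
best = int(scores.argmax())
print(f"Best match: photo index {best} (score {scores[best]:.2f})")
```

In a real system the photo embeddings would be precomputed and stored in a vector index, so only the text query needs to be embedded at search time.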
Florence-2: a vision foundation model that excels in a variety of computer vision and vision-language tasks through a unified, prompt-based approach. Unlike existing models, Florence-2 interprets text prompts to deliver results in tasks like captioning, object detection, grounding, and segmentation.
#AI #ComputerVision #MachineLearning #VisionLanguage
https://arxiv.org/abs/2311.06242
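A hedged sketch of Florence-2's unified prompt-based interface via transformers: the task is selected by a special token in the text prompt ("<CAPTION>", "<OD>", etc.), and the same model returns captions, boxes, or region descriptions accordingly. The checkpoint name, task tokens, trust_remote_code path, and post_process_generation helper follow the public model card and should be treated as assumptions.

```python
# Hedged sketch: Florence-2 task prompting via transformers (custom remote code).
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image URL.
image = Image.open(requests.get("https://example.com/street.jpg", stream=True).raw)

def run_task(task_prompt: str):
    """Run one Florence-2 task; the task token selects captioning, detection, etc."""
    inputs = processor(text=task_prompt, images=image, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
    )
    raw = processor.batch_decode(output_ids, skip_special_tokens=False)[0]
    # Parses the raw text into task-specific structure (caption string, boxes, labels, ...).
    return processor.post_process_generation(
        raw, task=task_prompt, image_size=(image.width, image.height)
    )

print(run_task("<CAPTION>"))  # image captioning
print(run_task("<OD>"))       # object detection: labels + bounding boxes
```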