Gemini 3 Flash unveils Agentic Vision, which explores images by zooming in on and manipulating them

An introduction to the Agentic Vision feature in Gemini 3 Flash. The AI actively explores images by zooming in on and manipulating them, raising accuracy by 5-10%.

This week on Oxytude: Hebdoxytude 438, news on new technologies and accessibility.

#NVDA #AppSuite #AScan #AGram #SonarVision #GoogleMaps #Gemini #AgenticVision #Visio #MetaRayBan #MentraLive

https://www.oxytude.org

Google Gemini (@GeminiApp)

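Agentic Vision improves Gemini's image understanding. Based on the prompt and the image, Gemini draws up a multi-step analysis plan (Planning) to interpret the image systematically, and when it spots fine details it automatically zooms in (Zooming) to capture them. These capabilities help with multimodal image analysis and precise visual recognition.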

https://x.com/GeminiApp/status/2016914637523210684

#gemini #agenticvision #vision #multimodal

Google Gemini (@GeminiApp)

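Announces the new 'Agentic Vision' feature for Gemini 3 Flash. Agentic Vision is designed to strengthen the analysis of complex images so that details such as serial numbers or text in complex diagrams are read more accurately and consistently. The post invites readers to check the feature's performance in the demo.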

https://x.com/GeminiApp/status/2016914275886125483

#agenticvision #gemini3 #multimodal #vision #ai

Gemini 3 Flash’s new ‘Agentic Vision’ improves image responses – 9to5google

Abner Li | Jan 27 2026 – 11:40 am PT

Agentic Vision is a new capability for the Gemini 3 Flash model to make image-related tasks more accurate by “grounding answers in visual evidence.”

Frontier AI models like Gemini typically process the world in a single, static glance. If they miss a fine-grained detail — like a serial number on a microchip or a distant street sign — they are forced to guess.

This new approach “treats vision as an active investigation” by combining visual reasoning with code execution, with other tools to follow in the future.

To answer prompts with images, Gemini 3 Flash will formulate “plans to zoom in, inspect and manipulate images step-by-step.” Specifically, Agentic Vision leverages a “Think, Act, Observe loop.”

Think: The model analyzes the user query and the initial image, formulating a multi-step plan.
Act: The model generates and executes Python code to actively manipulate images (e.g. cropping, rotating, annotating) or analyze them (e.g. running calculations, counting bounding boxes).
Observe: The transformed image is appended to the model’s context window. This allows the model to inspect the new data with better context before generating a final response.
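
The three steps above can be sketched as a small loop. This is only an illustration of the control flow, not Google's implementation: the `Context` class, the fixed one-step plan returned by `think`, and the use of a nested list as a stand-in for an image are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Image = List[List[int]]  # hypothetical stand-in: a grid of grayscale pixel values


@dataclass
class Context:
    """Hypothetical stand-in for the context window: the query plus every image seen."""
    query: str
    images: List[Image] = field(default_factory=list)


def rotate_cw(image: Image) -> Image:
    """Rotate the pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def think(ctx: Context) -> List[Callable[[Image], Image]]:
    """Think: produce a plan. A real model would derive it from the query and image;
    this sketch always returns the same single step."""
    return [rotate_cw]


def run_loop(query: str, image: Image) -> Context:
    ctx = Context(query=query, images=[image])
    plan = think(ctx)
    for step in plan:
        transformed = step(ctx.images[-1])  # Act: run code that manipulates the image
        ctx.images.append(transformed)      # Observe: append the result to the context
    return ctx


ctx = run_loop("Which way up is this label?", [[1, 2], [3, 4]])
print(ctx.images[-1])
```

The point of the design is that the final answer is produced only after the transformed image sits in the context next to the original, so the model reasons over both.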

Instead of just describing an image it’s given, Gemini 3 Flash “can execute code to draw directly on the canvas to ground its reasoning.” One example of this image annotation in the Gemini app is asking “to count the digits on a hand.”

To avoid counting errors, it uses Python to draw bounding boxes and numeric labels over each finger it identifies. This “visual scratchpad” ensures that its final answer is based on pixel-perfect understanding.
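
A minimal sketch of such a scratchpad, assuming the boxes have already been detected: it draws each box outline and a numeric label on a character canvas, then takes the count from the boxes that were drawn. The `annotate` and `count_objects` helpers and the text canvas are hypothetical; the code Gemini actually generates is not shown in the article.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive coordinates


def annotate(height: int, width: int, boxes: List[Box]) -> List[str]:
    """Draw box outlines and numeric labels on a blank character canvas."""
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for label, (top, left, bottom, right) in enumerate(boxes, start=1):
        for x in range(left, right + 1):      # top and bottom edges
            canvas[top][x] = "#"
            canvas[bottom][x] = "#"
        for y in range(top, bottom + 1):      # left and right edges
            canvas[y][left] = "#"
            canvas[y][right] = "#"
        # Label in the top-left corner; only the last digit is shown so that
        # every label fits in a single canvas cell.
        canvas[top][left] = str(label % 10)
    return ["".join(row) for row in canvas]


def count_objects(boxes: List[Box]) -> int:
    """The final count comes from the annotated boxes, not from a guess."""
    return len(boxes)


boxes = [(0, 0, 2, 2), (0, 4, 2, 6)]
for row in annotate(3, 7, boxes):
    print(row)
print("count:", count_objects(boxes))
```

Because every counted object has a visible, numbered box, a missed or double-counted item shows up on the canvas instead of being hidden in the answer.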

Meanwhile, Gemini 3 Flash will zoom in when it detects fine-grained details in the image. Agentic Vision can also “parse high-density tables and execute Python code to visualize the findings.”
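
Zooming can be thought of as cropping a region and enlarging it. The sketch below does this with nearest-neighbour upscaling on a nested list of pixel values; the `zoom` function and its parameters are assumptions for illustration, and the article does not say which image operations the model's generated code actually uses.

```python
from typing import List

Image = List[List[int]]  # hypothetical stand-in: a grid of grayscale pixel values


def zoom(image: Image, top: int, left: int, height: int, width: int, factor: int) -> Image:
    """Crop a region, then enlarge it by repeating each pixel `factor` times per axis."""
    region = [row[left:left + width] for row in image[top:top + height]]
    zoomed: Image = []
    for row in region:
        wide = [pixel for pixel in row for _ in range(factor)]  # stretch horizontally
        for _ in range(factor):                                 # stretch vertically
            zoomed.append(list(wide))
    return zoomed


image = [[1, 2],
         [3, 4]]
# Zoom 2x into the right-hand column of the image.
print(zoom(image, top=0, left=1, height=2, width=1, factor=2))
```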

Agentic Vision results in a “consistent 5-10% quality boost across most vision benchmarks” for Gemini 3 Flash.

This is starting to roll out to the Gemini app with the Thinking model. For developers, it’s available today with the Gemini API in Google AI Studio and Vertex AI.

#9to5GoogleCom #AgenticVision #ExecuteCode #Gemini #Gemini3Flash #GeminiApp #Google #ImageQuality #NewFromGemini

🚨 Google launches Agentic Vision for Gemini 3 Flash

✅ Think-Act-Observe loop
✅ Python code execution
✅ 5-10% accuracy boost
✅ Available NOW

#AdwaitX #GoogleAI #AI #MachineLearning #DevTools #AgenticVision

https://www.adwaitx.com/google-gemini-3-flash-agentic-vision/

Google AI Developers (@googleaidevs)

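Google announced that 'Agentic Vision', a feature running on Gemini 3 Flash, is available in Google AI Studio and Vertex AI. It is designed to combine code execution with reasoning to improve performance on common computer vision tasks, and is expected to help automate vision pipelines and improve accuracy.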

https://x.com/googleaidevs/status/2016224923224588490

#google #gemini #agenticvision #vertexai #computervision

Omar Sanseviero (@osanseviero)

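Agentic Vision has arrived in Gemini 3. Gemini can write and run code directly from visual input to zoom, annotate, inspect, and plot on images. Combined with advanced reasoning, this supports a range of new use cases such as multimodal visual analysis and manipulation and data visualization.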

https://x.com/osanseviero/status/2016236082501959783

#gemini #agenticvision #multimodal #vision #ai

#agenticvision
