Google updates Gemini AI to write and run code for image analysis
7 days ago • agentic-ai
Google announced Agentic Vision for Gemini 3 Flash on January 27, 2026, turning static image analysis into an iterative agentic process [1]. The model uses a "Think, Act, Observe" loop. It plans multi-step analysis (Think), writes and executes Python code to crop, zoom, annotate, or compute on images (Act), and appends results to its context for review (Observe) before producing a final response [1][2][3].
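The loop described above can be sketched as a generic control flow. This is a conceptual illustration only, not Google's implementation: `plan_fn` and `act_fn` are hypothetical stand-ins for the model's planner and the Python code it writes and executes.

```python
def think_act_observe(image, plan_fn, act_fn, max_steps=5):
    """Conceptual Think-Act-Observe loop (illustrative sketch only).

    plan_fn(image, context) -> next operation, or None when done  (Think)
    act_fn(image, step)     -> deterministic result of running code (Act)
    Results are appended to context for the next planning pass    (Observe)
    """
    context = []
    for _ in range(max_steps):
        step = plan_fn(image, context)    # Think: decide the next operation
        if step is None:
            break                         # planner judges the analysis complete
        result = act_fn(image, step)      # Act: run deterministic code on the image
        context.append((step, result))    # Observe: fold the result back into context
    return context

# Toy usage: a digit string stands in for an image; the "plan" is one counting step.
ops = think_act_observe(
    "8 3 8 1 8",
    plan_fn=lambda img, ctx: "count_eights" if not ctx else None,
    act_fn=lambda img, step: img.split().count("8"),
)
print(ops)  # [('count_eights', 3)]
```

The key property is that each answer in `context` comes from executed code rather than a single forward pass, which is what the article credits for the reduction in counting and table-parsing errors.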
This design ties reasoning to concrete visual evidence and shifts tasks from probabilistic guessing to deterministic execution. That reduces hallucinations in tasks such as counting digits and parsing dense tables [2][4]. Developers enable Agentic Vision by configuring the code_execution tool in Gemini API calls; the tool supports image URIs and visual scratchpads [1].
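In a raw REST call, enabling the tool amounts to adding a `code_execution` entry to `tools`. A minimal sketch of such a `generateContent` payload follows; the model ID `gemini-3-flash` and the inline image bytes are illustrative assumptions, so verify field names and model IDs against the current Gemini API documentation before use.

```python
import json

# Sketch of a Gemini generateContent payload with code execution enabled.
# ASSUMPTIONS: the model ID and image data are placeholders for illustration.
payload = {
    "contents": [{
        "parts": [
            {"text": "Count the digits in this receipt and total the line items."},
            {"inline_data": {"mime_type": "image/png", "data": "<base64-image>"}},
        ]
    }],
    # An empty object switches on the built-in code_execution tool.
    "tools": [{"code_execution": {}}],
}
print(json.dumps(payload, indent=2))
```

The same configuration is available through the official SDKs by passing a code-execution tool in the request config rather than building JSON by hand.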
Agentic Vision is available now through the Gemini API in Google AI Studio and Vertex AI, with a rollout to the Gemini app underway. Google reports a consistent 5–10% quality boost across most vision benchmarks and a 5% accuracy gain on PlanCheckSolver.com [1][3][4].
Why It Matters
- ML engineers can enable code_execution in the Gemini API to gain 5–10% on vision benchmarks without building custom agents.
- IT teams can deploy iterative visual reasoning at scale via Vertex AI and Google AI Studio, lowering time to production for vision apps.
- Deterministic, Python-based image operations reduce hallucination risk in multimodal tasks like counting and table parsing.
- Built-in Think-Act-Observe loops accelerate prototyping and let teams ship vision features in hours rather than weeks.
Trust & Verification
Source List (4)
- Google Blog (Official), Jan 27, 2026
- InfoWorld (Tier-1), Jan 27, 2026
- 9to5Google (Tier-1), Jan 27, 2026
- Business Today (Other), Jan 28, 2026
Fact Checks (4)
- Google announced Agentic Vision for Gemini 3 Flash on January 27, 2026 (VERIFIED)
- Uses Think-Act-Observe loop with Python code execution for iterative visual reasoning (VERIFIED)
- Delivers 5–10% quality boost across most vision benchmarks (VERIFIED)
- Available via Gemini API, Google AI Studio, and Vertex AI (VERIFIED)