Programming Gemini (Agentic Edition)
A working developer's guide to building production agentic systems on the Google Gemini platform. Twenty-five chapters from the Welcome through the final Migration playbook — read in order or jump straight to what you need.
Foundations
Orientation, the platform map, the SDK, thinking, and prompting for thinking models.
Welcome
Why this book, who it's for, and how it's organized. The orientation read before Chapter 1.
- 1 Welcome to Programming Gemini
- 2 How to use this book
Understanding Models
Five essays on what models actually are: tokens, context, autoregressive generation, multimodality, and the new dimension — reasoning. The conceptual on-ramp before the platform map.
- 1 What a language model actually is
- 2 Multimodality — one model, many input types
- 3 Reasoning and thinking — the new dimension
- 4 What the dev workflow looks like now
- 5 How Gemini compares — the mental model, not the benchmark table
Six essays that orient you to the 2026 platform: from model to four-layer stack, the family tree, model selection, pricing/context/tiers, the stability lifecycle, and choosing a surface.
- 1.1 From a model to a platform: what changed
- 1.2 The model family tree
- 1.3 Model-selection framework: which model for which job
- 1.4 Pricing, context windows, knowledge cutoffs, tiers
- 1.5 Preview vs Stable vs Experimental: naming and deprecation
- 1.6 Choosing your surface: AI Studio vs Vertex AI vs Antigravity
Install, authenticate, make your first call, switch surfaces, drop in across languages.
- 2.1 Install & environment
- 2.2 Your first request & the response object
- 2.3 Switching AI Studio ↔ Vertex AI
- 2.4 OpenAI-compatibility layer (drop-in)
- 2.5 Languages — Python, Node, Go, Java, REST
- 2.6 Build-an-app-from-a-prompt — AI Studio + Antigravity quickstart
Six essays on the reasoning layer that shapes every Gemini call in 2026.
- 3.1 Dynamic reasoning by default — the mental model
- 3.2 `thinking_level` — minimal, low, medium, high
- 3.3 Thought signatures and strict validation
- 3.4 Cost & latency of thinking
- 3.5 Temperature 1.0 — and what breaks otherwise
- 3.6 Legacy `thinking_budget` and the migration path
Seven essays on getting the right context to the model, cheaply, with thinking-model-aware techniques.
- 4.1 Prompting for thinking models vs traditional
- 4.2 System instructions for agentic workflows
- 4.3 Long-context strategies (1M window)
- 4.4 Context caching for cost
- 4.5 `media_resolution` for images, video, docs
- 4.6 File input patterns — inline, File API, URIs
- 4.7 Token counting & cost estimation
Modalities
Text + embeddings, images, video, audio, Omni, documents + RAG, and robotics.
Seven essays on getting useful text out — plus JSON, streaming, common tasks, multilingual, and embeddings.
- 5.1 Text generation — Flash vs Pro
- 5.2 Multi-turn chat & conversation state
- 5.3 Structured output — JSON mode & response schemas
- 5.4 Streaming responses
- 5.5 Common tasks — summarize, classify, extract, translate
- 5.6 Multilingual capabilities
- 5.7 Embeddings with Gemini Embedding 2
Seven essays on understanding and generating images with the Gemini image family.
- 6.1 Image understanding + `media_resolution`
- 6.2 Native image generation — the Nano Banana family
- 6.3 Text-to-image & image editing
- 6.4 Multi-turn editing & up to 14 reference images
- 6.5 Resolutions, aspect ratios, search-grounded images
- 6.6 Imagen 4 — when to use vs Nano Banana
- 6.7 SynthID watermarking, safety & policies
Three essays on video understanding and Veo 3.1 generation with synchronized audio.
- 7.1 Video understanding (frame analysis, `media_resolution`)
- 7.2 Generation with Veo 3.1 / 3.1 Lite (synchronized audio)
- 7.3 Creative controls & editing; cost / latency / use cases
Six essays on audio understanding, TTS, real-time voice, music, and end-to-end voice agents.
- 8.1 Audio understanding
- 8.2 TTS with 3.1 Flash TTS — steerable, expressive
- 8.3 Real-time voice with 3.1 Flash Live (WebSocket, PCM)
- 8.4 Partner stacks — LiveKit, Pipecat, Agora
- 8.5 Music with Lyria 3
- 8.6 Build a voice agent end-to-end
Five essays on Omni — the model that takes anything in and outputs video with native audio.
- 9.1 The any-to-any model
- 9.2 Prompting Omni — 10-second clips, physics, motion
- 9.3 Conversational editing of generated media
- 9.4 Omni vs Veo vs Nano Banana — when to use which
- 9.5 SynthID, safety, availability, pricing
Five essays on document understanding and retrieval-augmented patterns.
- 10.1 PDF / document understanding
- 10.2 Multimodal embeddings & unified search
- 10.3 RAG system design with Gemini
- 10.4 File Search tool for agents
- 10.5 Build a document Q&A system
Five essays on Robotics-ER 1.6 and the Gemini Robotics SDK.
- 11.1 Robotics-ER 1.6 — embodied reasoning overview
- 11.2 Spatial reasoning, pointing, instrument reading
- 11.3 Multi-step physical task planning; tool / VLA calls
- 11.4 The Gemini Robotics SDK & agent framework
- 11.5 Integration patterns — trusted-tester notes
Agentic Systems
Function calling, built-in tools, Computer Use, Deep Research, harnesses, multi-agent, and the inter-agent ecosystem.
Eight essays on declaring tools, the call/respond loop, signatures, parallelism, and production patterns.
- 12.1 Function declarations (OpenAPI subset)
- 12.2 The 4-step flow; `from_callable()` auto-generation
- 12.3 Function call IDs — pair calls and responses
- 12.4 Multimodal function responses
- 12.5 Streaming function calling
- 12.6 Thought-signature circulation in function calling
- 12.7 Parallel & compositional calling
- 12.8 Error handling, retries; build a real app
Six essays on the tools you don't have to implement — Google ships them inside the model call.
- 13.1 Google Search grounding
- 13.2 Google Maps grounding
- 13.3 Code Execution
- 13.4 URL Context
- 13.5 File Search over corpora
- 13.6 Combining built-in tools with function calling
Six essays on the model that operates a real browser: screenshot, decide, click, observe.
- 14.1 How Computer Use works — screenshot, action, loop
- 14.2 Supported actions; normalized coordinates (0-999)
- 14.3 Safety system — allowed / confirm / blocked
- 14.4 Sandboxed execution; Playwright integration
- 14.5 Custom functions alongside Computer Use; parallel actions
- 14.6 Build a web-scraping & a form-automation agent
Seven essays on the Deep Research and Deep Research Max agents — long-running, plan-then-execute research.
- 15.1 Deep Research vs Deep Research Max
- 15.2 The Interactions API
- 15.3 Collaborative planning — plan, review, approve, execute
- 15.4 Tools — Search, URL, Code, MCP, File Search
- 15.5 Streaming event types; reconnection
- 15.6 Visualization — agent-generated charts
- 15.7 Use cases — market analysis, due diligence, lit review
Seven essays on the harness layer — the loop around the model that makes agents real.
- 16.1 What an agent harness is
- 16.2 The Antigravity SDK
- 16.3 Defining custom agent behaviors; hosting
- 16.4 Managed Agents in the Gemini API
- 16.5 Long-running agents via the Interactions API
- 16.6 The Antigravity platform — desktop / CLI / SDK
- 16.7 Build a custom-harness agent end-to-end
Seven essays on coordinating many agents — ADK, Agent Engine, memory, MCP.
- 17.1 ADK overview — Agent Development Kit
- 17.2 Multi-agent orchestration & delegation
- 17.3 MCP integration — tools as MCP servers
- 17.4 Agent Engine runtime — deploy to Cloud Run / GKE
- 17.5 Sessions & Memory Bank (GA) — short- and long-term memory
- 17.6 Framework interop — LangGraph, LlamaIndex, CrewAI, Vercel AI SDK
- 17.7 Build a production multi-agent system
Six essays on the inter-agent web — Agent Registry, A2A, AP2, Cloud API Registry.
- 18.1 Agent Registry — identity, governance, discovery
- 18.2 Agent2Agent (A2A) — cross-vendor interop
- 18.3 Securing the fleet — permissions, signed agent cards
- 18.4 Cloud API Registry — Google Cloud services as MCP tools
- 18.5 Agent Payments (AP2) — agent-led transactions
- 18.6 Designing for an open agent ecosystem
Production
Prototype → production, evals, safety, performance & cost, and migration.
Five essays on the path from a working prototype to a system you can actually operate.
- 19.1 AI Studio → Vertex: auth, keys, service accounts
- 19.2 Batch API — large-scale async
- 19.3 Flex inference vs Priority inference
- 19.4 Context caching at scale
- 19.5 Endpoints, monitoring, quotas, rate limits
Seven essays on moving from vibe-checks to data-driven evaluation of agents.
- 20.1 Why evals — from vibe-checks to data-driven
- 20.2 Vertex GenAI evaluation service overview
- 20.3 Final-response metrics
- 20.4 Trajectory evaluation — the six metrics
- 20.5 Building evaluation datasets
- 20.6 Evaluating across frameworks (ADK, LangGraph, CrewAI)
- 20.7 Evals in CI; regression gates for agents
Five essays on the safety primitives, filter configuration, and responsible practices.
- 21.1 Safety principles, protections, filter configuration
- 21.2 Why content was blocked; response strategies
- 21.3 SynthID for generated media
- 21.4 Agent security & AP2 considerations
- 21.5 Responsible-AI practices
Five essays on tuning Gemini-shaped workloads for cost and latency at scale.
- 22.1 Thinking-level tuning for cost / latency
- 22.2 `media_resolution` optimization
- 22.3 Token budgeting & counting
- 22.4 Caching strategies; batch vs realtime
- 22.5 Model selection for cost
Four essays on moving across models, vendors, and time — the migration playbooks.
- 23.1 Gemini 2.x → 3.x → 3.5 — the migration
- 23.2 OpenAI → Gemini migration
- 23.3 Anthropic → Gemini migration
- 23.4 Common pitfalls & fixes
