GotGemini
The book

Programming Gemini (Agentic Edition)

A working developer's guide to building production agentic systems on the Google Gemini platform. Twenty-five chapters, from the Welcome through the final Migration playbook. Read them in order or jump straight to what you need.

143 essays · 1 published · 4 parts · 25 chapters
Part I

Foundations

Orientation, the platform map, the SDK, thinking, and prompting for thinking models.

Welcome

0/2 published

Why this book, who it's for, and how it's organized. The orientation read before Chapter 1.

  • 1 Welcome to Programming Gemini [draft]
  • 2 How to use this book [draft]

Five essays on what models actually are: tokens, context, autoregressive generation, multimodality, and the new dimension — reasoning. The conceptual on-ramp before the platform map.

  • 1 What a language model actually is
  • 2 Multimodality — one model, many input types [draft]
  • 3 Reasoning and thinking — the new dimension [draft]
  • 4 What the dev workflow looks like now [draft]
  • 5 How Gemini compares — the mental model, not the benchmark table [draft]

Six essays that orient you to the 2026 platform: from model to four-layer stack, the family tree, model selection, pricing/context/tiers, the stability lifecycle, and choosing a surface.

  • 1.1 From a model to a platform: what changed [draft]
  • 1.2 The model family tree [draft]
  • 1.3 Model-selection framework: which model for which job [draft]
  • 1.4 Pricing, context windows, knowledge cutoffs, tiers [draft]
  • 1.5 Preview vs Stable vs Experimental: naming and deprecation [draft]
  • 1.6 Choosing your surface: AI Studio vs Vertex AI vs Antigravity [draft]

Install, authenticate, make your first call, switch surfaces, drop in across languages.

  • 2.1 Install & environment [draft]
  • 2.2 Your first request & the response object [draft]
  • 2.3 Switching AI Studio ↔ Vertex AI [draft]
  • 2.4 OpenAI-compatibility layer (drop-in) [draft]
  • 2.5 Languages — Python, Node, Go, Java, REST [draft]
  • 2.6 Build-an-app-from-a-prompt — AI Studio + Antigravity quickstart [draft]

Six essays on the reasoning layer that shapes every Gemini call in 2026.

  • 3.1 Dynamic reasoning by default — the mental model [draft]
  • 3.2 `thinking_level` — minimal, low, medium, high [draft]
  • 3.3 Thought signatures and strict validation [draft]
  • 3.4 Cost & latency of thinking [draft]
  • 3.5 Temperature 1.0 — and what breaks otherwise [draft]
  • 3.6 Legacy `thinking_budget` and the migration path [draft]

Seven essays on getting the right context to the model, cheaply, with thinking-model-aware techniques.

  • 4.1 Prompting for thinking models vs traditional [draft]
  • 4.2 System instructions for agentic workflows [draft]
  • 4.3 Long-context strategies (1M window) [draft]
  • 4.4 Context caching for cost [draft]
  • 4.5 `media_resolution` for images, video, docs [draft]
  • 4.6 File input patterns — inline, File API, URIs [draft]
  • 4.7 Token counting & cost estimation [draft]
Part II

Modalities

Text + embeddings, images, video, audio, Omni, documents + RAG, and robotics.

Seven essays on getting useful text out — plus JSON, streaming, common tasks, multilingual, and embeddings.

  • 5.1 Text generation — Flash vs Pro [draft]
  • 5.2 Multi-turn chat & conversation state [draft]
  • 5.3 Structured output — JSON mode & response schemas [draft]
  • 5.4 Streaming responses [draft]
  • 5.5 Common tasks — summarize, classify, extract, translate [draft]
  • 5.6 Multilingual capabilities [draft]
  • 5.7 Embeddings with Gemini Embedding 2 [draft]

Seven essays on understanding and generating images with the Gemini image family.

  • 6.1 Image understanding + `media_resolution` [draft]
  • 6.2 Native image generation — the Nano Banana family [draft]
  • 6.3 Text-to-image & image editing [draft]
  • 6.4 Multi-turn editing & up to 14 reference images [draft]
  • 6.5 Resolutions, aspect ratios, search-grounded images [draft]
  • 6.6 Imagen 4 — when to use vs Nano Banana [draft]
  • 6.7 SynthID watermarking, safety & policies [draft]
Ch 7

Video — Veo 3.1

0/3 published

Three essays on video understanding and Veo 3.1 generation with synchronized audio.

  • 7.1 Video understanding (frame analysis, `media_resolution`) [draft]
  • 7.2 Generation with Veo 3.1 / 3.1 Lite (synchronized audio) [draft]
  • 7.3 Creative controls & editing; cost / latency / use cases [draft]

Six essays on audio understanding, TTS, real-time voice, music, and end-to-end voice agents.

  • 8.1 Audio understanding [draft]
  • 8.2 TTS with 3.1 Flash TTS — steerable, expressive [draft]
  • 8.3 Real-time voice with 3.1 Flash Live (WebSocket, PCM) [draft]
  • 8.4 Partner stacks — LiveKit, Pipecat, Agora [draft]
  • 8.5 Music with Lyria 3 [draft]
  • 8.6 Build a voice agent end-to-end [draft]

Five essays on Omni — the model that takes anything in and outputs video with native audio.

  • 9.1 The any-to-any model [draft]
  • 9.2 Prompting Omni — 10-second clips, physics, motion [draft]
  • 9.3 Conversational editing of generated media [draft]
  • 9.4 Omni vs Veo vs Nano Banana — when to use which [draft]
  • 9.5 SynthID, safety, availability, pricing [draft]
Ch 10

Documents & Multimodal RAG

0/5 published

Five essays on document understanding and retrieval-augmented patterns.

  • 10.1 PDF / document understanding [draft]
  • 10.2 Multimodal embeddings & unified search [in progress]
  • 10.3 RAG system design with Gemini [in progress]
  • 10.4 File Search tool for agents [in progress]
  • 10.5 Build a document Q&A system [draft]
Ch 11

Robotics & Embodied AI

0/5 published

Five essays on Robotics-ER 1.6 and the Gemini Robotics SDK.

  • 11.1 Robotics-ER 1.6 — embodied reasoning overview [draft]
  • 11.2 Spatial reasoning, pointing, instrument reading [draft]
  • 11.3 Multi-step physical task planning; tool / VLA calls [draft]
  • 11.4 The Gemini Robotics SDK & agent framework [draft]
  • 11.5 Integration patterns — trusted-tester notes [draft]
Part III

Agentic Systems

Function calling, built-in tools, Computer Use, Deep Research, harnesses, multi-agent, and the inter-agent ecosystem.

Eight essays on declaring tools, the call/respond loop, signatures, parallelism, and production patterns.

  • 12.1 Function declarations (OpenAPI subset) [draft]
  • 12.2 The 4-step flow; `from_callable()` auto-generation [draft]
  • 12.3 Function call IDs — pair calls and responses [draft]
  • 12.4 Multimodal function responses [draft]
  • 12.5 Streaming function calling [draft]
  • 12.6 Thought-signature circulation in function calling [draft]
  • 12.7 Parallel & compositional calling [draft]
  • 12.8 Error handling, retries; build a real app [draft]

Six essays on the tools you don't have to implement — Google ships them inside the model call.

  • 13.1 Google Search grounding [draft]
  • 13.2 Google Maps grounding [draft]
  • 13.3 Code Execution [draft]
  • 13.4 URL Context [draft]
  • 13.5 File Search over corpora [draft]
  • 13.6 Combining built-in tools with function calling [draft]

Six essays on the model that operates a real browser: screenshot, decide, click, observe.

  • 14.1 How Computer Use works — screenshot, action, loop [draft]
  • 14.2 Supported actions; normalized coordinates (0-999) [draft]
  • 14.3 Safety system — allowed / confirm / blocked [draft]
  • 14.4 Sandboxed execution; Playwright integration [draft]
  • 14.5 Custom functions alongside Computer Use; parallel actions [draft]
  • 14.6 Build a web-scraping & a form-automation agent [draft]

Seven essays on the Deep Research and Deep Research Max agents — long-running, plan-then-execute research.

  • 15.1 Deep Research vs Deep Research Max [draft]
  • 15.2 The Interactions API [draft]
  • 15.3 Collaborative planning — plan, review, approve, execute [draft]
  • 15.4 Tools — Search, URL, Code, MCP, File Search [draft]
  • 15.5 Streaming event types; reconnection [draft]
  • 15.6 Visualization — agent-generated charts [draft]
  • 15.7 Use cases — market analysis, due diligence, lit review [draft]

Seven essays on the harness layer — the loop around the model that makes agents real.

  • 16.1 What an agent harness is [draft]
  • 16.2 The Antigravity SDK [draft]
  • 16.3 Defining custom agent behaviors; hosting [draft]
  • 16.4 Managed Agents in the Gemini API [draft]
  • 16.5 Long-running agents via the Interactions API [draft]
  • 16.6 The Antigravity platform — desktop / CLI / SDK [draft]
  • 16.7 Build a custom-harness agent end-to-end [draft]

Seven essays on coordinating many agents — ADK, Agent Engine, memory, MCP.

  • 17.1 ADK overview — Agent Development Kit [draft]
  • 17.2 Multi-agent orchestration & delegation [draft]
  • 17.3 MCP integration — tools as MCP servers [draft]
  • 17.4 Agent Engine runtime — deploy to Cloud Run / GKE [draft]
  • 17.5 Sessions & Memory Bank (GA) — short and long-term memory [draft]
  • 17.6 Framework interop — LangGraph, LlamaIndex, CrewAI, Vercel AI SDK [draft]
  • 17.7 Build a production multi-agent system [draft]

Six essays on the inter-agent web — Agent Registry, A2A, AP2, Cloud API Registry.

  • 18.1 Agent Registry — identity, governance, discovery [draft]
  • 18.2 Agent2Agent (A2A) — cross-vendor interop [draft]
  • 18.3 Securing the fleet — permissions, signed agent cards [draft]
  • 18.4 Cloud API Registry — Google Cloud services as MCP tools [draft]
  • 18.5 Agent Payments (AP2) — agent-led transactions [draft]
  • 18.6 Designing for an open agent ecosystem [draft]
Part IV

Production

Prototype → production, evals, safety, performance & cost, and migration.

Five essays on the path from a working prototype to a system you can actually operate.

  • 19.1 AI Studio → Vertex: auth, keys, service accounts [draft]
  • 19.2 Batch API — large-scale async [draft]
  • 19.3 Flex inference vs Priority inference [draft]
  • 19.4 Context caching at scale [draft]
  • 19.5 Endpoints, monitoring, quotas, rate limits [draft]
Ch 20

Evaluating Agentic Systems

0/7 published

Seven essays on moving from vibe-checks to data-driven evaluation of agents.

  • 20.1 Why evals — from vibe-checks to data-driven [draft]
  • 20.2 Vertex GenAI evaluation service overview [draft]
  • 20.3 Final-response metrics [draft]
  • 20.4 Trajectory evaluation — the six metrics [draft]
  • 20.5 Building evaluation datasets [draft]
  • 20.6 Evaluating across frameworks (ADK, LangGraph, CrewAI) [draft]
  • 20.7 Evals in CI; regression gates for agents [draft]
Ch 21

Safety & Responsible AI

0/5 published

Five essays on the safety primitives, filter configuration, and responsible practices.

  • 21.1 Safety principles, protections, filter configuration [draft]
  • 21.2 Why content was blocked; response strategies [draft]
  • 21.3 SynthID for generated media [draft]
  • 21.4 Agent security & AP2 considerations [draft]
  • 21.5 Responsible-AI practices [draft]

Five essays on tuning Gemini-shaped workloads for cost and latency at scale.

  • 22.1 Thinking-level tuning for cost / latency [draft]
  • 22.2 `media_resolution` optimization [draft]
  • 22.3 Token budgeting & counting [draft]
  • 22.4 Caching strategies; batch vs realtime [draft]
  • 22.5 Model selection for cost [draft]
Ch 23

Migration

0/4 published

Four essays on moving across models, vendors, and time — the migration playbooks.

  • 23.1 Gemini 2.x → 3.x → 3.5 — the migration [in progress]
  • 23.2 OpenAI → Gemini migration [in progress]
  • 23.3 Anthropic → Gemini migration [in progress]
  • 23.4 Common pitfalls & fixes [draft]