The book

Programming Gemini (Agentic Edition)

A working developer's guide to building production agentic systems on the Google Gemini platform. Twenty-five chapters from the Welcome through the final Migration playbook — read in order or jump straight to what you need.

143 essays · 2 published · 4 parts · 25 chapters

Part I

Foundations

Orientation, the platform map, the SDK, thinking, and prompting for thinking models.

Welcome

1/2 published

Why this book, who it's for, and how it's organized. The orientation read before Chapter 1.

1 Welcome to Programming Gemini · draft
2 How to use this book

Understanding Models

1/5 published

Five essays on what models actually are: tokens, context, autoregressive generation, multimodality, and the new dimension — reasoning. The conceptual on-ramp before the platform map.

1 What a language model actually is
2 Multimodality — one model, many input types · draft
3 Reasoning and thinking — the new dimension · draft
4 What the dev workflow looks like now · draft
5 How Gemini compares: the mental model, not the benchmark table · draft

Ch 1

The Gemini Platform in 2026

0/6 published

Six essays that orient you to the 2026 platform: from model to four-layer stack, the family tree, model selection, pricing/context/tiers, the stability lifecycle, and choosing a surface.

1.1 From a model to a platform: what changed · in progress
1.2 The model family tree · draft
1.3 Model-selection framework: which model for which job · in progress
1.4 Pricing, context windows, knowledge cutoffs, tiers · draft
1.5 Preview vs Stable vs Experimental: naming and deprecation · in progress
1.6 Choosing your surface: AI Studio vs Vertex AI vs Antigravity · draft

Ch 2

Getting Started with the Gen AI SDK

0/6 published

Install, authenticate, make your first call, switch surfaces, drop in through the OpenAI-compatibility layer, and work across languages.

2.1 Install & environment · draft
2.2 Your first request & the response object · draft
2.3 Switching AI Studio ↔ Vertex AI · draft
2.4 OpenAI-compatibility layer (drop-in) · draft
2.5 Languages — Python, Node, Go, Java, REST · draft
2.6 Build-an-app-from-a-prompt — AI Studio + Antigravity quickstart · draft

Ch 3

Thinking — the Core of Gemini

0/6 published

Six essays on the reasoning layer that shapes every Gemini call in 2026.

3.1 Dynamic reasoning by default — the mental model · draft
3.2 `thinking_level` — minimal, low, medium, high · draft
3.3 Thought signatures and strict validation · draft
3.4 Cost & latency of thinking · draft
3.5 Temperature 1.0 — and what breaks otherwise · draft
3.6 Legacy `thinking_budget` and the migration path · draft

Ch 4

Prompting & Context Engineering

0/7 published

Seven essays on getting the right context to the model, cheaply, with thinking-model-aware techniques.

4.1 Prompting for thinking models vs traditional · draft
4.2 System instructions for agentic workflows · draft
4.3 Long-context strategies (1M window) · draft
4.4 Context caching for cost · draft
4.5 `media_resolution` for images, video, docs · draft
4.6 File input patterns — inline, File API, URIs · draft
4.7 Token counting & cost estimation · draft

Part II

Modalities

Text + embeddings, images, video, audio, Omni, documents + RAG, and robotics.

Ch 5

Text, Structured Output & Embeddings

0/7 published

Seven essays on getting useful text out — plus JSON, streaming, common tasks, multilingual, and embeddings.

5.1 Text generation — Flash vs Pro · draft
5.2 Multi-turn chat & conversation state · draft
5.3 Structured output — JSON mode & response schemas · draft
5.4 Streaming responses · draft
5.5 Common tasks — summarize, classify, extract, translate · draft
5.6 Multilingual capabilities · draft
5.7 Embeddings with Gemini Embedding 2 · draft

Ch 6

Images — Nano Banana & Imagen

0/7 published

Seven essays on understanding and generating images with the Gemini image family.

6.1 Image understanding + `media_resolution` · draft
6.2 Native image generation — the Nano Banana family · draft
6.3 Text-to-image & image editing · draft
6.4 Multi-turn editing & up to 14 reference images · draft
6.5 Resolutions, aspect ratios, search-grounded images · draft
6.6 Imagen 4 — when to use vs Nano Banana · draft
6.7 SynthID watermarking, safety & policies · draft

Ch 7

Video — Veo 3.1

0/3 published

Three essays on video understanding and Veo 3.1 generation with synchronized audio.

7.1 Video understanding (frame analysis, `media_resolution`) · draft
7.2 Generation with Veo 3.1 / 3.1 Lite (synchronized audio) · draft
7.3 Creative controls & editing; cost / latency / use cases · draft

Ch 8

Audio — Live, TTS & Lyria

0/6 published

Six essays on audio understanding, TTS, real-time voice, music, and end-to-end voice agents.

8.1 Audio understanding · draft
8.2 TTS with 3.1 Flash TTS — steerable, expressive · draft
8.3 Real-time voice with 3.1 Flash Live (WebSocket, PCM) · draft
8.4 Partner stacks — LiveKit, Pipecat, Agora · draft
8.5 Music with Lyria 3 · draft
8.6 Build a voice agent end-to-end · draft

Ch 9

Gemini Omni — Any-to-Any Generation

0/5 published

Five essays on Omni — the model that takes anything in and outputs video with native audio.

9.1 The any-to-any model · draft
9.2 Prompting Omni — 10-second clips, physics, motion · draft
9.3 Conversational editing of generated media · draft
9.4 Omni vs Veo vs Nano Banana — when to use which · draft
9.5 SynthID, safety, availability, pricing · draft

Ch 10

Documents & Multimodal RAG

0/5 published

Five essays on document understanding and retrieval-augmented patterns.

10.1 PDF / document understanding · draft
10.2 Multimodal embeddings & unified search · draft
10.3 RAG system design with Gemini · draft
10.4 File Search tool for agents · draft
10.5 Build a document Q&A system · draft

Ch 11

Robotics & Embodied AI

0/5 published

Five essays on Robotics-ER 1.6 and the Gemini Robotics SDK.

11.1 Robotics-ER 1.6 — embodied reasoning overview · draft
11.2 Spatial reasoning, pointing, instrument reading · draft
11.3 Multi-step physical task planning; tool / VLA calls · draft
11.4 The Gemini Robotics SDK & agent framework · draft
11.5 Integration patterns — trusted-tester notes · draft

Part III

Agentic Systems

Function calling, built-in tools, Computer Use, Deep Research, harnesses, multi-agent, and the inter-agent ecosystem.

Ch 12

Function Calling — the Agent's Hands

0/8 published

Eight essays on declaring tools, the call/respond loop, signatures, parallelism, and production patterns.

12.1 Function declarations (OpenAPI subset) · draft
12.2 The 4-step flow; `from_callable()` auto-generation · draft
12.3 Function call IDs — pair calls and responses · draft
12.4 Multimodal function responses · draft
12.5 Streaming function calling · draft
12.6 Thought-signature circulation in function calling · draft
12.7 Parallel & compositional calling · draft
12.8 Error handling, retries; build a real app · draft

Ch 13

Built-in Tools — Search, Maps, Code, URLs, Files

0/6 published

Six essays on the tools you don't have to implement — Google ships them inside the model call.

13.1 Google Search grounding · draft
13.2 Google Maps grounding · draft
13.3 Code Execution · draft
13.4 URL Context · draft
13.5 File Search over corpora · draft
13.6 Combining built-in tools with function calling · draft

Ch 14

Computer Use — Browser Automation Agents

0/6 published

Six essays on the model that operates a real browser: screenshot, decide, click, observe.

14.1 How Computer Use works — screenshot, action, loop · draft
14.2 Supported actions; normalized coordinates (0-999) · draft
14.3 Safety system — allowed / confirm / blocked · draft
14.4 Sandboxed execution; Playwright integration · draft
14.5 Custom functions alongside Computer Use; parallel actions · draft
14.6 Build a web-scraping agent & a form-automation agent · draft

Ch 15

Deep Research Agent — Autonomous Research

0/7 published

Seven essays on the Deep Research and Deep Research Max agents — long-running, plan-then-execute research.

15.1 Deep Research vs Deep Research Max · draft
15.2 The Interactions API · draft
15.3 Collaborative planning — plan, review, approve, execute · draft
15.4 Tools — Search, URL, Code, MCP, File Search · draft
15.5 Streaming event types; reconnection · draft
15.6 Visualization — agent-generated charts · draft
15.7 Use cases — market analysis, due diligence, lit review · draft

Ch 16

Agent Harnesses — Antigravity SDK & Managed Agents

0/7 published

Seven essays on the harness layer — the loop around the model that makes agents real.

16.1 What an agent harness is · draft
16.2 The Antigravity SDK · draft
16.3 Defining custom agent behaviors; hosting · draft
16.4 Managed Agents in the Gemini API · draft
16.5 Long-running agents via the Interactions API · draft
16.6 The Antigravity platform — desktop / CLI / SDK · draft
16.7 Build a custom-harness agent end-to-end · draft

Ch 17

Multi-Agent Systems with ADK & Agent Engine

0/7 published

Seven essays on coordinating many agents — ADK, Agent Engine, memory, MCP.

17.1 ADK overview — Agent Development Kit · draft
17.2 Multi-agent orchestration & delegation · draft
17.3 MCP integration — tools as MCP servers · draft
17.4 Agent Engine runtime — deploy to Cloud Run / GKE · draft
17.5 Sessions & Memory Bank (GA) — short and long-term memory · draft
17.6 Framework interop — LangGraph, LlamaIndex, CrewAI, Vercel AI SDK · draft
17.7 Build a production multi-agent system · draft

Ch 18

The Agent Ecosystem — Registry, A2A & Payments

0/6 published

Six essays on the inter-agent web — Agent Registry, A2A, AP2, Cloud API Registry.

18.1 Agent Registry — identity, governance, discovery · draft
18.2 Agent2Agent (A2A) — cross-vendor interop · draft
18.3 Securing the fleet — permissions, signed agent cards · draft
18.4 Cloud API Registry — Google Cloud services as MCP tools · draft
18.5 Agent Payments (AP2) — agent-led transactions · draft
18.6 Designing for an open agent ecosystem · draft

Part IV

Production

Prototype → production, evals, safety, performance & cost, and migration.

Ch 19

Prototype → Production on Vertex AI / Gemini Enterprise

0/5 published

Five essays on the path from a working prototype to a system you can actually operate.

19.1 AI Studio → Vertex: auth, keys, service accounts · draft
19.2 Batch API — large-scale async · draft
19.3 Flex inference vs Priority inference · draft
19.4 Context caching at scale · draft
19.5 Endpoints, monitoring, quotas, rate limits · draft

Ch 20

Evaluating Agentic Systems

0/7 published

Seven essays on moving from vibe-checks to data-driven evaluation of agents.

20.1 Why evals — from vibe-checks to data-driven · draft
20.2 Vertex GenAI evaluation service overview · draft
20.3 Final-response metrics · draft
20.4 Trajectory evaluation — the six metrics · draft
20.5 Building evaluation datasets · draft
20.6 Evaluating across frameworks (ADK, LangGraph, CrewAI) · draft
20.7 Evals in CI; regression gates for agents · draft

Ch 21

Safety & Responsible AI

0/5 published

Five essays on the safety primitives, filter configuration, and responsible practices.

21.1 Safety principles, protections, filter configuration · draft
21.2 Why content was blocked; response strategies · draft
21.3 SynthID for generated media · draft
21.4 Agent security & AP2 considerations · draft
21.5 Responsible-AI practices · draft

Ch 22

Performance & Cost Optimization

0/5 published

Five essays on tuning Gemini-shaped workloads for cost and latency at scale.

22.1 Thinking-level tuning for cost / latency · draft
22.2 `media_resolution` optimization · draft
22.3 Token budgeting & counting · draft
22.4 Caching strategies; batch vs realtime · draft
22.5 Model selection for cost · draft

Ch 23

Migration

0/4 published

Four essays on moving across models, vendors, and time — the migration playbooks.

23.1 Gemini 2.x → 3.x → 3.5 — the migration · in progress
23.2 OpenAI → Gemini migration · in progress
23.3 Anthropic → Gemini migration · in progress
23.4 Common pitfalls & fixes · draft