Gemini Interactions API Quick Start
A quick start guide for Google's Gemini Interactions API, covering setup, stateful conversations, and multimodal interactions.
Philipp Schmid is a Staff Engineer at Google DeepMind, building AI Developer Experience and DevRel initiatives. He specializes in LLMs, RLHF, and making advanced AI accessible to developers worldwide.
183 articles from this blog
Explains why MCP servers often fail and provides best practices for building effective MCP servers by treating them as AI agent interfaces, not REST API wrappers.
A technical guide on generating transparent PNG stickers using the Gemini API with a chroma-key green background and HSV color detection for clean background removal.
A guide to building AI agents using the Gemini Interactions API, covering core concepts and a step-by-step CLI implementation.
Introducing mcp-cli, a lightweight CLI tool for efficient, dynamic discovery of and interaction with MCP servers, drastically reducing token usage for AI agents.
Explains the concept of an Agent Harness, a system for managing reliable, long-running AI agents, and its growing importance in AI development.
A software engineer's predictions for AI trends in 2026, covering generative UI, edge-based agents, smart homes, and the evolving role of engineers.
Explores advanced Context Engineering techniques for AI agents, focusing on combating Context Rot and improving multi-agent coordination.
Argues that senior engineers struggle with AI agent development because their ingrained deterministic habits clash with the probabilistic nature of agent engineering.
A step-by-step tutorial on building a functional AI agent using the Gemini 3 Pro model and Python, covering core concepts like tools, loops, and context.
Best practices and structural patterns for effectively prompting the Gemini 3 AI model, focusing on directness, logic, and clear instruction.
A tutorial on using the Gemini API's File Search feature for RAG in web development with JavaScript/TypeScript.
A tutorial on building an AI agent using Google's Gemini, n8n workflow automation, and deploying it on Google Cloud Run with a PostgreSQL database.
A comprehensive overview of over 50 modern AI agent benchmarks, categorized into function calling, reasoning, coding, and computer interaction tasks.
Explores the evolution from simple, stateless AI agents (Agent 1.0) to advanced, deep agents (Agent 2.0) capable of complex, multi-step tasks.
Explains the concept of AI subagents, specialized agents for specific tasks, and their architecture using an orchestrator model.
A 10-step guide for e-commerce teams to generate consistent product images using Google's Gemini 2.5 Flash AI model for text-to-image and editing tasks.
Explores the concept of memory in AI agents, detailing short-term and long-term memory architectures to overcome LLM statelessness.
A quick reference guide for installing, configuring, and using the Google Gemini CLI, an AI-powered terminal tool for coding and task management.
Introducing Code Sandbox MCP, a Model Context Protocol server for safely executing Python and JavaScript code in containers via AI agents.