MacWhisper has Automatic Speaker Recognition now
MacWhisper's new Automatic Speaker Recognition feature, powered by NVIDIA Parakeet, accurately identifies speakers in audio transcripts.
SimonWillison.net is the long-running blog of Simon Willison, a software engineer, open-source creator, and co-creator of the Django web framework. He writes about Python, Django, Datasette, AI tooling, prompt engineering, search, databases, APIs, data journalism, and practical software architecture. The blog includes detailed notes from experiments, conference talks, and real projects. Readers will find clear explanations of topics such as LLM workflows, SQL patterns, data publishing, scraping, deployment, caching, and modern developer tooling. Simon also publishes frequent micro-posts and TIL entries that document small discoveries and tricks from day-to-day engineering work. The tone is practical and research-oriented, making the site a valuable resource for anyone interested in serious engineering and open data.
260 articles from this blog
Google Antigravity is a new AI-powered IDE that integrates with Gemini models for agentic coding, featuring browser testing and automated documentation.
Ethan Mollick reflects on AI's rapid evolution from chatbots to digital coworkers, highlighting the changing role of human oversight.
A hands-on review of Google's new Gemini 3 Pro AI model, covering its features, benchmarks, pricing, and testing its multimodal capabilities.
Discusses the future of small open source libraries in the age of LLMs, questioning their relevance when AI can generate specific code.
Explores Andrej Karpathy's concept of Software 2.0, where AI writes programs through objectives and gradient descent, focusing on task verifiability.
Release of llm-anthropic plugin 0.22 with support for Claude's structured outputs and web search tool integration.
A guide to parakeet-mlx, a project porting NVIDIA's Parakeet ASR model to Apple's MLX framework for fast, local audio transcription.
Analysis of GPT-5.1's new adaptive thinking features, model routing system, and safety benchmarks from the system card addendum.
OpenAI releases GPT-5.1 API with new reasoning modes, adaptive reasoning, extended prompt caching, and new built-in tools for developers.
Datasette 1.0a22 release notes covering new security features, a client detection method, and developer tools for plugin authors.
A deep dive into Google's Nano Banana (Gemini 2.5 Flash Image) AI image model, exploring its autoregressive architecture and how well it responds to detailed prompt engineering.
OpenAI objects to court order demanding 20M ChatGPT user conversations, citing dangerous precedent for AI discovery.
A humorous look at AI model benchmarking using the challenge of generating an SVG of a pelican riding a bicycle, and the risks of labs 'gaming' the test.
Explains how MCP servers enable faster development by using LLMs to dynamically read specs, unlike traditional APIs.
A clever hack using POSIX advisory locks for cross-container communication between processes on the same machine.
An analysis of scaling HNSW vector indexing in Redis, covering new contributions for efficient deletions and parallel queries across distributed nodes.
An experiment testing whether AI vision models can improve their SVG drawings of a pelican on a bicycle through iterative, agentic feedback loops.
Using AI coding agents to automate repetitive plugin upgrades for Datasette 1.0, running six parallel sessions.
Netflix's guidelines for using generative AI in content production, focusing on copyright, data security, and talent rights.