LLM APIs are a Synchronization Problem
Analyzes LLM APIs as a distributed state synchronization problem, critiquing their abstraction and proposing a mental model based on token and cache state.
Explores building a basic Durable Execution engine using SQLite and Java to reliably persist and resume multi-step workflows, like those in agentic systems.
Explores building a basic Durable Execution engine in Java using SQLite to persist workflow state and resume from failures.
An introduction to the distributed actor model for building concurrent, resilient systems, explaining its core concepts and benefits.
A curated collection of articles on software architecture, development practices, Java updates, and testing strategies for tech professionals.
Explains service discovery in .NET 8, covering the built-in NuGet package, configuration, and integration with HttpClient.
An engineer argues that software development is a learning process, not an assembly line, and explains how to use LLMs as brainstorming partners.
Explores the compounding impact of shaving milliseconds off microservice latency in distributed systems, affecting throughput and scalability.
Explains the Store and Forward resiliency design pattern for handling service dependencies in systems such as payments and telecom.
Explains how avoiding cross-region calls in microservices improves performance and resilience, and discusses the complexities of designing for regional isolation.
Explores the 'coordinated attack' problem in distributed systems, linking it to the impossibility of achieving common knowledge in asynchronous environments.
Explores using durable execution engines like Azure Durable Task Scheduler to build robust, long-running AI workflows, such as summarizing articles and generating newsletters.
Explores the concepts of knowledge and common knowledge in distributed systems, starting with the classic muddy children puzzle.
Explains how to replace brittle, synchronous side-effects in endpoints with a resilient, event-based system using queues for better error handling and performance.
An introduction to distributed systems, covering core challenges and recommended learning resources like the book 'Designing Data-Intensive Applications' and the MIT course.
A personal journey from aspiring dancer to Python programmer and eventually a distributed systems researcher, detailing career transitions and technical growth.
A software engineer reflects on the human challenges of tech work, including burnout, team attrition, and the pressure to refactor legacy systems.
Explores the reliability of timers in distributed algorithms like Raft, arguing they are viable with safety margins for mechanisms like leader leases.
Explores extending TLA+ for performance modeling using queueing theory and simulation, moving beyond just correctness verification.
Summary of talks from the 2025 TLA+ Community Event, focusing on formal methods and model-guided fuzzing for distributed systems.