Distributed systems articles

8/15/2025 • EN

Durable Execution for AI workflows

Explores using durable execution engines like Azure Durable Task Scheduler to build robust, long-running AI workflows, such as summarizing articles and generating newsletters.

AI Workflows Azure distributed systems Durable Execution Workflow Orchestration

Geert Baeke

8/13/2025 • EN

Knowledge and Common Knowledge in a Distributed Environment, Part 1

Explores the concepts of knowledge and common knowledge in distributed systems, starting with the classic muddy children puzzle.

Common Knowledge distributed systems Epistemology Raft TLA+

A. Jesse Jiryu Davis

8/13/2025 • EN

Quick tips for distributed event-based systems

Explains how to replace brittle, synchronous side-effects in endpoints with a resilient, event-based system using queues for better error handling and performance.

asynchronous programming distributed systems error handling Event Driven Architecture Message Queues

Swizec Teller

8/9/2025 • EN

What even is distributed systems

An introduction to distributed systems, covering core challenges and recommended learning resources such as the book 'Designing Data-Intensive Applications' and MIT's distributed systems course.

Consistency Designing Data-Intensive Applications distributed systems MIT Distributed Systems reliability

Phil Eaton

7/1/2025 • EN

From Python Programmer to Distributed Systems Researcher in 10 Years Without a PhD

A personal journey from aspiring dancer to Python programmer and eventually a distributed systems researcher, detailing career transitions and technical growth.

c computer science distributed systems Programming Career software development

A. Jesse Jiryu Davis

6/13/2025 • EN

When Doing Too Much Is a Symptom, Not a Solution

A software engineer reflects on the human challenges of tech work, including burnout, team attrition, and the pressure to refactor legacy systems.

ASP.NET Core distributed systems F# gRPC software engineering

Kevin Avignon

6/5/2025 • EN

Can We Rely On Timers For Distributed Algorithms?

Explores the reliability of timers in distributed algorithms like Raft, arguing they are viable with safety margins for mechanisms like leader leases.

Consensus distributed systems Leader Election Raft Timers

A. Jesse Jiryu Davis

5/10/2025 • EN

Are We Serious About Using TLA+ For Statistical Properties?

Explores extending TLA+ for performance modeling using queueing theory and simulation, moving beyond just correctness verification.

distributed systems Formal Methods Performance Modeling Queueing Theory TLA+

A. Jesse Jiryu Davis

5/8/2025 • EN

Jesse's 2025 TLA+ Community Event Notes

Summary of talks from the 2025 TLA+ Community Event, focusing on formal methods and model-guided fuzzing for distributed systems.

distributed systems Formal Methods Fuzzing Model Checking TLA+

A. Jesse Jiryu Davis

4/20/2025 • EN

Transactions are a protocol

Explores transactions as a protocol that can be added to any storage system, not an intrinsic feature, with examples from Delta Lake, Epoxy, and Two-Phase Commit.

Consistency distributed systems Protocol Storage Systems Transactions

Phil Eaton

3/18/2025 • EN

The Synchrony Budget

Explains the concept of a 'synchrony budget' for designing distributed systems, advocating for asynchronous communication to improve performance and availability.

Asynchronous Communication distributed systems Microservices Synchronous Calls system design

Gunnar Morling

3/5/2025 • EN

Let's Take a Look at... KIP-932: Queues for Kafka!

Explores KIP-932, a proposal to add queue semantics and share groups to Apache Kafka for improved message processing.

Apache Kafka distributed systems KIP Queues streaming

Gunnar Morling

2/6/2025 • EN

Review: SwiftPaxos: Fast Geo-Replicated State Machines

A review of SwiftPaxos, a new Paxos variant designed for fast, geo-replicated state machines in high-latency networks.

Consensus Algorithms distributed systems Geo Replication Paxos State Machines

A. Jesse Jiryu Davis

12/20/2024 • EN

On Versioning Observabilities (1.0, 2.0, 3.0…10.0?!?)

A critique of semantic versioning in observability marketing, arguing that terms like 'Observability 2.0' describe a real technical shift despite overuse.

distributed systems Monitoring observability software development versioning

Charity Majors