A Better Way to Explain Modern Architecture Using the C4 Model
Explains the C4 model for creating clear, multi-level software architecture diagrams to improve communication across teams.
Rajesh P writes about building scalable, secure, and high-performance backend systems. His articles cover Spring Boot, API design and versioning, system design fundamentals, and modern GenAI concepts like rerankers, LLM token limits, and latency optimization.
18 articles from this blog
Explains the C4 model for creating clear, multi-level software architecture diagrams to improve communication across teams.
Explains grounding in LLMs: connecting them to reliable data for accurate, context-aware responses using techniques like RAG and fine-tuning.
Explains Docker's layer caching mechanism and how to bypass it using --no-cache for reliable builds.
Explores idempotency in system design, its common patterns, and its critical role in reliable distributed systems, generative AI, and AI agents.
Learn how to build a GitHub repository summarizer using Claude Agent Skills, including architecture, setup, and script integration.
Learn how to build a prompt evaluation system using Spring AI and Claude, covering datasets, graders, and workflows.
Building a prompt evaluation system with Spring AI and Claude to measure and improve prompt quality through automated testing and reporting.
A complete walkthrough of building a Claude Agent Skill in Java, covering setup, prerequisites, and project structure.
Explains how to use Spring Boot's @MatrixVariable annotation for embedding key-value parameters in URL path segments, with practical examples.
Explains API versioning concepts and details the new first-class versioning support introduced in Spring Boot 4 (Spring Framework 7).
A practical guide to implementing essential API security best practices in Spring Boot, including HTTPS, JWT authentication, authorization, and rate limiting.
A technical deep dive into how AI rerankers work, explaining their scoring mechanisms, model architectures, and implementation trade-offs.
Explains how rerankers improve search and AI results by reordering retrieved documents for better precision and relevance.
Explains LLM API token limits (tokens per minute, TPM) and strategies for managing concurrent requests to avoid rate limiting in production applications.
Explains why P95 and P99 latency metrics are crucial for understanding real user experience, not just average response times.
Explains Little's Law from queuing theory and how it applies to system performance, showing why latency increases cause concurrency to balloon under load.
Part 2 of a guide on using Docker Compose to improve the reliability and portability of AI agents, focusing on the Dockerfile and compose.yaml.
A tutorial on using Docker Compose to create reproducible, containerized runtime environments for AI agents, focusing on a weather query example.