Serious Data From Testing LLMs
A data-driven analysis of LLM performance on a simple retrieval task, highlighting the need for evidence-based AI testing.
Explains how .NET 10 allows running standalone C# script files directly, similar to Python, without needing a full project.
Overview of key proposals in Apache Iceberg v4, focusing on performance, metadata efficiency, and portability for modern data workloads.
Explains how to create user-defined functions in ColdFusion to coalesce values, addressing limitations of built-in operators.
Benchmark results comparing the performance of Python 3.14 against older Python versions, PyPy, Node.js, and Rust on pure Python code.
Ben Nadel argues for using self-closing ColdFusion custom tags to make code structure explicit and avoid confusion in nested tags.
Explores key new features in C# 14 and .NET 10, including field-backed properties and partial events/constructors.
Explains why and how to use CancellationTokens in .NET APIs to stop long-running operations and free server resources.
Explores the unique challenges of testing Generative AI and Large Language Models, contrasting it with traditional software testing approaches.
Introducing Claudiomiro, a Node.js CLI tool that enables Claude AI to autonomously complete complex coding tasks through a structured, looping workflow.
A guide to the four main methods for evaluating Large Language Models, including code examples and practical implementation details.
Explores four main methods for evaluating Large Language Models (LLMs), including code examples for implementing each approach from scratch.
A developer explains how to configure a ColdFusion JDBC connection string to use UTC time, fixing a 5-hour time discrepancy with MySQL.
A developer compares Claude Sonnet 4.5, GPT-5 Codex, and Grok Code Fast 1 for coding tasks in Cursor, testing feature generation and test creation.
Azure Local Arc Gateway is now GA, providing a centralized HTTPS egress point for Azure Local workloads, simplifying security and reducing endpoints.
Analysis of changes in Claude Code 2.0's system prompt, showing less prescriptive guidance and more trust in the Sonnet 4.5 model.
Weekly roundup of recent Azure, .NET, GitHub, and Visual Studio blog posts, including updates on AI tools, security, and development features.
Announcement of a beta release for GExperts 1.3.26, a toolset for the Delphi 13 64-bit IDE, including installation steps and known limitations.
Explains how to use Azure Policy to automatically enable Virtual Network Flow Logs across many VNets for security and troubleshooting.
A technical article describing a solution for sorting hierarchical data fields using string interning in C, focusing on maintaining original order while grouping nested structures.