LATEST ESSAY — MAY 2026

Notes on serving the systems we don't yet understand.

Long-form essays on the infrastructure, evaluation, and governance of modern AI, written for the engineers who have to ship these systems and for the people who have to trust the result.

Senior Manager, AI & Data Security at Hyland · Writing on serving infrastructure, agentic evaluation, and the gap between deployment and governance.

Featured

Why You Can't Serve LLMs Like Regular Models

Read the full essay

MAY 2026 • 25 MIN READ • Evergreen • AI INFRASTRUCTURE

A practical walkthrough of the five fundamental differences between traditional ML inference and modern LLM serving: continuous batching, prefill/decode disaggregation, PagedAttention, prefix-aware routing, and Mixture-of-Experts sharding. Built around hand-drawn diagrams and interactive simulators.

Varun Varia

Hand-drawn · Continuous batching

Start here

Read the latest essay — Why You Can't Serve LLMs Like Regular Models
Browse short-form notes — seedlings, buddings, evergreens
About the writer — background, current focus, contact