Birat's Notebook

Deep dives into AI Agents, MLOps, and the systems behind intelligence.

Hot vs Cold Memory: State Architecture Patterns for Long-Running Agents

Fri Apr 17 2026 • Birat Gautam

Long-running agent quality depends on memory architecture, not just context window size. Separate hot execution state from cold historical memory to scale safely.


Eval-Driven Releases: How to Ship Agent Changes Without Guessing

Fri Apr 17 2026 • Birat Gautam

Agent quality is a release engineering problem. A stable eval suite with quality gates is the only reliable way to ship model, prompt, and tool changes safely.


From Prompts to Policy Engines: Guardrails That Survive Real Traffic

Fri Apr 17 2026 • Birat Gautam

Prompt-only guardrails fail under scale. Durable safety comes from explicit policy engines that evaluate intent, context, and tool permissions before execution.


RAG Reliability by Design: Retrieval Quality SLOs That Prevent Silent Failure

Fri Apr 17 2026 • Birat Gautam

Most RAG failures start before generation. Define retrieval SLOs, measure them continuously, and gate responses when evidence quality is weak.


When Agents Should Not Decide: Building Confidence Thresholds for Human Handoff

Thu Apr 16 2026 • Birat Gautam

Agents need rejection regions and escalation policies. The right goal is not maximum autonomy, but appropriate autonomy with clear human handoff points.


Observability for Black-Box Agents: Tracing Decisions in Production

Thu Apr 16 2026 • Birat Gautam

Agent observability is about reconstructing decisions, not just timing requests. You need traces that show what the agent saw, believed, and decided.


The Hallucination Budget: Quantifying Risk for Mission-Critical Agents

Thu Apr 16 2026 • Birat Gautam

Hallucinations are not random. They cluster by input type, failure mode, and downstream cost, which means they can be budgeted like any other production risk.


Agents in the Loop: Designing for Human-AI Collaboration Instead of Replacement

Thu Apr 16 2026 • Birat Gautam

The best agents do not replace people. They reduce human effort on routine work, surface confidence clearly, and make intervention cheap when the case is borderline.


The Latency Trap: Why 99th-Percentile Response Time Matters More Than Average

Thu Apr 16 2026 • Birat Gautam

Agent latency is heavy-tailed, not normal. The user experience is governed by tail latency, stage budgets, and the failure paths that inflate p95 and p99.


Orchestrating Agents at Scale: When You Need a Supervisor, Not a Bigger Model

Thu Apr 16 2026 • Birat Gautam

Coordination complexity does not disappear when you use a bigger model. A supervisor plus specialized agents usually scales better than one monolithic agent.


Prompt Injection in Agents: Defense Patterns That Actually Work

Thu Apr 16 2026 • Birat Gautam

Prompt injection is not a prompt-writing bug. It is an architecture problem across retrieval, memory, tools, and output handling.


State Management Without the Mess: Deterministic Agent Memory for Long-Running Systems

Thu Apr 16 2026 • Birat Gautam

Vector search is useful, but deterministic event logs are what make long-running agents auditable, reproducible, and safe to debug after the fact.


Token Economics: Why Your Agent Architecture Is Costing 10x More Than It Should

Thu Apr 16 2026 • Birat Gautam

Token spend is usually an architecture problem, not a prompt-writing problem. The biggest savings come from routing, caching, pruning, and fewer unnecessary model calls.


The Tool-Use Illusion: Why Most Agent Frameworks Fail at Production Scale

Thu Apr 16 2026 • Birat Gautam

Adding more tools does not make an agent smarter if every decision adds latency, retries, and hidden orchestration cost. Here is how to design tool flows that stay fast and debuggable.


The Architecture of Agency: Model Context Protocol (MCP)

Wed Apr 15 2026 • Birat Gautam

MCP turns tool integration from custom glue code into a protocol. This guide explains the architecture, the trade-offs, and how to build a server that is actually useful in production.


Demystifying the Working of ReactJs: From JSX to Pixels

Fri Aug 30 2024 • Birat Gautam

A practical walkthrough of what actually happens from JSX authoring to browser rendering, including Babel transforms, Vite build stages, and how React finally updates pixels on screen.