Birat's Notebook

https://birat.codes/blog
Deep dives into AI Agents, MLOps, and the systems behind intelligence.
Language: en-US
Last updated: Sun, 19 Apr 2026 09:33:31 GMT

I Tested Gemma 4 for Local Agentic AI: Architecture, Benchmarks, Prompting, and Deployment Lessons
https://birat.codes/blog/gemma-4-local-agentic-ai-deployment-lessons
Sat, 18 Apr 2026
From Android on-device flows to workstation-grade MoE serving, this hands-on Gemma 4 deep dive explains where it shines, where it breaks, and how to deploy it without latency, memory, and tool-loop traps.
Tags: Gemma 4, Local AI, Agents, LLM Deployment, MoE, Prompt Engineering

Hot vs Cold Memory: State Architecture Patterns for Long-Running Agents
https://birat.codes/blog/agent-memory-architecture-cold-hot-state
Fri, 17 Apr 2026
Long-running agent quality depends on memory architecture, not just context window size. Separate hot execution state from cold historical memory to scale safely.
Tags: Agentic AI, Memory Architecture, State Management, Scalability

Context Management Is Actually Workflow Design
https://birat.codes/blog/context-management-is-workflow-design
Fri, 17 Apr 2026
The 1M token window didn't just give us more room; it exposed a hidden layer of AI development nobody was talking about. How you manage context reveals how well you understand your own work.

Eval-Driven Releases: How to Ship Agent Changes Without Guessing
https://birat.codes/blog/eval-driven-agent-release-process
Fri, 17 Apr 2026
Agent quality is a release engineering problem. A stable eval suite with quality gates is the only reliable way to ship model, prompt, and tool changes safely.
Tags: Agentic AI, Evals, CI/CD, Quality Engineering, Production Systems

From Prompts to Policy Engines: Guardrails That Survive Real Traffic
https://birat.codes/blog/guardrails-policy-engine-for-agent-tools
Fri, 17 Apr 2026
Prompt-only guardrails fail under scale. Durable safety comes from explicit policy engines that evaluate intent, context, and tool permissions before execution.
Tags: Agentic AI, Security, Guardrails, Policy Engine, Tooling

RAG Reliability by Design: Retrieval Quality SLOs That Prevent Silent Failure
https://birat.codes/blog/rag-retrieval-quality-slos
Fri, 17 Apr 2026
Most RAG failures start before generation. Define retrieval SLOs, measure them continuously, and gate responses when evidence quality is weak.
Tags: RAG, Search Quality, Production Systems, SLO, Observability

When Agents Should Not Decide: Building Confidence Thresholds for Human Handoff
https://birat.codes/blog/agent-confidence-thresholds
Thu, 16 Apr 2026
Agents need rejection regions and escalation policies. The right goal is not maximum autonomy, but appropriate autonomy with clear human handoff points.
Tags: Agentic AI, Human-AI Collaboration, Risk Management, Decision Policy, Autonomy

Observability for Black-Box Agents: Tracing Decisions in Production
https://birat.codes/blog/agent-observability
Thu, 16 Apr 2026
Agent observability is about reconstructing decisions, not just timing requests. You need traces that show what the agent saw, believed, and decided.
Tags: Agentic AI, Observability, Production Systems, Debugging, Tracing

The Hallucination Budget: Quantifying Risk for Mission-Critical Agents
https://birat.codes/blog/hallucination-budget
Thu, 16 Apr 2026
Hallucinations are not random. They cluster by input type, failure mode, and downstream cost, which means they can be budgeted like any other production risk.
Tags: Agentic AI, Risk Management, Production Systems, Safety, Evaluation

Agents in the Loop: Designing for Human-AI Collaboration Instead of Replacement
https://birat.codes/blog/human-ai-collaboration
Thu, 16 Apr 2026
The best agents do not replace people. They reduce human effort on routine work, surface confidence clearly, and make intervention cheap when the case is borderline.
Tags: Agentic AI, Human-AI Collaboration, UX Design, Workflow Design, Decision Support

The Latency Trap: Why 99th-Percentile Response Time Matters More Than Average
https://birat.codes/blog/latency-percentiles
Thu, 16 Apr 2026
Agent latency is heavy-tailed, not normal. The user experience is governed by tail latency, stage budgets, and the failure paths that inflate p95 and p99.
Tags: Agentic AI, Performance, Production Systems, Observability, SRE

Orchestrating Agents at Scale: When You Need a Supervisor, Not a Bigger Model
https://birat.codes/blog/orchestrating-agents-scale
Thu, 16 Apr 2026
Coordination complexity does not disappear when you use a bigger model. A supervisor plus specialized agents usually scales better than one monolithic agent.
Tags: Agentic AI, Multi-Agent Systems, Architecture, Orchestration, Workflows

Prompt Injection in Agents: Defense Patterns That Actually Work
https://birat.codes/blog/prompt-injection-defense
Thu, 16 Apr 2026
Prompt injection is not a prompt-writing bug. It is an architecture problem across retrieval, memory, tools, and output handling.
Tags: Agentic AI, Security, Production Systems, Prompt Injection, Input Validation

State Management Without the Mess: Deterministic Agent Memory for Long-Running Systems
https://birat.codes/blog/state-management-agent-memory
Thu, 16 Apr 2026
Vector search is useful, but deterministic event logs are what make long-running agents auditable, reproducible, and safe to debug after the fact.
Tags: Agentic AI, State Management, Production Systems, Event Sourcing, Compliance

Token Economics: Why Your Agent Architecture Is Costing 10x More Than It Should
https://birat.codes/blog/token-economics-agent-architecture
Thu, 16 Apr 2026
Token spend is usually an architecture problem, not a prompt-writing problem. The biggest savings come from routing, caching, pruning, and fewer unnecessary model calls.
Tags: Agentic AI, Cost Optimization, Architecture, Caching, Model Routing

The Tool-Use Illusion: Why Most Agent Frameworks Fail at Production Scale
https://birat.codes/blog/tool-use-illusion
Thu, 16 Apr 2026
Adding more tools does not make an agent smarter if every decision adds latency, retries, and hidden orchestration cost. Here is how to design tool flows that stay fast and debuggable.
Tags: Agentic AI, Production Systems, Architecture, Latency, Tooling

The Architecture of Agency: Model Context Protocol (MCP)
https://birat.codes/blog/model-context-protocol-mcp-intro
Wed, 15 Apr 2026
MCP turns tool integration from custom glue code into a protocol. This guide explains the architecture, the trade-offs, and how to build a server that is actually useful in production.
Tags: Agentic AI, MCP, AI Systems, Developer Experience, Tooling

Demystifying the Working of ReactJs: From JSX to Pixels
https://birat.codes/blog/demystifying-working-react-from-jsx-pixels
Fri, 30 Aug 2024
A practical walkthrough of what actually happens from JSX authoring to browser rendering, including Babel transforms, Vite build stages, and how React finally updates pixels on screen.
Tags: React, JSX, Babel, Vite, Frontend, Build Tools, Rendering