Blog

Thoughts on platform engineering, MLOps, and building reliable systems at scale.

Latest Posts

Platform Design · Dec 18, 2024 · 8 min read

Building Kortex: A Kubernetes-Native AI Inference Gateway

A technical deep dive into designing Kortex, a multi-provider inference gateway with intelligent routing, circuit breakers, OpenTelemetry tracing, and cost tracking.

Go, Kubernetes, Kubebuilder, KServe

Data Platforms · Mar 15, 2025 · 12 min read

Migrating from ELK to AWS Managed OpenSearch: What I Learned at PVH

An opinionated take on migrating a TB-scale observability platform from the ELK stack to AWS Managed OpenSearch for 200+ microservices at PVH Europe, and when managed beats self-managed.

OpenSearch, AWS, Terraform, Fluent Bit

AI Infrastructure · Apr 22, 2025 · 10 min read

Cost Attribution for AI Agents: Why Tool-Call-Level FinOps Is the Missing Layer

An argument for why AI cost attribution needs to happen at the protocol level rather than as an afterthought, with reference to MCP, Kagenti, and the emerging need for agent-aware FinOps.

FinOps, OpenCost, Prometheus, MCP

Platform Design · May 10, 2025 · 15 min read

Platform Infrastructure for LLM-Powered Products: Lessons from Building the Matching API at myTomorrows

An opinionated take on building production infrastructure for LLM-powered healthcare products: dedicated clusters for medical data, CloudFront VPC Origins over API Gateway, and KEDA over HPA for async LLM workloads.

AWS EKS, Terraform, KEDA, CloudFront

Content Themes

Platform Design

Architecture decisions, trade-offs, and patterns for AI-ready platforms

MLOps in Practice

Real-world ML platform implementation, not theoretical ideals

SRE for Data/ML

Reliability engineering for data pipelines and ML inference
