Building AI infrastructure tools in the open. These projects aim to solve real problems in the ML/AI platform space and are designed for potential CNCF contribution.
Kubernetes-native AI inference gateway for multi-model routing, A/B testing, and intelligent failover. Features circuit breakers with exponential backoff, OpenTelemetry tracing, smart routing (cost/latency/context-length), and configuration hot-reload. CNCF Sandbox candidate.
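As a rough illustration of the circuit-breaker idea mentioned above, here is a minimal Python sketch. It is not the gateway's actual code: the `CircuitBreaker` class, the `pick_backend` helper, and every threshold and delay value are hypothetical names and defaults chosen for the example. The breaker opens after a number of consecutive failures, doubles its cool-down each time it re-opens (capped at `max_delay`), and lets a single probe request through once the cool-down has elapsed.

```python
import time


class CircuitBreaker:
    """Hypothetical circuit breaker with exponential backoff (illustrative only)."""

    def __init__(self, failure_threshold=3, base_delay=1.0, max_delay=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock        # injectable so the breaker can be tested
        self.failures = 0         # consecutive failures while closed
        self.trips = 0            # consecutive times the breaker has opened
        self.open_until = 0.0     # calls are rejected before this time

    def allow(self):
        """Return True if a request may be sent to the backend."""
        return self.clock() >= self.open_until

    def record_success(self):
        """A success closes the breaker and resets the backoff."""
        self.failures = 0
        self.trips = 0
        self.open_until = 0.0

    def record_failure(self):
        """Count a failure; open the breaker when the threshold is reached."""
        self.failures += 1
        # After a previous trip, one failed probe is enough to re-open.
        threshold = 1 if self.trips > 0 else self.failure_threshold
        if self.failures >= threshold:
            delay = min(self.base_delay * (2 ** self.trips), self.max_delay)
            self.open_until = self.clock() + delay
            self.trips += 1
            self.failures = 0


def pick_backend(backends, breakers):
    """Failover: return the first backend whose breaker allows traffic."""
    for name in backends:
        if breakers[name].allow():
            return name
    return None
```

A caller would check `allow()` before each request, report the outcome with `record_success()` or `record_failure()`, and fall back to the next backend in the list when the preferred one is open.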
Production-ready cost optimization platform for AI/ML workloads. Features GPU utilization monitoring, budget forecasting with alerts, ML-based anomaly detection, automated right-sizing recommendations, and multi-cloud billing integration. All 3 phases complete.
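To make the budget-forecasting feature concrete, the sketch below uses a plain run-rate projection: average daily spend so far, extrapolated to the end of the month. This is a deliberately simple stand-in, not the platform's forecasting or ML-based anomaly detection, and the function names `forecast_month_end` and `over_budget` are hypothetical.

```python
def forecast_month_end(daily_spend, days_in_month):
    """Project month-end spend from the average daily spend observed so far."""
    if not daily_spend:
        raise ValueError("need at least one day of spend data")
    if len(daily_spend) > days_in_month:
        raise ValueError("more daily entries than days in the month")
    spent = sum(daily_spend)
    run_rate = spent / len(daily_spend)          # average spend per day
    remaining_days = days_in_month - len(daily_spend)
    return spent + run_rate * remaining_days


def over_budget(daily_spend, days_in_month, budget):
    """Return True when the projected month-end spend exceeds the budget."""
    return forecast_month_end(daily_spend, days_in_month) > budget
```

For example, three days of spend at 10, 20, and 30 USD in a 30-day month project to 600 USD, so `over_budget([10, 20, 30], 30, budget=500)` would raise an alert.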
Production-ready multi-cloud MLOps platform on AWS EKS, Azure AKS, and GCP GKE with defense-in-depth security and full-stack observability. Enables data science teams to deploy ML models, HuggingFace transformers, and LLMs from experimentation to production in 15 minutes with full auditability, drift detection, and GitOps-driven infrastructure.
GPU compute price aggregator — "Trivago for ML training". Arbitrages spot pricing across AWS, RunPod, and Lambda Labs to find the cheapest GPU instances for batch training jobs.
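The core comparison can be sketched in a few lines of Python. This is an assumption about how such an aggregator might rank offers, not the project's implementation: the `Offer` type and `cheapest_offer` function are hypothetical, and any prices passed in would come from provider APIs that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical normalized price quote from one provider."""
    provider: str
    gpu: str            # GPU model name, e.g. "A100"
    gpus: int           # number of GPUs on the instance
    hourly_usd: float   # price for the whole instance per hour


def cheapest_offer(offers, gpu, min_gpus=1):
    """Return the offer with the lowest price per GPU-hour, or None."""
    candidates = [o for o in offers if o.gpu == gpu and o.gpus >= min_gpus]
    if not candidates:
        return None
    # Normalize by GPU count so multi-GPU instances compare fairly.
    return min(candidates, key=lambda o: o.hourly_usd / o.gpus)
```

Ranking by price per GPU-hour rather than per instance keeps an 8-GPU node from looking more expensive than a single-GPU node when its per-GPU rate is actually lower.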
Docker Compose for AI Agents — Declarative spec that deploys AI agent stacks to Kubernetes. GitOps-native with Kortex integration for inference governance. Transparent abstraction: generates readable K8s manifests you own.
Currently focused on shipping Kortex, AI FinOps Platform, and MLOps Platform to production-grade quality.
Upstream contributions to CNCF projects (KServe, Karpenter, OpenCost) are planned as the projects above mature and surface issues worth contributing back.
Complete MLOps platform showing how all the pieces fit together.
Check out my GitHub profile for more projects and contributions.