Projects & Impact

A showcase of professional work and personal projects exploring platform engineering, MLOps, and cloud-native technologies.

Featured Case Studies

In-depth looks at platform challenges, approaches, and outcomes.

Data Platform • Datenna B.V. • 2025 - Present

Multi-Region Azure Databricks Platform

Context

Data-intensive analytics company processing intelligence data across multiple regions, requiring a modern data platform to handle 50 TB+ of data per day under strict governance requirements.

Problem

The existing data infrastructure couldn't scale to meet growing data volumes. Manual cluster management led to inefficient resource utilization, and lack of standardized pipelines created inconsistency across data teams. Cost visibility was poor, making optimization difficult.

Approach

Designed a multi-region Databricks platform with Infrastructure as Code at its core, implementing medallion architecture for data organization and self-service capabilities for data teams.

Infrastructure: Pulumi + Crossplane for declarative infrastructure, enabling GitOps workflows and environment consistency

Data Architecture: Medallion architecture (Bronze/Silver/Gold) with Delta Lake for ACID transactions and time travel

Cost Control: Auto-scaling clusters with spot instances, job clusters for batch workloads, interactive clusters with auto-termination

CI/CD: Azure DevOps + ArgoCD for infrastructure, Databricks Asset Bundles for notebook/job deployments
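As a rough illustration of the medallion flow named above, the sketch below moves records from Bronze (raw) to Silver (cleaned, deduplicated) to Gold (aggregated). It uses plain Python lists instead of Delta Lake tables so that it runs anywhere; the field names (`event_id`, `region`, `amount`) and the sample rows are assumptions made for the example, not the platform's actual schema.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean and deduplicate raw Bronze rows (keyed on a hypothetical event_id)."""
    seen = set()
    silver = []
    for row in bronze_rows:
        event_id = row.get("event_id")
        amount = row.get("amount")
        # Drop malformed rows and rows whose event_id was already seen.
        if event_id is None or amount is None or event_id in seen:
            continue
        seen.add(event_id)
        silver.append({
            "event_id": event_id,
            "region": str(row.get("region", "unknown")).strip().lower(),
            "amount": float(amount),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate Silver rows into a business-level view: total amount per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Illustrative sample data only.
bronze = [
    {"event_id": 1, "region": " EU ", "amount": "10.5"},
    {"event_id": 1, "region": "EU", "amount": "10.5"},   # duplicate
    {"event_id": 2, "region": "us", "amount": 4},
    {"event_id": None, "region": "us", "amount": 1},     # malformed
]
gold = to_gold(to_silver(bronze))
print(gold)
```

On a real Databricks workspace each layer would typically be a Delta table, with the same two steps expressed as Spark transformations.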

Cost Reduction

35% (~€420K/year)

Processing Speed

3x faster

Deployment Automation

100%

Time to New Environment

< 30 minutes

Azure, Databricks, Pulumi, Crossplane, Delta Lake, Azure DevOps, ArgoCD, Python

Core Platform • PVH Europe • 2022 - 2024

Kubernetes Platform Modernization

Context

Global fashion retailer (Calvin Klein, Tommy Hilfiger) running 200+ microservices on legacy VM infrastructure, facing scaling challenges and slow deployment velocity.

Problem

Weekly deployments were the norm, with each taking 2+ hours. Teams waited days for infrastructure provisioning. Observability was fragmented across tools, making incident response slow. The VM-based infrastructure couldn't efficiently handle traffic spikes.

Approach

Built a Kubernetes-native platform with self-service capabilities, comprehensive observability, and GitOps-driven deployments. Migrated services incrementally with feature flags to minimize risk.

Platform: AWS EKS with managed node groups, Karpenter for intelligent autoscaling based on workload requirements

Deployment: ArgoCD for GitOps, progressive rollouts with Argo Rollouts, standardized Helm charts as golden paths

Observability: Prometheus + Grafana for metrics, OpenSearch for logs, distributed tracing, and custom dashboards per team

Migration Strategy: Strangler fig pattern - new services on K8s, gradual migration of existing services with parallel running
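The incremental migration described above can be pictured as a routing decision made per request: each service has a rollout percentage, and a stable hash of the request decides whether it goes to the legacy VMs or to Kubernetes. This is a minimal sketch under assumptions; the backend labels, the `rollout` mapping, and the function name are hypothetical and not taken from the actual platform.

```python
import hashlib

LEGACY, K8S = "legacy-vm", "kubernetes"


def pick_backend(service, request_id, rollout):
    """Return the backend for one request.

    rollout maps a service name to the percentage (0-100) of its traffic
    that should be served from Kubernetes; unknown services stay on legacy.
    """
    percent = rollout.get(service, 0)
    digest = hashlib.sha256(f"{service}:{request_id}".encode()).digest()
    # Stable bucket in [0, 100): the same request id always lands in the same bucket.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return K8S if bucket < percent else LEGACY


# Hypothetical rollout state: checkout fully migrated, catalog not started.
rollout = {"checkout": 100, "catalog": 0, "search": 25}
print(pick_backend("checkout", "req-1", rollout))
print(pick_backend("catalog", "req-1", rollout))
```

Raising a service's percentage step by step moves traffic gradually while both stacks run in parallel, and setting it back to 0 is the rollback.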

Annual Savings

€500K+

Deployment Time

2+ hours → 15 min (over 85% faster)

Deployment Frequency

Weekly → 50+/day

Platform Uptime

99.99%

AWS, EKS, Terraform, ArgoCD, Prometheus, Grafana, OpenSearch, Helm, Karpenter

Data Platform • PVH Europe • 2023 - 2024

Enterprise OpenSearch Observability Platform

Context

Large-scale observability requirements for 200+ services generating terabytes of logs daily, with existing Elasticsearch clusters becoming increasingly expensive and difficult to manage.

Problem

Elasticsearch licensing costs were escalating rapidly. Cluster management was manual and error-prone. Index lifecycle management was inconsistent, leading to storage bloat. Teams lacked self-service capabilities for creating dashboards and alerts.

Approach

Migrated to OpenSearch with a focus on operational efficiency, implementing automated index lifecycle management and self-service patterns for development teams.

Platform: Self-managed OpenSearch on Kubernetes with dedicated node pools for hot/warm/cold tiers

Data Management: Automated index lifecycle policies (Index State Management, ISM, in OpenSearch), index templates with optimized mappings, snapshot lifecycle management to S3

Self-Service: Terraform modules for teams to provision their own index patterns, dashboards as code with version control

Performance: Query optimization, shard sizing based on data patterns, caching strategies for common queries
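Shard sizing of the kind mentioned above usually comes down to simple arithmetic: divide the expected index size by a target shard size and round up. The sketch below assumes a 30 GB target per primary shard, which is an illustrative default chosen for this example rather than a figure from the project; the function name is hypothetical as well.

```python
import math


def primary_shard_count(daily_index_gb, target_shard_gb=30.0):
    """Estimate primary shards for a daily index.

    target_shard_gb is an assumed default for this sketch; tune it to the
    workload (log-heavy indices often tolerate larger shards than search-heavy ones).
    """
    if daily_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(daily_index_gb / target_shard_gb))


print(primary_shard_count(100))  # 100 GB/day at 30 GB per shard
print(primary_shard_count(5))    # small indices still get one shard
```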

License Cost Savings

60%+

Query Performance

40% faster

Storage Efficiency

50% reduction

Time to Dashboard

Days → Hours

OpenSearch, Kubernetes, Terraform, Fluent Bit, S3, Prometheus, Python

Side Projects

Personal projects exploring MLOps, AI infrastructure, and advanced Kubernetes patterns.

In Progress • Featured

Kortex

Kubernetes-native AI inference gateway for multi-model routing, A/B testing, and intelligent failover. Features circuit breakers with exponential backoff, OpenTelemetry tracing, smart routing (cost/latency/context-length), and configuration hot-reload. CNCF Sandbox candidate.
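To make the circuit-breaker idea concrete, here is a minimal sketch of a breaker that opens after a run of consecutive failures and doubles its cool-down on each further failure, with jitter so that many clients do not retry at the same moment. Kortex itself is written in Go; this Python version is only an illustration, and the class name, thresholds, and the "equal jitter" formula are assumptions rather than Kortex's actual behavior.

```python
import random


class CircuitBreaker:
    """Opens after failure_threshold consecutive failures; backs off exponentially."""

    def __init__(self, failure_threshold=3, base_delay=1.0, max_delay=60.0, rng=None):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.rng = rng or random.Random()
        self.failures = 0       # consecutive failures seen so far
        self.open_until = 0.0   # timestamp until which calls are rejected

    def allow(self, now):
        """True if a call may be attempted at time `now` (closed or half-open)."""
        return now >= self.open_until

    def record_success(self):
        """A successful call closes the breaker and resets the backoff."""
        self.failures = 0
        self.open_until = 0.0

    def record_failure(self, now):
        """Register a failure; return the cool-down applied (0.0 if still closed)."""
        self.failures += 1
        if self.failures < self.failure_threshold:
            return 0.0
        # Exponent grows with each failure past the threshold (capped to avoid overflow).
        exponent = min(self.failures - self.failure_threshold, 32)
        cap = min(self.max_delay, self.base_delay * 2 ** exponent)
        # Equal jitter: half the cap is fixed, the other half is random.
        delay = cap / 2 + self.rng.uniform(0, cap / 2)
        self.open_until = now + delay
        return delay


breaker = CircuitBreaker(failure_threshold=2, base_delay=1.0, max_delay=4.0,
                         rng=random.Random(0))
breaker.record_failure(now=0.0)           # first failure: breaker stays closed
delay = breaker.record_failure(now=0.0)   # second failure: breaker opens
print(f"open for {delay:.2f}s, allow now? {breaker.allow(0.0)}")
```

Passing `now` explicitly keeps the sketch deterministic; a real gateway would read a monotonic clock instead.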

Progress

CRDs, Routing, A/B Testing, Fallbacks, Metrics, Rate Limiting, Cost Tracking, OpenTelemetry, Smart Routing, Circuit Breakers, E2E Tests, Helm Chart

Goals

• Multi-model routing with header/path/model-based rules
• A/B testing with consistent hashing for experiment assignment
• Circuit breakers with exponential backoff and jitter
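Hash-based experiment assignment, as in the A/B testing goal above, can be sketched as follows. Note that this uses deterministic hash bucketing, a simpler stand-in for a full consistent-hash ring: the same user key always maps to the same variant, and the traffic split follows the configured weights. The function name, the `weights` format, and the model names are hypothetical.

```python
import hashlib


def assign_variant(experiment, user_key, weights):
    """Pick a variant for user_key.

    weights is an ordered list of (variant_name, percent) pairs that must sum to 100.
    """
    if sum(percent for _, percent in weights) != 100:
        raise ValueError("weights must sum to 100")
    digest = hashlib.sha256(f"{experiment}:{user_key}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    upper = 0
    for name, percent in weights:
        upper += percent
        if bucket < upper:
            return name
    raise AssertionError("unreachable: weights sum to 100")


# Hypothetical experiment: send 10% of users to a candidate model.
split = [("model-a", 90), ("model-b", 10)]
print(assign_variant("summarizer-v2", "user-123", split))
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in the 10% group of one test is not automatically in the 10% group of the next.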

Go, Kubebuilder, Kubernetes, KServe, +2 more

GitHub • Demo Soon

Completed • Featured

AI Infrastructure FinOps Platform

Production-ready cost optimization platform for AI/ML workloads. Features GPU utilization monitoring, budget forecasting with alerts, ML-based anomaly detection, automated right-sizing recommendations, and multi-cloud billing integration. All 3 phases complete.

Progress

MVP, Budget & Alerts, ML Analytics, Chargeback Reports, AWS Billing API, Right-Sizing Engine

Goals

• Real-time GPU utilization monitoring with idle resource alerts
• Cost-per-inference tracking with team/project attribution
• Budget forecasting with trend analysis and alert system
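One simple way to read "budget forecasting with trend analysis" is a least-squares line fitted to the daily spend so far, extended to the end of the month and compared with the budget. The sketch below does exactly that with the standard library only; it is an assumption about the approach, not the platform's actual model, and the cost figures are made up for illustration.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed on days 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_month_total(daily_costs, days_in_month):
    """Observed spend plus the trend line extended over the remaining days."""
    slope, intercept = linear_trend(daily_costs)
    remaining = range(len(daily_costs), days_in_month)
    projected = sum(max(0.0, intercept + slope * day) for day in remaining)
    return sum(daily_costs) + projected


def over_budget(daily_costs, days_in_month, budget):
    """True if the forecast exceeds the budget, i.e. an alert should fire."""
    return forecast_month_total(daily_costs, days_in_month) > budget


# Illustrative numbers only: spend grows by 10 per day.
costs = [100, 110, 120, 130]
print(forecast_month_total(costs, days_in_month=6))
print(over_budget(costs, days_in_month=6, budget=700))
```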

NVIDIA DCGM, OpenCost, Prometheus, Grafana, +3 more

GitHub • Demo Soon

Completed • Featured

MLOps Platform on Kubernetes

Production-ready multi-cloud MLOps platform on AWS EKS, Azure AKS, and GCP GKE with defense-in-depth security and full-stack observability. Enables data science teams to deploy ML models, HuggingFace transformers, and LLMs from experimentation to production in 15 minutes with full auditability, drift detection, and GitOps-driven infrastructure.

Progress

Core, MLflow, KServe, CI/CD, GPU, Security, Drift Detection, Azure Support, GCP Support, LLM Inference, HuggingFace, GitOps, Observability, Backup & DR, Chaos Testing, Docs

Goals

• Self-service model deployment reducing time from 2-3 days to 15 minutes
• Multi-cloud deployment: AWS EKS with Karpenter, Azure AKS with KEDA, GCP GKE with Node Auto-provisioning
• LLM inference with vLLM (Mistral, CodeLlama, Llama-2, Mixtral) and OpenAI-compatible API
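The drift detection mentioned in the project description is not specified further, so the sketch below shows one common technique, the Population Stability Index (PSI), purely as an example of what such a check can look like. It compares the distribution of a feature at training time with the distribution seen in production. The bin count, the smoothing constant, and the 0.2 alert threshold are conventional rules of thumb assumed for this sketch.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline has no spread")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]     # squeezed into the upper half
print(f"no drift: {psi(baseline, baseline):.3f}")
print(f"drift:    {psi(baseline, shifted):.3f}")
```

A PSI near zero means the two samples look alike; values above roughly 0.2 are often treated as a signal to investigate or retrain.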

Argo Workflows, MLflow 3.x, KServe, vLLM, +16 more

GitHub • Demo Soon

Planned • Featured

SpotTensor

GPU compute price aggregator — "Trivago for ML training". Arbitrages spot pricing across AWS, RunPod, and Lambda Labs to find the cheapest GPU instances for batch training jobs.

Progress

AWS Spot Connector, RunPod Integration, Lambda Labs Integration, GPU Normalization Schema, Price Comparison CLI, Recommendation Engine

Goals

• Unified pricing API across cloud GPU providers (AWS Spot, RunPod, Lambda Labs)
• Normalization of inconsistent GPU naming conventions to standard schema
• Cost-optimal provider recommendation based on GPU type, duration, and availability
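The normalization and recommendation goals above could look roughly like this: provider-specific GPU names are reduced to a canonical model name, and the cheapest available offer for that model is picked for a given job duration. SpotTensor is planned in Go; this Python sketch is only illustrative. The canonical list, the `Offer` fields, and every price shown are assumptions made up for the example, not real provider data.

```python
import re
from dataclasses import dataclass

# Hypothetical canonical schema; a real table would cover many more models.
CANONICAL_GPUS = ("A100", "H100", "V100", "T4")


def normalize_gpu(raw):
    """Map a provider-specific GPU name to a canonical model, or None if unknown."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    key = re.sub(r"^(nvidia)?(tesla)?", "", key)
    # Check longer names first so a short name never shadows a longer one.
    for name in sorted(CANONICAL_GPUS, key=len, reverse=True):
        if key.startswith(name.lower()):
            return name
    return None


@dataclass(frozen=True)
class Offer:
    provider: str
    gpu: str              # raw name as the provider reports it
    usd_per_hour: float
    available: bool = True


def cheapest_offer(offers, gpu, hours):
    """Return (offer, total_cost) for the cheapest available match, or None."""
    wanted = normalize_gpu(gpu)
    if wanted is None:
        raise ValueError(f"unknown GPU type: {gpu!r}")
    matches = [o for o in offers if o.available and normalize_gpu(o.gpu) == wanted]
    if not matches:
        return None
    best = min(matches, key=lambda o: o.usd_per_hour)
    return best, round(best.usd_per_hour * hours, 2)


# Made-up prices for illustration only.
offers = [
    Offer("aws", "NVIDIA A100-SXM4-40GB", 1.50),
    Offer("runpod", "A100 80GB PCIe", 1.20),
    Offer("lambda", "Tesla V100", 0.50),
    Offer("runpod", "a100", 0.90, available=False),
]
best, total = cheapest_offer(offers, "a100", hours=10)
print(best.provider, total)
```

Stripping the memory size and form factor is a deliberate simplification here; a fuller schema would likely keep them as separate fields, since a 40 GB and an 80 GB card are not interchangeable for every training job.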

Go, AWS Price List API, REST APIs, CLI, +1 more

GitHub

Planned • Featured

AgentFile

Docker Compose for AI Agents — Declarative spec that deploys AI agent stacks to Kubernetes. GitOps-native with Kortex integration for inference governance. Transparent abstraction: generates readable K8s manifests you own.

Progress

Schema Parser, CLI Scaffold, RAG Stack, Manifest Generator, Kind Support, Kortex Integration

Goals

• Simple YAML spec (agentfile.yaml) that abstracts complex K8s deployment
• Pre-built stacks for common patterns: RAG, Vision, Code, Multi-Agent
• GitOps-native: generates ArgoCD/Flux-compatible manifests
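The "transparent abstraction" idea can be illustrated with a tiny generator that turns a short spec into a standard Kubernetes Deployment manifest. The spec fields below (`name`, `image`, `replicas`, `env`) are a hypothetical stand-in for whatever agentfile.yaml ends up containing, since its schema is not published here; the spec is given as a Python dict and the manifest is printed as JSON (which Kubernetes tooling also accepts) to keep the sketch free of third-party dependencies.

```python
import json


def render_deployment(spec):
    """Render a hypothetical agent spec into an apps/v1 Deployment manifest."""
    name = spec["name"]
    labels = {"app.kubernetes.io/name": name}
    env = [{"name": key, "value": str(value)}
           for key, value in sorted(spec.get("env", {}).items())]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": spec["image"], "env": env}
                    ]
                },
            },
        },
    }


# Hypothetical spec; the image and variable names are placeholders.
agent_spec = {
    "name": "rag-agent",
    "image": "example.invalid/rag-agent:0.1.0",
    "replicas": 2,
    "env": {"TOP_K": 5, "MODEL": "placeholder-model"},
}
manifest = render_deployment(agent_spec)
print(json.dumps(manifest, indent=2))
```

Because the output is an ordinary manifest, it can be committed to Git and synced by a GitOps controller without that controller knowing anything about the spec it came from.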

Go, Kubernetes, Kustomize, Helm, +3 more

GitHub
