ML Resilience Lab

Credit Card Fraud Detection System & Resilience Platform

About

ML Resilience Lab is a high-availability credit card fraud detection system designed to identify suspicious activity in real-world financial data. Built to handle extreme class imbalance, it implements a complete MLOps lifecycle, from high-velocity data ingestion to automated drift detection and self-healing protocols. Using anonymized transaction data and probability calibration, it aims to keep fraud detection reliable under adverse system conditions and evolving fraud patterns.

Technologies Used

Airflow, MongoDB, XGBoost, FastAPI, Evidently AI, MLflow, LangSmith, Streamlit, Docker Compose

Real-Time Data Foundation

The data foundation is a multi-stage ETL architecture built on a custom Python producer that simulates high-velocity, Kafka-like streaming. The producer persists its state during ingestion so that no data is lost in simulated stress scenarios. Three traffic modes (Normal, Demo, and Stress) validate infrastructure stability under varying load.
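A minimal sketch of how a checkpointed producer with traffic modes could look. The class name `CheckpointedProducer`, the per-mode rates, and the JSON checkpoint file are illustrative assumptions, not the project's actual implementation.

```python
import json
import os
import time
from enum import Enum
from pathlib import Path


class TrafficMode(Enum):
    # Events per second. These rates are assumed for illustration only.
    NORMAL = 5
    DEMO = 1
    STRESS = 200


class CheckpointedProducer:
    """Replays records in order and persists the next offset after each emit."""

    def __init__(self, records, checkpoint_path, mode=TrafficMode.NORMAL):
        self.records = records
        self.checkpoint_path = Path(checkpoint_path)
        self.mode = mode
        self.offset = self._load_offset()

    def _load_offset(self):
        # Resume from the last saved offset, or start from the beginning.
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())["offset"]
        return 0

    def _save_offset(self):
        # Write to a temp file and rename it, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp = self.checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"offset": self.offset}))
        os.replace(tmp, self.checkpoint_path)

    @property
    def delay_seconds(self):
        return 1.0 / self.mode.value

    def emit(self, sink, max_events=None, sleep=time.sleep):
        """Send records to `sink` and return how many were sent."""
        sent = 0
        while self.offset < len(self.records):
            if max_events is not None and sent >= max_events:
                break
            sink(self.records[self.offset])  # deliver first ...
            self.offset += 1
            self._save_offset()              # ... then checkpoint
            sent += 1
            sleep(self.delay_seconds)
        return sent
```

Because the record is delivered before the offset is saved, a crash between the two steps causes a replay rather than a gap. This gives at-least-once delivery, so a downstream consumer would need to tolerate duplicates.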

Multi-Layer Feature Pipeline

The pipeline follows a Data Lakehouse pattern with Bronze (raw), Silver (validated), and Gold (ML-ready) layers. Pydantic enforces strict data contracts and automates validation. Feature engineering includes median imputation for missing values and real-time enrichment from simulated external credit score APIs, so the inference engine receives high-quality inputs.
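The Bronze to Silver to Gold flow can be sketched as follows. To stay dependency-free, this sketch uses a standard-library dataclass as a stand-in for the Pydantic contract. The field names (`transaction_id`, `amount`, `credit_score`) and the validation rules are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class SilverTransaction:
    """Validated record. Field names are hypothetical."""
    transaction_id: str
    amount: float
    credit_score: Optional[float] = None  # missing if enrichment failed

    def __post_init__(self):
        # Stand-in for contract checks a Pydantic model would declare.
        if not self.transaction_id:
            raise ValueError("transaction_id must be non-empty")
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


def to_silver(bronze_rows):
    """Bronze -> Silver: keep rows that satisfy the contract, collect the rest."""
    valid, rejected = [], []
    for row in bronze_rows:
        try:
            valid.append(SilverTransaction(**row))
        except (TypeError, ValueError) as exc:
            # TypeError covers missing/unknown fields and non-numeric amounts.
            rejected.append((row, str(exc)))
    return valid, rejected


def to_gold(silver_rows):
    """Silver -> Gold: fill missing credit scores with the median of observed ones."""
    observed = [r.credit_score for r in silver_rows if r.credit_score is not None]
    fill = median(observed) if observed else None
    return [
        {
            "transaction_id": r.transaction_id,
            "amount": r.amount,
            "credit_score": r.credit_score if r.credit_score is not None else fill,
        }
        for r in silver_rows
    ]
```

This sketch computes the median per batch for brevity. A serving pipeline would more likely compute it once on training data and reuse it at inference time, so that training and serving features stay consistent.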

Model Serving & Uncertainty

The inference server is built on FastAPI and XGBoost. Beyond simple classification, it applies probability calibration to estimate prediction uncertainty. Decisions are automatically stratified into three tiers to optimize the operational workflow: Auto-Approve for high-confidence safe transactions, Human-Required for ambiguous cases, and Express Review for suspicious but non-obvious patterns.
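One way the three-tier routing could work on top of a calibrated fraud probability is sketched below. The threshold values and the mapping of probability bands to tiers are assumptions. The text names the tiers but does not give the cut-offs or their order.

```python
from enum import Enum


class Tier(Enum):
    AUTO_APPROVE = "Auto-Approve"
    HUMAN_REQUIRED = "Human-Required"
    EXPRESS_REVIEW = "Express Review"


def route(fraud_probability, approve_below=0.05, express_at_or_above=0.60):
    """Map a calibrated fraud probability to a decision tier.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be within [0, 1]")
    if fraud_probability < approve_below:
        return Tier.AUTO_APPROVE    # confidently safe
    if fraud_probability >= express_at_or_above:
        return Tier.EXPRESS_REVIEW  # suspicious, fast-tracked to review
    return Tier.HUMAN_REQUIRED      # ambiguous middle band
```

Thresholds like these only make sense when the scores are calibrated, meaning a score of 0.05 corresponds to roughly a 5% observed fraud rate. Raw gradient-boosting scores on heavily imbalanced data often do not have that property.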

Resilience & Self-Healing

The core innovation of the lab is a dedicated resilience layer that monitors system health in real time. Using Evidently AI for drift detection and Prometheus for system metrics, it implements a three-state circuit breaker pattern (Closed, Open, Half-Open). This lets the system detect performance degradation or data drift automatically and trigger self-healing protocols, so fraud detection stays reliable under adverse conditions.
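The three-state breaker can be sketched as a small state machine. The failure threshold, recovery timeout, and method names are assumptions. In this sketch, a "failure" stands for any signal the resilience layer treats as unhealthy, such as a drift alert or a failed health check.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal operation, requests flow through
    OPEN = "open"            # requests are blocked
    HALF_OPEN = "half_open"  # probe traffic is allowed to test recovery


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so the timeout can be tested
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.state is State.OPEN:
            # After the timeout, move to Half-Open and let probes through.
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = State.HALF_OPEN
                return True
            return False
        return True

    def record_success(self):
        # Any success resets the breaker to Closed.
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self.failures += 1
        # A failed probe in Half-Open reopens immediately. In Closed,
        # the breaker opens once the failure count reaches the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before scoring a transaction with the primary model, fall back to a safe default such as routing to human review while the breaker is open, and report each outcome through `record_success()` or `record_failure()`.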