ML Systems
Fraud / Risk Detection Pipeline
Score transactions or actions in real time for fraud risk with a feedback loop from human review.
Scale to anchor on
Hundreds of millions of events per day; inline scoring at p99 < 100 ms; a human review queue of tens of thousands of cases per day.
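A rough calculation turns the daily volumes into per-second load. This is a sketch with assumed inputs. 300M events/day stands in for "hundreds of millions" and 30,000 reviews/day for "tens of thousands". The 3x peak-to-average factor is a hypothetical planning margin, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumed inputs (illustrative, not measurements).
EVENTS_PER_DAY = 300_000_000
REVIEWS_PER_DAY = 30_000
PEAK_TO_AVG = 3.0  # hypothetical planning factor


def average_qps(events_per_day: float) -> float:
    """Average events per second across a full day."""
    return events_per_day / SECONDS_PER_DAY


def review_fraction(reviews_per_day: float, events_per_day: float) -> float:
    """Share of all events that human review can absorb."""
    return reviews_per_day / events_per_day


if __name__ == "__main__":
    avg = average_qps(EVENTS_PER_DAY)
    print(f"average ~{avg:,.0f} events/s, planning peak ~{avg * PEAK_TO_AVG:,.0f} events/s")
    print(f"review capacity ~{review_fraction(REVIEWS_PER_DAY, EVENTS_PER_DAY):.4%} of events")
```

At these assumed numbers the scoring path must sustain roughly 3,500 events/s on average (about 10,400/s at the assumed peak). Only about 0.01% of events can be routed to humans, so the review band has to be narrow.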
Requirements
Functional
- Score transactions / listings / users in real time.
- Apply rule-based and ML-based scoring.
- Send borderline cases to human review.
- Feed reviewed labels back into training.
Non-functional
- Low false-positive rate (cost = lost legitimate revenue).
- Auditable decisions for regulatory and dispute review.
- Resilient to adversarial drift.
High-level architecture
An online feature store provides real-time features, and a model server returns inline predictions. A rules engine applies operator-defined logic. Borderline scores are routed to a review queue. Reviewer labels feed a labeling pipeline, and periodic retraining updates the model.
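The inline decision path can be sketched as a single function. Rules run first, and the model score is then mapped to allow / review / block bands. This is a sketch under assumptions. The band cut-offs (0.5 and 0.9) are hypothetical, and so is the "first rule that fires wins" precedence. `Decision`, `RuleFn`, `Bands`, and `decide` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Sequence


class Decision(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


# A rule inspects the features and either returns a Decision or abstains (None).
RuleFn = Callable[[Dict[str, float]], Optional[Decision]]


@dataclass(frozen=True)
class Bands:
    review_at: float = 0.5  # hypothetical: scores in [review_at, block_at) go to humans
    block_at: float = 0.9   # hypothetical: scores >= block_at are blocked outright


def decide(features: Dict[str, float],
           model: Callable[[Dict[str, float]], float],
           rules: Sequence[RuleFn],
           bands: Bands = Bands()) -> Decision:
    # 1. Operator rules run first; the first rule that returns a decision wins.
    for rule in rules:
        verdict = rule(features)
        if verdict is not None:
            return verdict
    # 2. Otherwise the model score selects the band.
    score = model(features)
    if score >= bands.block_at:
        return Decision.BLOCK
    if score >= bands.review_at:
        return Decision.REVIEW
    return Decision.ALLOW
```

Running rules first lets a hard-policy hit skip the model call. A real system might still score every event so the score lands in the audit trail.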
Components
Online feature store
Real-time per-entity features (velocity, history, network signals).
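Velocity features count recent activity per entity, for example transactions per card in the last hour. Below is a minimal in-memory sketch of a sliding-window counter. A real online feature store would keep this state in a shared low-latency store. `VelocityCounter` is a hypothetical name, and the sketch assumes events are recorded in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityCounter:
    """Counts events per entity inside a trailing time window (now - window, now]."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        # entity_id -> timestamps, oldest first (assumes in-order arrival)
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, entity_id: str, ts: float) -> None:
        self._events[entity_id].append(ts)

    def count(self, entity_id: str, now: float) -> int:
        q = self._events[entity_id]
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)
```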
Model server
Inline scoring with strict latency budgets.
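One way to hold a strict latency budget is to give the model call a deadline and fall back when it is missed. This sketch assumes a shared thread pool and a caller-chosen fallback score, for example one that lands in the review band. `score_with_budget` is a hypothetical name.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Tuple

# Shared pool: a per-request pool would block on shutdown waiting for slow calls.
_POOL = cf.ThreadPoolExecutor(max_workers=8)


def score_with_budget(model: Callable[[Dict[str, float]], float],
                      features: Dict[str, float],
                      budget_s: float,
                      fallback_score: float) -> Tuple[float, bool]:
    """Return (score, timed_out); use fallback_score if the model misses the budget."""
    future = _POOL.submit(model, features)
    try:
        return future.result(timeout=budget_s), False
    except cf.TimeoutError:
        # Cancels only if the call has not started yet; a call that is already
        # running finishes in the background and its result is dropped.
        future.cancel()
        return fallback_score, True
```

Python cannot preempt a running call, so a timed-out request still occupies a worker until the call returns. Returning the `timed_out` flag lets the caller count budget misses and record them in the audit trail.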
Rules engine
Operator-authored rules for known patterns and hard policy.
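Keeping operator rules as data rather than code is one way to let them change without redeploying the service. This is a minimal sketch. The `Rule` fields and the operator set are assumptions for illustration. So is the "most severe action wins" precedence.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    feature: str
    op: str      # one of ">", ">=", "<", "<=", "==", "in"
    value: Any
    action: str  # "allow", "review", or "block"


_OPS = {
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "in": lambda a, b: a in b,
}

# Assumed precedence: when several rules fire, the most severe action wins.
_SEVERITY = {"allow": 0, "review": 1, "block": 2}


def evaluate(rules: List[Rule], features: Dict[str, Any]) -> Optional[Rule]:
    """Return the most severe rule that fires, or None if no rule fires."""
    fired = [r for r in rules
             if r.feature in features and _OPS[r.op](features[r.feature], r.value)]
    return max(fired, key=lambda r: _SEVERITY[r.action], default=None)
```

Because rules are plain records, an incident responder could add or disable one through a config change. Each fired rule's `name` can also be written to the audit log.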
Review queue
Human reviewers handle borderline cases; labels return to training.
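A review queue needs an ordering. One plausible policy, assumed here for illustration, serves the highest expected loss first. Expected loss is approximated as model score times transaction amount, which presumes a roughly calibrated score. FIFO or deadline-based ordering would be equally valid choices. `ReviewQueue` is a hypothetical name.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class ReviewQueue:
    """Borderline cases ordered by expected loss (score * amount), highest first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._tie = itertools.count()  # stable order for equal priorities

    def push(self, case_id: str, score: float, amount: float) -> None:
        expected_loss = score * amount
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-expected_loss, next(self._tie), case_id))

    def pop(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```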
Labeling pipeline
Combines reviewer labels with delayed ground truth (chargebacks).
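A sketch of the label merge follows. It assumes reviewer verdicts are keyed by transaction ID and chargebacks arrive as a set of IDs. The precedence rule is an assumption: a chargeback, as delayed ground truth, overrides the reviewer's verdict. Treating every chargeback as fraud is also a simplification, since some disputes are not fraud. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Label:
    txn_id: str
    is_fraud: bool
    source: str  # "review" or "chargeback"


def merge_labels(review_labels: Dict[str, bool], chargebacks: Set[str]) -> Dict[str, Label]:
    """Combine reviewer verdicts with chargeback ground truth (chargeback wins)."""
    merged = {txn: Label(txn, verdict, "review") for txn, verdict in review_labels.items()}
    for txn in chargebacks:
        # A chargeback overrides any reviewer verdict, including "legitimate".
        merged[txn] = Label(txn, True, "chargeback")
    return merged
```

Keeping `source` on each label lets training weight or filter by provenance. It also lets analysts measure how often reviewers were overturned.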
Key decisions
Hybrid rules + ML.
Pure ML misses sharp policy lines; pure rules miss subtle patterns. Hybrid is the operational reality.
Audit every decision.
Disputes, regulatory inquiries, and post-incident review all need it.
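Below is a minimal sketch of an audit record, assuming an append-only JSON Lines sink. The field set is an assumption about what dispute and regulatory review would need: model version, rules fired, and the feature snapshot. `write_audit_record` is a hypothetical name.

```python
import json
import time
from typing import Any, Dict, List, Optional, TextIO


def write_audit_record(sink: TextIO, *, event_id: str, decision: str, score: float,
                       model_version: str, rules_fired: List[str],
                       features: Dict[str, Any], ts: Optional[float] = None) -> Dict[str, Any]:
    """Append one JSON line capturing what was decided and why."""
    record = {
        "event_id": event_id,
        "ts": time.time() if ts is None else ts,
        "decision": decision,
        "score": score,
        "model_version": model_version,
        "rules_fired": rules_fired,
        # Snapshot of features as seen at decision time, not recomputed later.
        "features": features,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Storing the feature snapshot, not just the score, is what makes a decision replayable after the feature store has moved on.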
Distinguish false-positive cost from false-negative cost.
These differ by orders of magnitude in payments; the operating point must reflect that.
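Choosing the operating point can be framed as minimizing total error cost, cost_fp * FP + cost_fn * FN, on labeled validation data. This sketch assumes such a set exists and that the business supplies the per-error costs. The brute-force search is quadratic and only illustrates the idea.

```python
from typing import Sequence, Tuple


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Return (threshold, total_cost) minimizing cost_fp*FP + cost_fn*FN.

    An event is flagged when score >= threshold. Candidates are the observed
    scores plus +inf (flag nothing).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = float("inf"), float("inf")
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Take scores [0.3, 0.5, 0.7] with labels [1, 0, 0]. A 5:1 false-negative cost picks threshold 0.3 (flag everything, cost 2). Equal costs pick +inf (flag nothing, cost 1). Same model, different operating point.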
Delayed ground truth handling.
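Real fraud labels (chargebacks) arrive weeks after the event. Training must respect that delay. Features must be computed as of decision time to avoid leakage. Recent events must not be treated as legitimate just because their chargebacks have not arrived yet.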
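A sketch of a label-maturity cutoff follows. It assumes each event carries a timestamp and chargebacks are keyed by event ID. The maturity window is a parameter; 60 days in the test is a hypothetical value, and the right window depends on how long chargebacks actually take to arrive. `mature_training_rows` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def mature_training_rows(events: Iterable[Tuple[str, datetime]],
                         chargebacks: Dict[str, datetime],
                         as_of: datetime,
                         maturity: timedelta) -> List[Tuple[str, int]]:
    """Label only events old enough for their chargebacks to have arrived.

    Fraud (1) if a chargeback was received by `as_of`, else 0. Events younger
    than `maturity` are excluded rather than labeled 0: a missing chargeback on
    a recent event is not yet evidence of legitimacy.
    """
    cutoff = as_of - maturity
    rows = []
    for event_id, event_time in events:
        if event_time > cutoff:
            continue  # too recent: label not mature yet
        cb_time = chargebacks.get(event_id)
        label = 1 if cb_time is not None and cb_time <= as_of else 0
        rows.append((event_id, label))
    return rows
```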
Pitfalls
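- Training on biased reviewer labels without counterfactual handling: reviewers only see cases the system already flagged, so their labels are not a representative sample of traffic.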
- No rule-based override channel for incident response.
- Leaving the operating point unchanged as adversaries adapt.
- Ignoring feature freshness: the model decays silently as its inputs go stale.
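Guarding against stale inputs can start with a per-feature age check at scoring time. This is a sketch; feature names and staleness limits are supplied by the caller, and the ones in the test are hypothetical. What to do with a stale feature is a policy choice. Options include emitting a metric, falling back to rules, or routing to review.

```python
import time
from typing import Dict, List, Optional


def stale_features(feature_ts: Dict[str, float],
                   max_age_s: Dict[str, float],
                   now: Optional[float] = None) -> List[str]:
    """Return names of features older than their allowed staleness, sorted.

    `feature_ts` maps feature name -> last update time (epoch seconds);
    `max_age_s` maps feature name -> allowed age in seconds. A feature with
    no timestamp at all is treated as stale.
    """
    now = time.time() if now is None else now
    stale = []
    for name, limit in max_age_s.items():
        ts = feature_ts.get(name)
        if ts is None or now - ts > limit:
            stale.append(name)
    return sorted(stale)
```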
Follow-up questions
- How do you handle a new fraud pattern that the model has never seen?
- How do you balance reviewer load against latency to decision?
- What's the feedback loop between review and the model?