Case 1 — PCB inspection, 250 good samples, no anomalies
The brief: inspect populated PCBs at end-of-line for solder bridges, missing components, and tombstones. Customer had 250 known-good boards and effectively zero confirmed defects in the historical archive (defective boards were destroyed before any imaging step existed).
What we tried first: PatchCore on the 250 good boards. AUROC on a synthetically perturbed test set was 96 %. We told the customer we needed real defects to validate.
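For anyone who hasn't used it, PatchCore boils down to a memory bank of patch features from known-good images plus nearest-neighbour distances at test time. A minimal sketch of that core idea, stripped of coreset subsampling and neighbourhood aggregation (`good_loader` is a hypothetical DataLoader over the 250 good boards; backbone and sizes are illustrative):

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
extractor = create_feature_extractor(backbone, return_nodes={"layer2": "feat"})

@torch.no_grad()
def patch_features(batch):
    """batch: (N, 3, H, W) normalised images -> (N*h*w, C) patch features."""
    fmap = extractor(batch)["feat"]                    # (N, C, h, w)
    n, c, h, w = fmap.shape
    return fmap.permute(0, 2, 3, 1).reshape(n * h * w, c)

# Memory bank built from the known-good boards only.
# good_loader is a hypothetical DataLoader over the 250 good images;
# the real PatchCore coreset-subsamples this bank, which we skip here.
memory_bank = torch.cat([patch_features(imgs) for imgs in good_loader])

@torch.no_grad()
def anomaly_score(img):
    """img: (1, 3, H, W). Image score = worst patch's nearest-neighbour distance."""
    feats = patch_features(img)                        # (h*w, C)
    nn_dist = torch.cdist(feats, memory_bank).min(dim=1).values
    return nn_dist.max().item()
```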
What actually worked: we ran the model in shadow mode for three weeks. The operator manually flagged 47 boards as defective during that period. Of those, our model had scored 41 in the top 5 % of anomaly scores. The 6 it missed were all subtle solder issues on the underside — a camera placement problem, not a model problem.
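The shadow-mode comparison itself is just a quantile check against the operator's flags; a sketch, assuming board-level scores keyed by ID (the score store and IDs are hypothetical):

```python
import numpy as np

def shadow_mode_check(scores, flagged, top_frac=0.05):
    """scores: dict board_id -> anomaly score; flagged: set of operator-flagged board ids."""
    cutoff = np.quantile(list(scores.values()), 1.0 - top_frac)
    caught = [b for b in flagged if scores.get(b, float("-inf")) >= cutoff]
    return len(caught), len(flagged), cutoff
```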
Lesson: PatchCore on a tiny good-only dataset is shockingly competent if the camera coverage is right. The bottleneck wasn't the model. It was the optical setup.
Case 2 — Textured leather inspection, supervised approach failed
The brief: inspect leather upholstery rolls for cuts, scars, insect bites, and dye inconsistencies. Customer had 8 000 labelled defective images and 30 000 good images.
What we tried first: supervised classifier (EfficientNet-B3 with focal loss). 97 % accuracy on the holdout. Deployed. In week 2, a new supplier's tanning process introduced a defect type that hadn't been in the training set (a particular cluster pattern), and the model missed it consistently.
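For reference, the focal loss here was the standard binary formulation; alpha and gamma below are the usual paper defaults rather than our tuned values:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so the rare defects dominate the gradient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                               # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```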
What worked: switched to DRAEM. Trained on good samples only, with synthetic perturbations. Slight drop in headline accuracy on known defect types (95 % vs 97 %), but the model started catching the new defect type within a week of the supplier change.
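DRAEM's trick is that the "defects" it trains on are manufactured: a foreign texture blended into a good image through a random mask, with the mask doubling as the segmentation label. A heavily simplified sketch of that augmentation (the real method uses Perlin-noise masks and an external texture dataset such as DTD; the blurred-blob mask below is an illustrative stand-in):

```python
import torch
import torch.nn.functional as F

def perturb(good_img, texture, beta=0.7):
    """good_img, texture: (3, H, W) in [0, 1]. Returns (augmented image, pixel mask)."""
    _, h, w = good_img.shape
    noise = torch.rand(1, 1, h // 8, w // 8)                               # coarse random field
    noise = F.interpolate(noise, size=(h, w), mode="bilinear", align_corners=False)
    mask = (noise > noise.quantile(0.97)).float().squeeze(0)               # (1, H, W) sparse blobs
    blended = (1 - beta) * good_img + beta * texture                       # foreign appearance
    augmented = good_img * (1 - mask) + blended * mask
    return augmented, mask                                                  # mask is the training label
```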
Lesson: supervised models memorise the defect types they saw. Unsupervised models flag anomalies they've never seen. For materials with high natural variance, unsupervised is the safer bet, even at lower headline accuracy.
Case 3 — Drift caught by the embedding monitor, not the score
The brief: surface defect inspection on coated metal parts. Model in production for 10 months; reject rate stable at around 0.4 %.
The drift event: a software update to the camera driver changed the gain auto-adjustment behaviour. Mean exposure shifted up by ~8 %, and the anomaly score distribution shifted with it. Reject rate stayed stable for three days because the threshold was self-adjusting on the score quantile — but the model was now flagging surface highlights as defects.
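The reason the reject rate looked healthy is mechanical: a threshold pegged to a score quantile follows the scores wherever they go. A toy illustration (the distributions and the shift size are made up; the 99.6th-percentile threshold mirrors our ~0.4 % reject rate):

```python
import numpy as np

rng = np.random.default_rng(0)
before = rng.lognormal(mean=0.0, sigma=0.3, size=50_000)     # scores pre-driver-update
after = before * 1.25                                         # roughly uniform upward shift

for name, scores in [("before", before), ("after", after)]:
    threshold = np.quantile(scores, 0.996)                    # self-adjusting threshold
    print(name, f"reject rate = {np.mean(scores > threshold):.3%}")
# Both print ~0.4 %: the rate can't see the shift, only what ends up above the line changes.
```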
How we caught it: embedding-distribution monitor (KL divergence vs baseline) crossed its alert threshold two days before the operator override rate started climbing. Rolled back the camera driver, re-baselined the embedding monitor.
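The monitor doesn't need to be sophisticated. One way to implement it, assuming per-dimension histogram KL against a frozen baseline (the production version differs in details; bin count, window size, and the alert value below are illustrative and tuned on history):

```python
import numpy as np

def embedding_kl(baseline, window, bins=32, eps=1e-6):
    """baseline, window: (n_samples, n_dims) pooled embeddings. Mean per-dimension KL(baseline || window)."""
    divs = []
    for d in range(baseline.shape[1]):
        edges = np.histogram_bin_edges(baseline[:, d], bins=bins)   # bins frozen from the baseline
        p, _ = np.histogram(baseline[:, d], bins=edges)
        q, _ = np.histogram(window[:, d], bins=edges)
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        divs.append(np.sum(p * np.log(p / q)))
    return float(np.mean(divs))

# alert if embedding_kl(baseline_embeddings, last_24h_embeddings) > 0.15   # illustrative threshold
```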
Lesson: reject rate alone doesn't tell you the system is healthy. Watch the embedding distribution. Watch operator overrides. The lag between something breaking and reject rate noticing is days, not minutes.
Common thread
None of these were model-architecture problems. They were systems problems — camera coverage, training data composition, monitoring blind spots. Anomaly detection projects fail at the system level far more often than at the model level.
Anyone willing to share a deployment that taught them something? Curious about non-image cases too — vibration, acoustic, current.