Why Anomalib is our default
Intel's Anomalib has, over the last two years, become the de facto framework for unsupervised anomaly detection. We default to it on new projects because:

- Most published architectures (PatchCore, EfficientAD, FastFlow, PaDiM, ReverseDistillation) are implemented and validated.
- The PyTorch Lightning base means our training loops, callbacks, and logging are not custom code.
- OpenVINO export is a single flag — meaningful when the deployment target is Intel CPU.
- The dataset abstractions handle MVTec-style folder structures out of the box.
Where it falls short (in our experience)
Anomalib is built around the academic benchmark workflow. Production looks different.

Custom dataset shapes. If your data isn't a clean train/good + test/good + test/anomaly folder split, you're writing a custom Datamodule. We have a small library of these for rolling production captures, annotated rejection logs, and noisy "good" sets that require curation passes.
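The expensive part of a custom Datamodule is usually the folder-to-samples mapping, not the tensor plumbing. A minimal sketch of that indexing step, assuming a hypothetical rolling-capture layout (`captures/<camera>/<date>/*.png` plus a sidecar `rejects.txt` of operator-flagged frames); a real adapter would wrap this in the framework's datamodule base class:

```python
from pathlib import Path

def index_rolling_captures(root: str) -> list[tuple[str, int]]:
    """Map a rolling-capture folder onto (image_path, label) samples.

    label 0 = good, 1 = anomalous. Layout and rejects.txt are
    illustrative, not an Anomalib convention.
    """
    root_path = Path(root)
    rejects: set[str] = set()
    reject_file = root_path / "rejects.txt"
    if reject_file.exists():
        rejects = set(reject_file.read_text().split())
    samples = []
    for img in sorted(root_path.rglob("*.png")):
        label = 1 if img.name in rejects else 0
        samples.append((str(img), label))
    return samples
```

The point is that only this function changes per customer; everything downstream (transforms, batching, splits) stays shared.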
Online learning / incremental updates. Anomalib expects you to retrain from scratch when the dataset changes. For a memory-bank model, this is mostly fine — bank rebuild is fast. For distillation models, you're either retraining nightly or accepting drift.
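Why a bank rebuild is cheap: it's a subsampling pass over stored features, with no gradient steps. A sketch with NumPy; PatchCore proper uses greedy coreset selection, and random subsampling here is only a stand-in to show the shape of the operation:

```python
import numpy as np

def rebuild_bank(embeddings: np.ndarray, max_size: int, seed: int = 0) -> np.ndarray:
    """Rebuild a memory bank from the latest pool of 'good' embeddings.

    Random subsampling stands in for coreset selection; either way,
    there is no training, just a pass over stored features.
    """
    if len(embeddings) <= max_size:
        return embeddings.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(embeddings), size=max_size, replace=False)
    return embeddings[idx]
```

For distillation models there is no equivalent shortcut, which is why those either retrain nightly or drift.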
Drift detection. Not in scope for the framework. We bolt on a separate service that watches the feature distribution from the embedded extractor and alerts when KL divergence from the calibration baseline crosses a threshold.
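The core of that bolt-on service is small. A sketch of the KL check, assuming each embedding has already been reduced to a scalar summary (e.g. its L2 norm) upstream; bin edges are fitted to the calibration baseline so both histograms are comparable:

```python
import numpy as np

def kl_divergence(recent: np.ndarray, baseline: np.ndarray,
                  bins: int = 32, eps: float = 1e-9) -> float:
    """Histogram-based KL(recent || baseline) over a 1-D feature summary.

    eps-smoothing keeps empty bins from producing infinities.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    p, _ = np.histogram(recent, bins=edges)
    q, _ = np.histogram(baseline, bins=edges)
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))
```

The alert is then just `kl_divergence(recent, baseline) > threshold`, with the threshold calibrated empirically per line.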
Operator-facing inference results. Anomaly maps are post-processed for visualization in research. In production, the operator wants overlay PNGs sized for their HMI, with the anomaly score in the EXIF metadata. That's user code.
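The user code in question is mostly pixel blending plus file metadata. A sketch of the overlay step with NumPy, assuming the anomaly map is already resized to the HMI resolution and normalized to [0, 1]; PNG encoding and stamping the score into EXIF (e.g. via Pillow/piexif) would follow:

```python
import numpy as np

def overlay_heatmap(frame_gray: np.ndarray, anomaly_map: np.ndarray,
                    alpha: float = 0.4) -> np.ndarray:
    """Blend an anomaly map over a grayscale frame as a red overlay.

    frame_gray: HxW uint8. anomaly_map: HxW float in [0, 1].
    Returns an HxWx3 uint8 RGB image ready for PNG encoding.
    """
    base = np.stack([frame_gray] * 3, axis=-1).astype(np.float32)
    red = np.zeros_like(base)
    red[..., 0] = 255.0
    a = (alpha * anomaly_map)[..., None]  # per-pixel blend weight
    out = (1.0 - a) * base + a * red
    return out.clip(0, 255).astype(np.uint8)
```

Pixels where the anomaly map is zero pass through untouched, which keeps the overlay readable on a busy HMI.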
The shape of code we end up writing on top
- Dataset adapter that pulls from our line-image database (PostgreSQL + S3) instead of disk
- Threshold service: maintains threshold per camera, per shift, with online recalibration from a small operator feedback loop
- Drift watcher: runs on a 15-minute cadence, hashes recent embeddings, compares to baseline
- Result publisher: writes anomaly score + heatmap to MQTT for the PLC and to a relational table for the audit trail
- Retrain pipeline: triggers on either a scheduled cadence or a drift alert
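To make the threshold-service item concrete, here is a sketch of the per-camera, per-shift recalibration loop. All names, the quantile rule, and the window size are hypothetical; the real service persists state and handles reject feedback too, while this version only re-fits the threshold so that roughly `target_fpr` of operator-confirmed-good scores would alert:

```python
from collections import deque
from dataclasses import dataclass, field
import statistics

@dataclass
class ThresholdService:
    """Per-(camera, shift) anomaly threshold with online recalibration.

    Operator-confirmed-good scores feed a sliding window; the threshold
    is re-fit as a high quantile of that window. Rejected frames are
    simply not added in this sketch.
    """
    target_fpr: float = 0.01
    window: int = 500
    _good_scores: dict = field(default_factory=dict)
    _thresholds: dict = field(default_factory=dict)

    def feedback(self, camera: str, shift: str, score: float, is_good: bool) -> None:
        if not is_good:
            return
        key = (camera, shift)
        buf = self._good_scores.setdefault(key, deque(maxlen=self.window))
        buf.append(score)
        if len(buf) >= 20:  # recalibrate once the window is populated
            cuts = statistics.quantiles(buf, n=100)
            idx = round((1 - self.target_fpr) * 100) - 1  # ~99th percentile
            self._thresholds[key] = cuts[idx]

    def threshold(self, camera: str, shift: str, default: float = 0.5) -> float:
        return self._thresholds.get((camera, shift), default)
```

Keeping the key as (camera, shift) rather than camera alone matters in practice: lighting and operator behavior both shift between day and night crews.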
Alternatives we evaluated and didn't pick
- MVTec's commercial framework — strong but locked into their stack. Fine if the customer already runs MVTec everywhere.
- Custom PyTorch — the temptation is real but the maintenance cost is brutal once you have more than two cells deployed.
- ADBench — useful for benchmarking on tabular anomaly data, not relevant for image work.
One opinion
The thing that determines whether an anomaly framework "works" is not the model implementations — it's whether the dataset abstraction matches your data lifecycle. We've burned more time on data-loading code than on model code, by an order of magnitude.

What are you using? Curious about anyone running a custom stack at scale.