
Anomalib in production: what works, what we end up rewriting


Aior

Administrator
Staff member


Why Anomalib is our default

Intel's Anomalib has, over the last two years, become the de facto open-source framework for unsupervised image anomaly detection. We default to it on new projects because:
  • Most published architectures (PatchCore, EfficientAD, FastFlow, PaDiM, ReverseDistillation) are implemented and validated.
  • The PyTorch Lightning base means our training loops, callbacks, and logging are not custom code.
  • OpenVINO export is a single flag — meaningful when the deployment target is Intel CPU.
  • The dataset abstractions handle MVTec-style folder structures out of the box.
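
For anyone who hasn't used it, the happy path is genuinely short. A minimal sketch, assuming the v1.x API (anomalib.data / anomalib.models / anomalib.engine); module paths and export arguments have moved between releases, so treat the exact names as illustrative rather than authoritative:

```python
# Minimal Anomalib run, v1.x-style API (names may differ in your release).
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

datamodule = MVTec(category="bottle")   # MVTec-style folder layout handled out of the box
model = Patchcore()                     # one of the validated memory-bank models
engine = Engine()                       # Lightning wrapper: loops, callbacks, logging for free

engine.fit(model=model, datamodule=datamodule)
engine.export(model=model, export_type="openvino")  # single-flag OpenVINO export for Intel CPUs
```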

Where it falls short (in our experience)

Anomalib is built around the academic benchmark workflow. Production looks different.

Custom dataset shapes. If your data isn't a clean train/good + test/good + test/anomaly folder split, you're writing a custom Datamodule. We have a small library of these for: rolling production captures, annotated rejection logs, and noisy "good" sets that require curation passes.
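
As a rough idea of what those adapters look like, here is a stripped-down sketch: a plain Lightning datamodule over a hypothetical rejection-log CSV rather than Anomalib's own datamodule base class. File names, column names, and the batch format are made up for illustration.

```python
# Hypothetical custom datamodule: "good" rows train, the full log evaluates.
from pathlib import Path

import pandas as pd
from lightning.pytorch import LightningDataModule  # older stacks: pytorch_lightning
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import ToTensor


class CaptureDataset(Dataset):
    """Rows of a rejection-log CSV with columns: image_path, label (0 = good, 1 = reject)."""

    def __init__(self, frame: pd.DataFrame):
        self.frame = frame.reset_index(drop=True)
        self.to_tensor = ToTensor()

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        image = Image.open(row.image_path).convert("RGB")
        return {"image": self.to_tensor(image), "label": int(row.label)}


class LineCaptureDataModule(LightningDataModule):
    def __init__(self, rejection_log: str, batch_size: int = 32):
        super().__init__()
        self.rejection_log = Path(rejection_log)
        self.batch_size = batch_size

    def setup(self, stage=None):
        log = pd.read_csv(self.rejection_log)
        self.train_data = CaptureDataset(log[log.label == 0])  # train on curated "good" only
        self.test_data = CaptureDataset(log)                    # evaluate on the mixed set

    def train_dataloader(self):
        return DataLoader(self.train_data, batch_size=self.batch_size, shuffle=True)

    def test_dataloader(self):
        return DataLoader(self.test_data, batch_size=self.batch_size)
```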

Online learning / incremental updates. Anomalib expects you to retrain from scratch when the dataset changes. For a memory-bank model, this is mostly fine — bank rebuild is fast. For distillation models, you're either retraining nightly or accepting drift.

Drift detection. Not in scope for the framework. We bolt on a separate service that watches the feature distribution from the embedding extractor and alerts when the KL divergence from the calibration baseline crosses a threshold.
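
A minimal sketch of that check; the window sizes, the 0.5 threshold, and the alert hook are placeholders, not tuned values:

```python
# Per-dimension histogram KL divergence between a calibration baseline and a
# recent window of backbone embeddings. Values and names are illustrative.
import numpy as np
from scipy.stats import entropy


def kl_drift_score(baseline: np.ndarray, recent: np.ndarray, bins: int = 64) -> float:
    """Mean KL divergence across embedding dimensions (baseline || recent)."""
    scores = []
    for dim in range(baseline.shape[1]):
        lo = min(baseline[:, dim].min(), recent[:, dim].min())
        hi = max(baseline[:, dim].max(), recent[:, dim].max())
        if hi <= lo:                       # constant dimension: nothing to compare
            continue
        edges = np.linspace(lo, hi, bins + 1)
        p, _ = np.histogram(baseline[:, dim], bins=edges)
        q, _ = np.histogram(recent[:, dim], bins=edges)
        # epsilon keeps the divergence finite when a bin is empty
        scores.append(entropy(p + 1e-8, q + 1e-8))
    return float(np.mean(scores))


# synthetic demo: a shifted recent window should score well above baseline noise
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(2000, 128))
recent = rng.normal(0.4, 1.0, size=(500, 128))
if kl_drift_score(baseline, recent) > 0.5:   # placeholder threshold
    print("drift alert: schedule recalibration / retrain")
```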

Operator-facing inference results. Anomaly maps are post-processed for visualization in research. In production, the operator wants overlay PNGs sized for their HMI, with the anomaly score in the EXIF metadata. That's user code.
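
A sketch of that glue, assuming Pillow on the inference host; the HMI resolution, the blend alpha, and the choice of the EXIF ImageDescription tag are ours for illustration, not anything the framework provides:

```python
# Resize the anomaly map to the HMI, blend it over the frame as a red overlay,
# and stash the score in EXIF so the HMI can read it without a sidecar file.
import numpy as np
from PIL import Image

HMI_SIZE = (800, 480)  # example HMI resolution


def write_operator_overlay(frame: Image.Image, anomaly_map: np.ndarray,
                           score: float, out_path: str) -> None:
    frame = frame.convert("RGB").resize(HMI_SIZE)

    # normalize the anomaly map to 0..255 and render it as a red layer
    span = float(anomaly_map.max() - anomaly_map.min()) + 1e-8
    amap = (255 * (anomaly_map - anomaly_map.min()) / span).astype(np.uint8)
    heat = Image.fromarray(amap).resize(HMI_SIZE)
    zero = Image.new("L", HMI_SIZE, 0)
    overlay = Image.blend(frame, Image.merge("RGB", (heat, zero, zero)), alpha=0.4)

    # 0x010E is the standard EXIF ImageDescription tag; recent Pillow writes it
    # into PNG's eXIf chunk, older versions may need JPEG instead
    exif = Image.Exif()
    exif[0x010E] = f"anomaly_score={score:.4f}"
    overlay.save(out_path, format="PNG", exif=exif)
```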

The shape of code we end up writing on top

  • Dataset adapter that pulls from our line-image database (PostgreSQL + S3) instead of disk
  • Threshold service: maintains a threshold per camera, per shift, with online recalibration from a small operator feedback loop
  • Drift watcher: runs on a 15-minute cadence, hashes recent embeddings, compares to baseline
  • Result publisher: writes anomaly score + heatmap to MQTT for the PLC and to a relational table for the audit trail (a minimal sketch follows this list)
  • Retrain pipeline: triggers on either a scheduled cadence or a drift alert
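
To make the publisher half of that concrete, a compressed sketch; the broker address, topic layout, and table schema are placeholders, and sqlite3 stands in for the real relational store:

```python
# One retained MQTT message for the PLC, one row for the audit trail.
import json
import sqlite3
from datetime import datetime, timezone

import paho.mqtt.client as mqtt  # paho-mqtt 2.x shown; 1.x drops the callback_api_version arg


def publish_result(camera_id: str, score: float, heatmap_path: str) -> None:
    payload = {
        "camera": camera_id,
        "score": round(score, 4),
        "heatmap": heatmap_path,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

    # retained, so the PLC sees the last result immediately after a reconnect
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.connect("broker.local", 1883)
    client.publish(f"line1/{camera_id}/anomaly", json.dumps(payload), qos=1, retain=True)
    client.disconnect()

    with sqlite3.connect("audit.db") as db:
        db.execute(
            "CREATE TABLE IF NOT EXISTS anomaly_results "
            "(ts TEXT, camera TEXT, score REAL, heatmap TEXT)"
        )
        db.execute(
            "INSERT INTO anomaly_results VALUES (?, ?, ?, ?)",
            (payload["ts"], camera_id, score, heatmap_path),
        )
```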

Alternatives we evaluated and didn't pick

  • MVTec's commercial framework — strong but locked into their stack. Fine if the customer already runs MVTec everywhere.
  • Custom PyTorch — the temptation is real but the maintenance cost is brutal once you have more than two cells deployed.
  • ADBench — useful for benchmarking on tabular anomaly data, not relevant for image work.

One opinion

The thing that determines whether an anomaly framework "works" is not the model implementations — it's whether the dataset abstraction matches your data lifecycle. We've burned more time on data-loading code than on model code, by an order of magnitude.

What are you using? Curious about anyone running a custom stack at scale.
 
