MVTec AD is a benchmark, not a dataset
Every anomaly detection paper tops out near 99 % image-AUROC on MVTec AD. That number is the reason teams confidently deploy a model and then watch it fail in production. MVTec AD is small (~5k images), pristine (lab lighting, clean backgrounds, single object), and curated (anomalies are visible to a human in <1 s). Your factory floor is none of those things.

If you're collecting your own dataset, here's what we've learned the hard way.
Class imbalance is the whole problem
Anomalies are rare by definition. A line that produces a 1 % defect rate gives you, in a typical week, a hundred or so anomalies and tens of thousands of good parts. This isn't a "balance the loss" problem. It's a "you don't have enough anomalies for supervised learning, ever" problem. Hence unsupervised methods.

But: that 1 % includes maybe twenty distinct defect types. If you train on the data you have, you'll cover the common defects fine and miss the rare ones — the rare ones being, of course, the most expensive ones to miss.
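To make the scarcity concrete, here's a back-of-envelope sketch; the throughput number and the uniform defect split are illustrative assumptions, not measurements:

```python
# Back-of-envelope: how many examples of each defect type you collect.
# All numbers below are assumptions for illustration.
parts_per_week = 15_000   # assumed line throughput
defect_rate = 0.01        # the 1 % overall defect rate from above
n_defect_types = 20       # distinct defect modes on the line

anomalies_per_week = parts_per_week * defect_rate           # ~150
per_type_if_uniform = anomalies_per_week / n_defect_types   # ~7.5

# Real defect splits are long-tailed, so the rare modes show up far
# less often than this "uniform" average; some weeks, not at all.
print(f"{anomalies_per_week:.0f} anomalies/week, "
      f"~{per_type_if_uniform:.1f} per type if the split were uniform")
```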
The cold start problem
On day one of a project you have no good images and no anomaly images. Two weeks of data collection later, you have a few hundred good images and zero confirmed anomalies. The decision: deploy a "good only" anomaly detector now and find out what it flags, or wait until a couple of confirmed anomalies show up?

We've converged on: deploy in shadow mode at the end of week 2. Use the operator's manual rejections as anomaly labels. Don't trust the labels until you've reviewed them.
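A minimal sketch of what shadow-mode logging can look like; the file layout and function name are ours, and the scoring model is whichever one you already have:

```python
# Shadow mode: the model scores every part but never gates the line.
# The operator's manual rejections become provisional anomaly labels.
import csv
import time
from pathlib import Path

LOG = Path("shadow_log.csv")

def log_shadow(image_path: str, score: float, operator_rejected: bool) -> None:
    """Append one inspection record: model score next to operator decision."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["ts", "image", "score", "operator_rejected"])
        writer.writerow([time.time(), image_path, score, int(operator_rejected)])
```

Review the logged rejections before treating them as labels; operators sometimes reject for reasons the camera never saw.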
Active learning loops that actually work
- Run inference on every part. Log score + image.
- Human reviewer queues: highest-score good parts (potential false rejects), lowest-score bad parts (potential false accepts).
- Operator labels in <30 s per image, in a UI built for it. Not a spreadsheet.
- Daily delta: 50-100 new labels, weekly retrain on the cumulative set.
This is the pattern that took our worst-performing project from 92 % to 99.4 % image-AUROC over six weeks of production. No new model architecture; just a better dataset.
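For concreteness, a sketch of the queueing step from the loop above, assuming each logged record carries the model score and the operator's accept/reject decision (all names illustrative):

```python
# Build the two daily review queues from the shadow/inference log.
from dataclasses import dataclass

@dataclass
class Record:
    image: str
    score: float     # higher = more anomalous
    rejected: bool   # the operator's decision on the line

def review_queues(records: list[Record], k: int = 50):
    """Return the day's two review queues.

    - accepted parts with the highest scores: potential false rejects
    - rejected parts with the lowest scores: potential false accepts
    """
    accepted = sorted((r for r in records if not r.rejected),
                      key=lambda r: r.score, reverse=True)
    rejected = sorted((r for r in records if r.rejected),
                      key=lambda r: r.score)
    return accepted[:k], rejected[:k]
```

Reviewing only the two tails is what keeps the 30-second-per-image budget realistic: most of the log never needs a human.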
Synthetic anomalies (CutPaste, DRAEM)
A surprisingly strong tool. The trick: paste random crops from the same image (CutPaste) or simulate Perlin-noise-driven structural anomalies (DRAEM-style). The model learns "this region is statistically inconsistent" rather than "this looks like the anomalies I've seen". Generalises better to unseen defect types than naive supervised approaches.

We don't ship synthetic-only models. We ship models trained on real good samples + synthetic perturbations.
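A minimal CutPaste-style sketch, assuming NumPy images in (H, W, C) layout. The patch-area and aspect ranges roughly follow the CutPaste paper, but treat them as starting points:

```python
# CutPaste-style perturbation: cut a random patch from a good image
# and paste it back at a random location, creating a local inconsistency.
import numpy as np

def cutpaste(img: np.ndarray, rng: np.random.Generator,
             area_ratio: tuple[float, float] = (0.02, 0.15)) -> np.ndarray:
    """Return a copy of `img` (H, W, C) with one patch pasted elsewhere."""
    h, w = img.shape[:2]
    area = rng.uniform(*area_ratio) * h * w   # patch area vs. image area
    aspect = rng.uniform(0.3, 3.3)            # patch aspect ratio
    ph = max(1, min(int(np.sqrt(area / aspect)), h - 1))
    pw = max(1, min(int(np.sqrt(area * aspect)), w - 1))
    # Source and destination corners are sampled independently.
    sy, sx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    dy, dx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = img.copy()
    out[dy:dy + ph, dx:dx + pw] = img[sy:sy + ph, sx:sx + pw]
    return out
```

The usual recipe trains a self-supervised classifier to tell originals from perturbed copies, then scores anomalies with the representation it learns.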
Things to actually capture, beyond the image
- Camera ID, lens config, lighting state — different cameras drift differently
- Shift, operator ID, line speed — operator-driven variance is a real signal
- Upstream process variables (temperature, pressure) when available — sometimes the anomaly is upstream
- Material lot — different supplier batches look different to the camera
These are the columns that let you debug a regression three months later instead of staring at a confusion matrix.
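If it helps, a sketch of what one per-image record might look like; the field names are our convention, not a standard schema, and the example values are made up:

```python
# One metadata record per captured image, stored alongside the PNG itself.
import json
from dataclasses import dataclass, asdict

@dataclass
class CaptureMeta:
    image: str
    camera_id: str
    lens_config: str
    lighting_state: str
    shift: str
    operator_id: str
    line_speed: float                   # e.g. parts per minute
    upstream_temp_c: float | None       # None when the PLC feed is down
    upstream_pressure_bar: float | None
    material_lot: str

# Illustrative values only.
meta = CaptureMeta("img_000123.png", "cam-03", "25mm-f4", "ring-75pct",
                   "night", "op-117", 42.0, 181.5, 2.4, "LOT-2024-118")
print(json.dumps(asdict(meta)))
```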
One last thing
Don't compress your training images. Lossy JPEG compression hides exactly the kind of low-amplitude defects you're trying to detect. Keep the raw PNGs in cold storage and downsample at training time if the model needs it.
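A minimal sketch of the training-time half, assuming Pillow and PNG originals; the size cap is an arbitrary example:

```python
# Keep the lossless original on disk; downsample only in memory,
# at load time, if the model needs a smaller input.
from PIL import Image

def load_for_training(path: str, max_side: int = 1024) -> Image.Image:
    """Load a raw PNG and downsample it for the model if needed.

    The PNG in cold storage stays untouched, so the low-amplitude
    detail survives even if today's model trains at low resolution.
    """
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1.0:
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    return img
```

What's your dataset cadence? Weekly retrain, monthly, on-demand only?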