Traceability without rebuilding the entire factory: practical genealogy patterns

Aior · Thursday at 11:50 PM

Why traceability projects are hard

The customer-facing version of traceability sounds simple: "given a finished serial number, tell me what raw lots, what equipment, what operators, and what process parameters were involved". The data model is a directed acyclic graph. The math is trivial. The implementation is hard because it requires every step in the manufacturing process to know what it's working on, and most factories don't have that wiring on day one.

Below are the patterns that make this tractable.

The data model that scales

At the core:

Item — an instance, with a unique ID. Could be a raw lot, an intermediate sub-assembly, or a finished good.
Operation — a process step that consumed inputs and produced outputs. Has equipment, operator, time, and parameters.
Genealogy edge — links an output item to its input items, via an operation.

That's it. The schema scales from a 5-step assembly line to an automotive supply chain. Avoid the temptation to add 30 fields to "item" to capture everything; use the operation's parameter dictionary instead.
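
To make the three entities concrete, here is a minimal sketch in Python. The class and field names (`Item`, `Operation`, `GenealogyEdge`, `edges_for`, `torque_nm`, and the sample IDs) are our own illustrative assumptions, not a reference schema; the point is only that parameters live on the operation, not on the item.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Item:
    item_id: str  # unique ID of a raw lot, sub-assembly, or finished good
    kind: str     # hypothetical labels: "raw_lot", "sub_assembly", "finished_good"


@dataclass
class Operation:
    op_id: str
    equipment_id: str
    operator_id: str
    started_at: datetime
    # Free-form parameters go here instead of becoming extra fields on Item.
    parameters: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class GenealogyEdge:
    output_item_id: str
    input_item_id: str
    op_id: str


def edges_for(op: Operation, inputs: list[Item], outputs: list[Item]) -> list[GenealogyEdge]:
    """Build one edge per (output, input) pair produced by a single operation."""
    return [GenealogyEdge(o.item_id, i.item_id, op.op_id) for o in outputs for i in inputs]


# Illustrative data only.
lot = Item("LOT-001", "raw_lot")
unit = Item("SN-1000", "finished_good")
op = Operation(
    "OP-1", "PRESS-4", "operator-17",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    {"torque_nm": 12.5},
)
edges = edges_for(op, [lot], [unit])
```

An operation with several inputs and several outputs simply yields several edges, which is what keeps the graph a plain DAG.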

Identification at every step — the harder part

Genealogy needs every part to be identifiable at every step. Options:

Pre-marked at part creation — laser-etched serial, ideally human-readable + 2D code (Data Matrix, QR). Best, when the part allows it.
Carrier-tracked — the part rides in a fixture / pallet that has its own ID. The line tracks "fixture X is at station Y", and the part is implicit.
Camera-identified — OCR of a printed/etched code, image-based identification of unique features. Expensive but viable.
Inferred from sequence — "the third part of the day on line 4". Brittle, last resort.

Most working systems are a hybrid. The trick is to design the hybrid deliberately, not by accident.
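
One way to make the hybrid deliberate is to encode the fallback order explicitly and record which method produced each ID. The sketch below is an assumption about how that might look: `resolve_item_id`, the method labels, and the synthetic sequence ID format are all hypothetical, and camera identification is folded into the "scanned code" input for brevity.

```python
from typing import Optional


def resolve_item_id(
    scanned_code: Optional[str],
    fixture_id: Optional[str],
    fixture_to_item: dict[str, str],
    line_id: str,
    sequence_no: int,
) -> tuple[str, str]:
    """Return (item_id, method), trying the most reliable method first."""
    if scanned_code:
        # Pre-marked part (or a code read by camera): the strongest evidence.
        return scanned_code, "marked"
    if fixture_id and fixture_id in fixture_to_item:
        # Carrier-tracked: the part is implied by the fixture it rides in.
        return fixture_to_item[fixture_id], "carrier"
    # Last resort: infer from position in the sequence. Brittle by design.
    return f"{line_id}-SEQ-{sequence_no:05d}", "sequence"


fixtures = {"FX-7": "SN-1000"}
by_scan = resolve_item_id("SN-2000", "FX-7", fixtures, "LINE4", 3)
by_carrier = resolve_item_id(None, "FX-7", fixtures, "LINE4", 3)
by_sequence = resolve_item_id(None, None, fixtures, "LINE4", 3)
```

Storing the method next to the ID lets a later incident query tell a scanned identity apart from an inferred one.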

The "where does the ID get applied" question

This is the decision that shapes the whole project. The earlier in the process you can apply a permanent ID, the more steps you can trace cleanly. But "early" usually means "before the part has a flat surface to mark", so the marking has to happen while the part is still only partially formed. Resolve this with the customer's product team before designing the data flow.

Storage architecture

Hot store — last 90 days, full detail, queryable in <1 s. PostgreSQL is fine at most factory scales.
Warm store — last 2 years, full detail, queryable in seconds. Same database, different partition / table.
Cold store — beyond 2 years, archived to object storage. Query path requires a job, not a UI click.
Compliance retention — some industries (medical, aerospace) require 10+ year retention. Plan archive paths from day one.

A good rule: query a 6-month-old part's full genealogy in under 2 seconds. If you can't, the system isn't done.
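
The tier boundaries above reduce to a small routing function. This is a sketch under the thresholds stated in the list (90 days, 2 years); the function name `storage_tier` and the use of 730 days for "2 years" are our assumptions.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=90)    # full detail, sub-second queries
WARM_WINDOW = timedelta(days=730)  # roughly 2 years, queryable in seconds


def storage_tier(operation_time: datetime, now: datetime) -> str:
    """Pick the tier that should hold a record of the given age."""
    age = now - operation_time
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"  # archived; retrieval goes through a job


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = storage_tier(now - timedelta(days=30), now)
six_months = storage_tier(now - timedelta(days=180), now)
old = storage_tier(now - timedelta(days=1000), now)
```

Note that a 6-month-old part lands in the warm tier, so the 2-second rule is really a test of the warm store, not the hot one.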

The "incident query" — the test of a real traceability system

The customer calls. "We had a field failure on serial X. Tell us every part with the same root cause." The query needs:

All items with the same upstream raw lot
All items processed through the same equipment within a defined time window
All items processed by the same operator on the same shift
All items where a specific process parameter was outside a defined range

If the system can answer those four queries in the time it takes the support team to get the customer back on the phone, the system is doing its job. We've seen "traceability" systems that can't answer any of them in under a day.
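
The four queries can be sketched against a small relational layout. This uses Python's standard `sqlite3` module with an in-memory database purely for illustration; the table names, column names, and sample rows are assumptions, and the raw-lot query here only looks one level up (a multi-level version needs a recursive traversal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE operation (
    op_id TEXT PRIMARY KEY, equipment_id TEXT, operator_id TEXT,
    shift TEXT, started_at TEXT);
CREATE TABLE op_param (op_id TEXT, name TEXT, value REAL);
CREATE TABLE edge (output_item_id TEXT, input_item_id TEXT, op_id TEXT);
""")
conn.executemany("INSERT INTO operation VALUES (?,?,?,?,?)", [
    ("OP-1", "PRESS-4", "op-17", "2024-03-01-A", "2024-03-01T08:00:00"),
    ("OP-2", "PRESS-4", "op-17", "2024-03-01-A", "2024-03-01T09:00:00"),
    ("OP-3", "PRESS-5", "op-22", "2024-03-01-B", "2024-03-01T17:00:00"),
])
conn.executemany("INSERT INTO op_param VALUES (?,?,?)", [
    ("OP-1", "torque_nm", 12.0),
    ("OP-2", "torque_nm", 15.5),
    ("OP-3", "torque_nm", 12.1),
])
conn.executemany("INSERT INTO edge VALUES (?,?,?)", [
    ("SN-1", "LOT-A", "OP-1"),
    ("SN-2", "LOT-A", "OP-2"),
    ("SN-3", "LOT-B", "OP-3"),
])


def items(sql, args):
    """Run a query and return the distinct item IDs, sorted."""
    return sorted({row[0] for row in conn.execute(sql, args)})


# 1. Same upstream raw lot (direct inputs only in this sketch).
same_lot = items(
    "SELECT output_item_id FROM edge WHERE input_item_id = ?", ("LOT-A",))

# 2. Same equipment within a time window (ISO timestamps sort as text).
same_equipment = items(
    "SELECT e.output_item_id FROM edge e JOIN operation o USING (op_id) "
    "WHERE o.equipment_id = ? AND o.started_at BETWEEN ? AND ?",
    ("PRESS-4", "2024-03-01T00:00:00", "2024-03-01T08:30:00"))

# 3. Same operator on the same shift.
same_operator_shift = items(
    "SELECT e.output_item_id FROM edge e JOIN operation o USING (op_id) "
    "WHERE o.operator_id = ? AND o.shift = ?",
    ("op-17", "2024-03-01-A"))

# 4. A named process parameter outside a defined range.
out_of_range = items(
    "SELECT e.output_item_id FROM edge e JOIN op_param p USING (op_id) "
    "WHERE p.name = ? AND (p.value < ? OR p.value > ?)",
    ("torque_nm", 11.0, 13.0))
```

Keeping parameters in a name/value table is one possible choice; a JSON column on the operation row would also work if the database can index into it.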

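Backward genealogy is harder than forward

"Given a raw lot, find all finished goods that contain it" requires traversing the genealogy graph in the opposite direction from what most schemas index. Edges are usually stored and indexed from an output item to its inputs, so this query runs against the grain (recall terminology often calls it a forward trace; it is "backward" only relative to how the edges are stored). Plan the indexes for both directions; otherwise a recall query that an index could answer in seconds takes days.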

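Indexing both directions and walking the graph recursively can be sketched with `sqlite3` and a recursive common table expression. The table layout, index names, and the `descendants` / `ancestors` helpers are illustrative assumptions; a production schema would likely carry more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (output_item_id TEXT, input_item_id TEXT, op_id TEXT);
-- One index per traversal direction.
CREATE INDEX edge_by_output ON edge (output_item_id);
CREATE INDEX edge_by_input  ON edge (input_item_id);
""")
conn.executemany("INSERT INTO edge VALUES (?,?,?)", [
    ("SUB-1", "LOT-A", "OP-1"),
    ("SUB-2", "LOT-B", "OP-2"),
    ("SN-1", "SUB-1", "OP-3"),
    ("SN-2", "SUB-1", "OP-4"),
    ("SN-3", "SUB-2", "OP-5"),
])


def descendants(item_id):
    """Everything built from item_id, at any depth (the recall direction)."""
    sql = """
    WITH RECURSIVE downstream(item_id) AS (
        SELECT output_item_id FROM edge WHERE input_item_id = ?
        UNION
        SELECT e.output_item_id
        FROM edge e JOIN downstream d ON e.input_item_id = d.item_id
    )
    SELECT item_id FROM downstream ORDER BY item_id
    """
    return [row[0] for row in conn.execute(sql, (item_id,))]


def ancestors(item_id):
    """Everything that went into item_id, at any depth."""
    sql = """
    WITH RECURSIVE upstream(item_id) AS (
        SELECT input_item_id FROM edge WHERE output_item_id = ?
        UNION
        SELECT e.input_item_id
        FROM edge e JOIN upstream u ON e.output_item_id = u.item_id
    )
    SELECT item_id FROM upstream ORDER BY item_id
    """
    return [row[0] for row in conn.execute(sql, (item_id,))]


recalled = descendants("LOT-A")
inputs_of_sn3 = ancestors("SN-3")
```

The two functions are mirror images; the only thing that makes one slow in practice is a missing index on the column it filters by.
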
One pattern that always pays off

Snapshot the entire process state into the genealogy at the time of the operation. Don't trust that "you can look up the recipe by version number" — recipes get edited, calibrations drift, software changes. The snapshot is the contract.
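
A minimal sketch of the snapshot idea, assuming the recipe and calibration are available as plain dictionaries at operation time. The function name `record_operation`, the field names, and the sample values are hypothetical; the SHA-256 digest is an optional extra we added so that identical snapshots can be recognised cheaply.

```python
import copy
import hashlib
import json


def record_operation(op_id, recipe, calibration, software_version):
    """Freeze the process state as it was when the operation ran."""
    snapshot = {
        # Deep copies, so later edits to the live recipe cannot leak in.
        "recipe": copy.deepcopy(recipe),
        "calibration": copy.deepcopy(calibration),
        "software_version": software_version,
    }
    digest = hashlib.sha256(
        json.dumps(snapshot, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"op_id": op_id, "snapshot": snapshot, "snapshot_sha256": digest}


recipe = {"name": "cure", "version": 7, "temp_c": 180}
record = record_operation("OP-1", recipe, {"oven_offset_c": -1.5}, "plc-2.4.1")

# Someone edits the live recipe later without bumping the version number.
recipe["temp_c"] = 175
```

After the edit, the stored record still shows 180 °C, which is the value the part actually saw.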

One thing we'd warn about

"Lightweight" traceability built on Excel + spreadsheet macros + a folder share. It works for 1 line, fails for 5, becomes unrecoverable at 20. Build the data model right from the start, even if the UI is minimal.

What's your traceability stack? And anyone using a graph database (Neo4j) for genealogy at scale?
