Hailo-8 in production: lessons from shipping eight stations on the same platform


Aior · Administrator · Staff member


Why Hailo

We started shipping Hailo-8 accelerators about two years ago, after testing the chip head-to-head against a Jetson Xavier NX on a vision inspection workload. The headline numbers were clear: comparable inference performance at roughly a quarter of the power, with a much smaller thermal envelope. After eight production stations, here's what we know that's not in the marketing material.

The toolchain workflow, in practice

The path from a PyTorch model to a Hailo-deployed binary is:
  1. Train in PyTorch / TensorFlow
  2. Export to ONNX
  3. Optimize with the Hailo Dataflow Compiler (DFC) — this includes quantization to INT8
  4. Compile to a Hailo Executable Format (HEF) targeting the specific chip (Hailo-8 / 8L / 15)
  5. Deploy via HailoRT runtime
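For bookkeeping, the five steps above can be sketched as a plain-Python artifact chain. This is our own illustration, not the DFC API; the .har suffix is the Hailo Archive intermediate the DFC works on, and the stage names are ours:

```python
# Each stage consumes the previous stage's artifact and emits the next one.
# (Stage names are illustrative; the file suffixes are the real formats.)
PIPELINE = [
    ("export",   ".pt",   ".onnx"),  # step 2: framework checkpoint -> ONNX
    ("optimize", ".onnx", ".har"),   # step 3: DFC parse + INT8 quantization
    ("compile",  ".har",  ".hef"),   # step 4: target-specific executable
]

def artifacts(model: str) -> list[str]:
    """Walk the chain, checking each stage consumes what the previous made,
    and return every file the model passes through on the way to a HEF."""
    current = model + PIPELINE[0][1]
    produced = [current]
    for stage, src, dst in PIPELINE:
        assert current.endswith(src), f"{stage} expected {src}, got {current}"
        current = model + dst
        produced.append(current)
    return produced
```

The HEF at the end of the chain is what HailoRT loads at deploy time (step 5).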

Steps 3 and 4 are where the real work happens. The DFC needs a representative calibration dataset — at least 64 images, ideally 512 — captured under production conditions. Calibration is the difference between "almost the same accuracy as FP32" and "embarrassing accuracy regression we explain to the customer".
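A minimal sketch of how we assemble that calibration set, assuming captures are already tagged by production condition (lighting, shift, product variant). The function and parameter names here are our own hypothetical helpers, not Hailo tooling:

```python
import random

def build_calib_set(captures: dict[str, list[str]],
                    target: int = 512, floor: int = 64,
                    seed: int = 0) -> list[str]:
    """captures maps a condition tag -> image paths from that condition.
    Sample evenly across conditions so the DFC sees the same distribution
    it will quantize for; fail loudly below the 64-image floor."""
    rng = random.Random(seed)                    # deterministic across rebuilds
    per_cond = max(1, target // len(captures))
    picked: list[str] = []
    for cond, paths in sorted(captures.items()):
        k = min(per_cond, len(paths))
        picked.extend(rng.sample(paths, k))
    if len(picked) < floor:
        raise ValueError(f"only {len(picked)} images; need at least {floor}")
    return picked
```

The even split per condition is the point: a calibration set that is all day-shift images will quantize badly for the night-shift lighting.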

Quantization sensitivity is real

Some architectures quantize cleanly. Others don't.
  • ResNet, MobileNet, YOLO families — INT8 with <1 % accuracy regression. No drama.
  • Transformers (ViT, DETR) — sensitive. Often need per-channel quantization, sometimes need partial FP16 retention on attention heads.
  • Anomaly detection (PatchCore, EfficientAD) — distance-based scoring is sensitive to quantization noise. We spent a week recovering 2 % AUROC on EfficientAD with QAT before deciding to keep it on a Jetson Orin Nano instead.

The pragmatic rule: if your model has unusual numerics (cosine similarity in the loss, distance-based scoring, custom layer norms), assume quantization will cost you 1-3 % accuracy and budget for QAT.
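A toy illustration of why distance-based scoring is the fragile case (this is our own sketch of symmetric per-tensor INT8 quantization, not Hailo's quantizer): each element picks up noise of up to half the quantization scale, and a distance score accumulates that noise across the whole embedding, where a class argmax usually shrugs it off.

```python
def quantize_int8(xs: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8: one scale for the whole tensor."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def dequantize(qs: list[int], scale: float) -> list[float]:
    return [q * scale for q in qs]

def l2(a: list[float], b: list[float]) -> float:
    """Distance-based score, e.g. nearest-neighbour anomaly scoring."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Per-channel quantization helps precisely because it shrinks that per-element scale for channels with small dynamic range.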

Memory & multi-model deployments

Hailo-8 has 20 MB of on-chip SRAM. A typical YOLOv8s post-quantization is around 12 MB; YOLOv8m is around 25 MB and doesn't fit alone. The chip then "context switches" — loading partial graphs from host RAM — which costs latency.

For multi-model deployments (e.g. detection + classification + OCR on the same chip), HailoRT supports model swapping between frames. It's measurably slower than a single model. We size for single-model where latency matters, multi-model where the use case can tolerate 30-50 ms swap penalties.
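The sizing arithmetic we do at quoting time can be sketched like this (our own back-of-envelope helper using the numbers from the paragraphs above, not a Hailo tool):

```python
SRAM_MB = 20.0  # Hailo-8 on-chip SRAM

def plan(models_mb: dict[str, float],
         swap_penalty_ms: float = 40.0) -> tuple[str, float]:
    """Map model name -> post-quantization size in MB.
    Returns ('single-context', 0.0) if everything co-resides in SRAM,
    else ('multi-context', estimated per-frame swap cost in ms).
    40 ms default sits in the 30-50 ms range we see in practice."""
    total = sum(models_mb.values())
    if total <= SRAM_MB:
        return "single-context", 0.0
    return "multi-context", swap_penalty_ms
```

If the answer comes back multi-context on a latency-critical station, that is the signal to split models across chips rather than share one.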

Hailo-15 vs Hailo-8 — when to upgrade

Hailo-15 is the newer SoC-style chip with built-in ISP, video codec, and more compute. We use it when:
  • The cell is space-constrained and we want camera + accelerator on a single board
  • We need >1 stream at production resolution
  • Multi-model deployments stop fitting on Hailo-8

For a single-camera, single-model station, the Hailo-8 M.2 is still the cheapest path.

One thing we'd warn about

The Hailo ecosystem is excellent if you're building one camera-to-decision pipeline per chip. It is less ergonomic if you're building a heterogeneous data pipeline with 20 transforms and 3 conditional models — for that you want CPU + Hailo, not Hailo alone.

Anyone running Hailo-15 in real cells yet? Curious about the ISP integration story and whether it actually replaces a discrete camera ASIC.
 
