Why the gateway is the critical layer
Sensors generate data. The cloud receives data. Between them sits the gateway, and the gateway is where 90% of the operational and security work happens. Get the gateway right and the rest of the IoT system is mostly tractable; get it wrong and you're firefighting per-device problems forever. The pattern below is what we've converged on.
What a gateway actually does
- Protocol bridging — Modbus RTU on the OT side, MQTT on the IT side. Or LoRa on the field side, HTTPS on the cloud side.
- Edge filtering & buffering — drop noisy samples, hold data when uplink is down, push when restored.
- Local logic — alarms that need to fire faster than the cloud round-trip allows.
- Security boundary — the place where OT-side trust ends and IT-side authentication begins.
- Firmware / model distribution — pulling updates from the cloud and pushing to local devices.
- Diagnostics — the place that has a complete view of "is the system actually working".
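Edge filtering and buffering are the two jobs that most often get hand-waved, so here is a minimal sketch of both together: a deadband filter that drops noisy samples, backed by a bounded drop-oldest buffer that holds data while the uplink is down and flushes it oldest-first when it returns. All names (`DEADBAND`, `filter_and_buffer`, the `publish` callback) are hypothetical, not from any particular gateway stack.

```python
import collections
import time

DEADBAND = 0.5        # hypothetical: ignore changes smaller than this
BUFFER_MAX = 10_000   # drop-oldest cap so a long outage can't exhaust RAM

buffer = collections.deque(maxlen=BUFFER_MAX)
last_sent = None

def filter_and_buffer(value, uplink_ok, publish):
    """Apply a deadband filter, then publish or buffer depending on uplink state."""
    global last_sent
    if last_sent is not None and abs(value - last_sent) < DEADBAND:
        return  # noisy sample: dropped at the edge, never hits the uplink
    last_sent = value
    sample = (time.time(), value)
    if uplink_ok:
        # Flush any backlog first, oldest-first, so cloud-side ordering holds
        while buffer:
            publish(buffer.popleft())
        publish(sample)
    else:
        buffer.append(sample)
```

The deadband threshold is per-signal in practice; the point is that filtering happens before buffering, so an outage fills the buffer with samples worth keeping.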
That's a lot of jobs for one device. Even so, don't split them across five different gateways at one site: consolidation is what keeps the system debuggable and updatable.
Hardware
We default to:
- CM4-based industrial carrier (Revolution Pi, Compulab, or an in-house carrier) — when the gateway is doing real work, including edge inference or rich analytics.
- Industrial x86 mini-PC (Logic Supply, AAEON) — when the gateway needs to run Windows-only software (rare) or when the customer's IT team specifies x86.
- ESP32 / STM32-based gateway — for the simple "convert RS-485 to MQTT" use case. Beware: simple turns into complex within a year.
The temptation to use a Raspberry Pi consumer board is real. Don't. Industrial enclosures, eMMC storage, proper power input, and a real RTC are the difference between a deployment and a maintenance burden.
Software architecture
The pattern we ship:
- Linux (Debian or Yocto-based) base
- Containerised application stack (Docker or Podman)
- A protocol-bridge service (e.g. Telegraf, in-house Go binary, Node-RED for prototypes that should not be left in prod)
- Local time-series buffer (TimescaleDB-on-edge or DuckDB) for offline operation
- An MQTT client (or HTTPS) for cloud uplink
- A local management agent for OTA updates and diagnostics
Service supervision via systemd. Logs forwarded via journald + Promtail to central Loki. Metrics via Prometheus node-exporter + Telegraf to central Prometheus. The boring observability stack pays off dramatically.
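As an illustration of the supervision piece, a unit file for the protocol-bridge service might look like this. Paths, the binary name, and the config location are placeholders, not a shipped artifact:

```ini
# /etc/systemd/system/protocol-bridge.service — illustrative only
[Unit]
Description=Protocol bridge (Modbus RTU -> MQTT)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/bridge --config /etc/bridge/config.toml
Restart=always
RestartSec=5
# stdout/stderr go to journald; Promtail ships the journal to Loki
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

`Restart=always` plus journald forwarding is most of what "supervision" means here: the service comes back on its own, and the crash is visible centrally.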
Security checklist (non-negotiable)
- No default passwords. Per-device generated.
- TLS for any uplink. mTLS where the cloud platform supports it.
- Disk encryption (LUKS) — the gateway might walk away.
- Network segmentation — gateway has interfaces on both OT and IT VLANs, but no IP forwarding between them. The gateway is a proxy, not a router.
- No SSH from anywhere except an explicit jumphost. Public-key only.
- Signed firmware updates with rollback capability.
- Vendor-supplied default services (FTP, telnet, embedded web servers) all disabled.
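The first item on that checklist, per-device generated credentials, is a one-liner to get right at provisioning time. A minimal sketch (function name and length are arbitrary choices, not a standard):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def per_device_password(length: int = 24) -> str:
    """Generate a unique credential at provisioning time.

    Uses the CSPRNG behind `secrets`, never `random`, and never a
    fleet-wide default baked into an image.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Generate it when the device is flashed, record it in the provisioning system, and never reuse it across units.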
Offline operation — the test you must do
Disconnect the gateway's uplink and observe behaviour for 24 hours, then 7 days. The questions:
- Does it keep collecting data? (It must.)
- Does the local buffer fill correctly?
- When uplink returns, does it backfill correctly without overwhelming the cloud?
- Does it surface "I am offline" status visibly?
- Does the local logic that needs to keep running, keep running?
A gateway that works on the bench with a permanent uplink and falls over when the customer's site has a 3-hour outage is not done.
One thing we'd warn about
Node-RED in production. It's a wonderful prototyping tool. It is not a production application platform. Every gateway we've inherited that ran a customer's "MVP" Node-RED flow grew into a tangled, undebuggable system. Use it to prove the architecture, then rebuild the production gateway in a real language.
The handover
At handover, the customer should be able to:
- See gateway health on a dashboard
- Push a firmware update to one or all gateways without engineering involvement
- Read the last 30 days of buffered data from any single gateway
- Trigger a remote restart from the dashboard
If those four things aren't possible, the gateway isn't deployable.
What does your gateway stack look like? Curious whether anyone is replacing Linux gateways with embedded RTOS for stricter determinism.