Air-gapped OT/IT SOC across four plants

A manufacturer running approximately 3,800 OT devices across four plants, with no internet egress at the OT layer, built a fully air-gapped OT/IT SOC on a single graph. Mean time to detect (MTTD) on the OT side moved from "eventually" to a median of four minutes in adversary-emulation replays; the cyber-insurance premium dropped 18%; and MeitY guideline alignment was confirmed in a second-party audit.

Customer profile

Sector: Manufacturing (chemicals and auto-parts; mixed continuous-process and discrete manufacturing).
Size: Four plants: two chemicals, two auto-parts. Approximately 3,800 OT devices in total across the four sites: PLCs, HMIs, SCADA stations, historian servers, and a small population of safety-instrumented systems. Each plant has a separate IT estate with roughly 700 endpoints.
Geography: Gujarat (chemicals, two plants), Tamil Nadu (auto-parts, one plant), Maharashtra (auto-parts, one plant).
Volume: 22 GB/day of aggregated security telemetry across the four sites; OT-side telemetry is the majority by message count but a small fraction of byte volume.
Replaced systems: No SOC at the OT layer; the IT SOC was a cloud-SaaS SIEM requiring internet egress (a regulatory and operational red flag); plant SCADA logs had never been centrally analysed.

This is the engagement we found hardest to scope correctly. Air-gapped industrial SOCs are not a smaller version of a regular SOC; they are a different operating model. The IT-side instinct — push everything to a cloud SIEM and call the vendor at 2am — does not survive contact with a plant network that, by design, has no path out. The deployment described here ran for fourteen weeks across four sites. The customer's engineering team was unusually patient with the planning phase, which is the only reason the rollout was as smooth as it was. Most of the lessons below are organisational, not technical.

The reason to publish this case study is that "air-gap-ready" is overclaimed by most security vendors. The honest test is whether the platform's threat intel, model invocations, content updates and operational tooling can all run without any internet egress, in perpetuity, with a defensible patching and content-update story. Most platforms fail that test the moment you ask. The deployment below shows what passing that test looks like in operation.

Before Netgraph

The pre-state was not unusual for an Indian industrial group. The IT side had moderate maturity; the OT side had effectively none. Specifically:

No SOC at the OT layer. The plant teams were process and reliability engineers, not security analysts. They monitored OT for safety and availability, not for adversary activity. Logs from SCADA, PLCs, historian servers and the engineering-workstation tier were either not retained, retained locally without analysis, or retained on the asset itself and never reviewed.
IT SOC used a cloud-SaaS SIEM that required egress. The IT-side SIEM was a popular cloud-delivered product. It collected logs from each plant's IT estate over a VPN to the corporate data centre and from there to the cloud SIEM tenant. The arrangement worked for IT but had two problems: the OT network could not, by policy, send anything to a cloud-resident system, so OT was invisible; and the regulator's tightening guidance was beginning to flag the IT-side egress dependence as a sovereignty concern even for the IT estate.
Three OT-side incidents in 24 months that went unnoticed for weeks. One was a misconfigured engineering workstation that had been infected via a contractor USB and was beaconing to an external IP for eleven days before the plant network team noticed the anomalous outbound on the firewall log (which had been collected but not analysed in real time). The second was a configuration push from a maintenance laptop that introduced a subtle PLC programming change; the change was correct functionally but inconsistent with the change-control register, and it was discovered three weeks later during an unrelated audit. The third was a credential-reuse incident from a contractor's account that touched two engineering workstations over a month before anyone noticed the cross-asset access pattern. None of the three resulted in safety incidents. All three could have.
MeitY guidelines tightening. The Ministry of Electronics and Information Technology's guidance for critical-sector and process-industry cybersecurity was on its second public consultation, and the direction of travel was clear: sovereign control of telemetry, demonstrable detection capability at the OT layer, and incident-response evidence packs aligned with the national CERT-In timelines. The customer's CISO had a roughly nine-month window to be visibly compliant ahead of an expected audit.
Insurance premium rising. The cyber-insurance renewal was up 27% year-on-year, partly because of a general market hardening and partly because the underwriter's questionnaire had specific questions about OT-side detection and incident response that the customer could not answer honestly. The underwriter offered a discount tier for "demonstrable OT/IT integrated detection" but the customer did not qualify for it.

The customer's leadership had spent the previous twelve months evaluating two adjacent options. The first was a specialist OT-monitoring product that would run on-premise in each plant. The second was extending the existing cloud SIEM to ingest OT telemetry through a mediated proxy. The OT product was rejected because it solved one half of the problem (OT visibility) but created a new operational silo with no link to IT context. The cloud-SIEM extension was rejected because the OT engineering team would not approve any path that sent OT-side data to a cloud destination, regardless of the proxy architecture. The customer concluded — correctly — that the only viable answer was an on-premise, fully air-gapped SOC that spanned both IT and OT in a single graph.

Why Netgraph

Five reasons were decisive.

Fully air-gap-ready, end to end. Netgraph's deployment model permits a complete air-gap. No part of the platform — including the model layer used for the RCA agent and other analytical workflows — requires internet access at runtime. Threat-intel feeds, content updates, model weight updates and software updates are all distributed through an internal mirror service that the customer operates. This is the test most "air-gap" claims fail. The OT-engineering team's lead spent two days in the technical workshop trying to identify any outbound call that would be made by the platform in steady state. He found none.
Internal mirror service for threat intel and content. Netgraph ships an internal mirror that the customer hosts inside their own perimeter. The mirror pulls signed content updates from the Netgraph publisher in scheduled, manually-triggered windows; the air-gapped clusters consume only from the mirror. The customer controls the cadence, retains the artefacts, and can audit the chain of custody. This is the operational answer to "how do you update an air-gapped platform without compromising the air-gap".
In-cluster LLM for agent reasoning. The agentic capabilities of Netgraph — RCA agent, detection-drafting agent, evidence-pack composition — run on an in-cluster model. The model weights are part of the air-gapped distribution. No prompts, no completions, no telemetry leave the cluster. The customer's appetite for "agentic" workflows was initially zero on the grounds that LLMs require egress; the in-cluster model changed that conversation. By the end of the pilot the same engineering team was authoring custom agent prompts.
OT protocol parsing as first-class. Netgraph ships parsers for the major industrial protocols — Modbus, OPC-UA, DNP3, IEC 61850, S7, EtherNet/IP — and represents OT-specific entities (PLCs, function blocks, tag namespaces, batch processes) as typed nodes in the graph. This is the difference between "we can ingest OT logs" and "we can reason about OT". Detection patterns can be expressed in OT terms — a function-block write outside a maintenance window, a configuration push from a non-engineering workstation, a cross-zone connection that violates the Purdue model — and not as low-level packet predicates.
Single graph spanning IT and OT. The most important property, operationally. An engineering-workstation node has edges to both its IT identity (the user signed in) and its OT scope (the PLCs it can program). A contractor identity node has edges to its IT-side accounts and to the OT-side privileges those accounts confer. The cross-domain detection patterns — "contractor account signs in on IT, then within fifteen minutes the linked engineering workstation pushes a configuration change to a PLC" — are expressible as single graph queries. No bridge product. No correlation across two consoles. One traversal.
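To make the cross-domain pattern above concrete, here is a minimal Python sketch of the "contractor signs in, then the linked workstation pushes a PLC change within fifteen minutes" rule. It is an in-memory join standing in for a graph traversal, not Netgraph's query language or schema, neither of which is shown in this case study. The `Event` shape, the event kinds and the asset names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event shape; the real Netgraph schema is not described in the source.
@dataclass(frozen=True)
class Event:
    kind: str      # "it_signin" or "plc_config_push"
    actor: str     # account name for sign-ins, workstation name for pushes
    target: str    # workstation signed in to, or PLC pushed to
    at: datetime

def cross_domain_matches(events, contractor_accounts, window=timedelta(minutes=15)):
    """Return (sign-in, push) pairs where a contractor signs in to a workstation
    and that same workstation pushes a PLC configuration within `window`."""
    signins = [e for e in events
               if e.kind == "it_signin" and e.actor in contractor_accounts]
    pushes = [e for e in events if e.kind == "plc_config_push"]
    matches = []
    for s in signins:
        for p in pushes:
            # The sign-in target and the push actor are the same workstation.
            if p.actor == s.target and timedelta(0) <= p.at - s.at <= window:
                matches.append((s, p))
    return matches

# Illustrative events only; not data from the deployment.
t0 = datetime(2024, 1, 10, 9, 0)
events = [
    Event("it_signin", "contractor-a", "ews-01", t0),
    Event("plc_config_push", "ews-01", "plc-17", t0 + timedelta(minutes=9)),
    Event("plc_config_push", "ews-02", "plc-03", t0 + timedelta(minutes=5)),
    Event("plc_config_push", "ews-01", "plc-17", t0 + timedelta(minutes=40)),
]
matches = cross_domain_matches(events, {"contractor-a"})
```

Only the push nine minutes after the sign-in matches. The push from the other workstation and the push forty minutes later fall outside the pattern. In a graph model the same join would follow the identity-to-workstation and workstation-to-PLC edges instead of comparing names.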

The deciding moment in the evaluation was a tabletop in week six. The customer's CISO described one of the three historical incidents (the contractor credential-reuse case) and asked whether the Netgraph schema could have represented and detected it. The implementation team drafted the corresponding graph pattern live in the workshop, in roughly forty minutes, ingested two weeks of replayed log data into a sandbox, and showed the pattern firing on the historical incident — twelve days before the customer had originally noticed it. That demonstration ended the vendor selection.

Implementation

The fourteen-week rollout was deliberately conservative. Air-gapped industrial deployments do not benefit from speed; they benefit from auditability. Each phase produced an artefact for the OT engineering review board to sign off before the next phase began. The phased timeline is below.

Phase	Weeks	Scope	Exit criteria
1 — Internal mirror and reference cluster	0 to 3	Internal mirror service stood up inside the customer's corporate perimeter, configured to pull signed content from the Netgraph publisher on a weekly window with manual approval. Reference cluster built in the customer's primary data centre, fully isolated, fed only by the mirror. Bootstrap content set loaded.	Mirror operational; reference cluster green; ten consecutive content-update cycles completed without external dependency.
2 — Plant 1 rollout (Gujarat, chemicals)	3 to 7	Air-gapped cluster deployed at the first chemicals plant. IT telemetry onboarded. OT telemetry onboarded via the protocol parsers, with collectors deployed at the OT-DMZ boundary and never inside the safety-critical zones. Plant-specific OT detection content authored, including PLC programming-anomaly patterns and Purdue-model zone-violation patterns. Parallel run with the legacy cloud SIEM for IT-side data continues; OT-side has no prior baseline.	Cluster operational. OT-side detections firing on synthetic incidents. Plant engineering review board signs off.
3 — Plant 2 rollout (Gujarat, chemicals)	5 to 9	Second chemicals plant onboarded using the template from plant 1. Asset onboarding accelerates significantly: most OT device types are now in the connector library. The detection-content baseline established at plant 1 is reused, with plant-specific overrides.	Cluster operational. Detection-content delta from plant 1 documented and reviewed by the OT engineering board.
4 — Plant 3 rollout (Tamil Nadu, auto-parts)	7 to 11	First auto-parts plant onboarded. Discrete-manufacturing telemetry (different protocol mix, more frequent configuration changes, faster cadence) requires new detection content. Plant-team training delivered in parallel.	Cluster operational. Auto-parts-specific detection pack signed off by the OT engineering board.
5 — Plant 4 rollout (Maharashtra, auto-parts)	10 to 13	Final auto-parts plant onboarded using the auto-parts template. Cross-plant detection patterns enabled where the customer's network topology and policy permit (e.g. detecting the same compromised contractor identity moving across plants).	All four clusters operational. Cross-plant patterns validated.
6 — Adversary-emulation, MeitY audit prep, operational handover	12 to 14	Adversary-emulation library exercised against all four plants: ICS-focused TTP catalogue runs, contractor-credential-reuse replay, configuration-push-out-of-window replay, USB-borne-malware-on-engineering-workstation replay. RCA agent produces draft detections from the closed runs. MeitY guideline alignment evidence pack compiled. Second-party audit conducted.	Second-party audit confirms MeitY alignment. RCA-drafted detections reviewed and validated. Operational runbook signed off.

Three operational notes for any team planning a similar deployment.

The internal mirror cadence is a policy decision, not a technical one. The customer chose a weekly content-update window with manual approval. That is conservative; some industrial customers run a daily mirror cycle. The trade-off is between content freshness and the auditability of the approval process. Either model works on the platform; the customer's risk appetite determines which.
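Whatever the cadence, the approval step rests on being able to verify a content bundle before the air-gapped clusters consume it. The sketch below shows one way such a check could look, assuming a manifest that lists a SHA-256 digest per artefact. The case study does not describe Netgraph's signing scheme. The shared-key HMAC here is only a stand-in that keeps the example within the standard library, and a production chain of custody would more plausibly use asymmetric signatures. All names (`sign_manifest`, `verify_bundle`, the artefact file names) are hypothetical.

```python
import hashlib
import hmac
import json

def sign_manifest(manifest: dict, key: bytes) -> str:
    """Publisher side (illustrative): MAC over a canonical JSON encoding."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_bundle(manifest: dict, signature: str, artefacts: dict, key: bytes) -> bool:
    """Mirror side: accept the bundle only if the manifest signature checks out
    and every artefact matches the SHA-256 digest listed in the manifest."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False
    # Reject bundles with missing or unexpected artefacts.
    if set(manifest["artefacts"]) != set(artefacts):
        return False
    return all(
        hashlib.sha256(data).hexdigest() == manifest["artefacts"][name]
        for name, data in artefacts.items()
    )

# Illustrative bundle; key and contents are placeholders.
key = b"demo-key-not-for-production"
artefacts = {"detections.pack": b"rule set v1", "intel.feed": b"indicators v1"}
manifest = {
    "cycle": 1,
    "artefacts": {n: hashlib.sha256(d).hexdigest() for n, d in artefacts.items()},
}
signature = sign_manifest(manifest, key)

ok = verify_bundle(manifest, signature, artefacts, key)
tampered = {**artefacts, "intel.feed": b"indicators v1 (modified)"}
rejected = not verify_bundle(manifest, signature, tampered, key)
```

Retaining the manifest and signature for each approved cycle alongside the artefacts is what would let an auditor replay the check later.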

The OT-DMZ boundary is the only place collectors go. No Netgraph component runs inside a Purdue Level 0, 1 or 2 zone. Telemetry from the deep OT layers is mirrored to a passive collector at the OT-DMZ, where the parsing and graph projection happens. This is the only architecture the plant engineering boards will approve, and it is the only architecture we recommend.
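The placement rule above, and the Purdue-model zone-violation patterns mentioned earlier, can both be expressed as simple checks over zone levels. The following is a minimal sketch assuming the common convention of numbering the OT-DMZ as Level 3.5. The function names, asset names and the single violation rule are illustrative simplifications, not the detection content authored in this deployment.

```python
# Purdue levels as numbers so the OT-DMZ can sit at the conventional 3.5.
OT_DMZ = 3.5

def misplaced_components(components: dict) -> list:
    """Names of platform components placed in Purdue Level 0, 1 or 2."""
    return sorted(name for name, level in components.items() if level <= 2)

def crosses_without_dmz(src_level: float, dst_level: float) -> bool:
    """True when a flow links the enterprise side (Level 4 and above) directly
    to the OT side (Level 3 and below) instead of terminating in the OT-DMZ."""
    low, high = sorted((src_level, dst_level))
    return low <= 3 and high >= 4

# Illustrative inventory and flows; names are hypothetical.
components = {"collector-a": OT_DMZ, "parser-a": OT_DMZ, "probe-x": 2}
flows = [
    ("ews-01", 2, "historian-dmz", OT_DMZ),   # OT to DMZ: allowed
    ("laptop-7", 4, "plc-17", 1),             # enterprise straight to a PLC
]

misplaced = misplaced_components(components)
violations = [(s, d) for s, sl, d, dl in flows if crosses_without_dmz(sl, dl)]
```

Here `probe-x` is flagged as a component that should not exist at Level 2, and the laptop-to-PLC flow is flagged as a zone violation.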

The first plant is slow; the rest are fast. Plant 1 took four weeks of focused work. Plants 2 through 4 each took roughly half that, because the connector library, the detection-content baseline, the runbook patterns and the audit artefacts were already in place. Anyone planning a multi-plant rollout should budget the first plant generously and expect the rest to compress. Quoting a fixed per-plant timeline at the start of the engagement is a mistake we have seen others make.

Outcomes

The numbers below cover the four-week baseline (which, for OT-side metrics, is essentially "no baseline existed") and a six-week steady-state window after the final plant cutover.

Metric	Before	After	Change
Air-gapped clusters in production	0	4 (one per plant, no internet egress)	New capability
OT-side detection coverage	None	11 detection rules drafted by the RCA agent, validated against the adversary-emulation library	New capability
MTTD on OT-side incidents	"Eventually" (weeks, in three historical cases)	4 minutes (median, adversary-emulation replay)	Weeks to minutes
MTTD on IT-side incidents	22 minutes	52 seconds	−96%
MTTR on cross-domain (IT→OT) patterns	n/a (not previously detected)	26 minutes (median)	New capability
Internet egress from OT layer	De-facto compliant, undocumented	Documented zero-egress; verified in audit	Compliance posture
Cyber-insurance premium (annual)	Index 100 (+27% YoY)	Index 82	−18% (from increased base)
MeitY guideline alignment	Self-assessed gaps in 9 of 14 control areas	Second-party audit clear on all 14	Audit cleared
CERT-In 6-hour readiness for OT-side incidents	Not operational	Workflow live, dry-run twice on adversary-emulation incidents	New capability
Cross-plant analyst capability	None (each plant siloed)	One unified view across four plants for the central security team	New capability

The MTTD numbers warrant care in interpretation. The OT-side "4 minutes" is a median from adversary-emulation replays: the platform's measured time from telemetry arrival to graph-pattern match for the eleven OT detection patterns. It is not a steady-state observation of a real adversary, and we do not claim it as one. The honest comparison is that, prior to the deployment, the three historical OT-side incidents went undetected for between eleven days and more than a month. The replay-measured median detection latency is now four minutes. Whether real adversaries trigger the patterns at that latency is something only time and incidents will tell.
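For readers who want to reproduce this kind of figure on their own replays, the measurement reduces to a median over per-run latencies. The sketch below assumes each replay yields a telemetry-arrival timestamp and a pattern-match timestamp. The function name and the sample timestamps are invented for illustration and are not the customer's measurements.

```python
from datetime import datetime, timedelta
from statistics import median

def median_latency_seconds(runs):
    """Median seconds between telemetry arrival and pattern match.

    `runs` is an iterable of (arrived_at, matched_at) datetime pairs,
    one pair per replayed scenario.
    """
    return median((matched - arrived).total_seconds() for arrived, matched in runs)

# Invented sample values, purely to show the calculation.
base = datetime(2024, 1, 10, 9, 0, 0)
replays = [
    (base, base + timedelta(seconds=150)),
    (base, base + timedelta(seconds=240)),
    (base, base + timedelta(seconds=610)),
]
median_seconds = median_latency_seconds(replays)
median_minutes = median_seconds / 60
```

A median is used instead of a mean so that one slow replay does not dominate the headline number. Reporting the slowest run alongside it gives a fuller picture.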

The insurance premium reduction was confirmed by the underwriter's renewal letter, which cited "demonstrable OT/IT integrated detection with documented zero-egress at the OT layer" as the qualifying basis. The underlying programme — second-party audit clearance, MeitY alignment, operational evidence packs — is what produced the discount; the platform's job was to make that programme defensible.
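The table's "−18% (from increased base)" note means the two percentages compound and should not simply be subtracted. A short worked calculation, on the assumption that the 27% rise and the 18% reduction apply sequentially to the same premium:

```python
def net_change_pct(increase_pct: float, reduction_pct: float) -> float:
    """Net percentage change after an increase followed by a reduction
    applied to the already-increased base."""
    factor = (1 + increase_pct / 100) * (1 - reduction_pct / 100)
    return (factor - 1) * 100

# 27% year-on-year rise, then an 18% reduction from that increased base.
net = round(net_change_pct(27, 18), 2)
```

Under that assumption the premium ends up roughly 4% above the level before the 27% rise, which is why the outcome is reported against the increased base rather than the prior year.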

"For twenty years the plant network was a place we monitored for uptime, never for adversaries. That assumption was wrong, and the three incidents we now know about prove it. The change is not that we bought a new tool. The change is that the OT engineers and the SOC analysts now share a screen and a vocabulary. That happens because they share a graph."

— Chief Information Security Officer, four-plant manufacturing group

Key results

Four air-gapped Netgraph clusters in production — one per plant — with no internet egress, verified by audit.
OT-side MTTD moved from "eventually" (weeks, in three historical cases) to a measured four minutes on adversary-emulation replays. Eleven OT-specific detections drafted by the RCA agent and validated against the emulation library.
Cross-domain IT-to-OT patterns are expressible as single graph queries and carry a median MTTR of 26 minutes, a capability the customer previously did not have at any latency.
MeitY guideline alignment confirmed in second-party audit; CERT-In 6-hour readiness extended to OT-side incidents.
Cyber-insurance premium reduced 18% from a renewal base that had risen 27% year-on-year; the underwriter cited the OT/IT integrated detection programme by name.
The internal mirror service runs an air-gapped content-and-model update process that has completed ten consecutive cycles without any external dependency at runtime.

What's next

Three workstreams are queued for the coming six to nine months.

Safety-system overlay. The deployment so far instruments the engineering, maintenance and operational layers of the OT estate. The safety-instrumented systems (SIS) — the layer of last-resort process protection — remain outside the graph by design. The customer and the platform team are now planning a read-only overlay that brings SIS state into the graph for monitoring purposes only, with no write path and no participation in automated response. The goal is to make SIS health visible to the SOC without exposing the SIS layer to any operational risk. This is a sensitive piece of work and is being scoped carefully with the OT engineering board.

Supplier and contractor identity graph. Two of the three historical incidents involved contractor identities. The customer is extending the identity graph to include named contractor populations, their access windows, their approved engineering workstations and their access scopes. Detection patterns that flag access outside agreed windows or beyond agreed scope will be authored against this enriched identity model. Early signal suggests this will be one of the highest-yield detection categories on the OT side.
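As a rough illustration of what a detection against the enriched identity model might check, the sketch below compares one access event against a contractor's agreed window, approved workstations and PLC scope. The `Grant` structure, field names and finding labels are hypothetical. The case study does not describe how the customer will model these in the graph.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical representation of a contractor's agreed access.
@dataclass(frozen=True)
class Grant:
    contractor: str
    workstations: frozenset   # approved engineering workstations
    plcs: frozenset           # PLCs within the agreed scope
    start: datetime           # access window opens
    end: datetime             # access window closes

def access_findings(grant: Grant, workstation: str, plc: str, at: datetime) -> list:
    """Return the reasons, if any, why an access falls outside the grant."""
    findings = []
    if not (grant.start <= at <= grant.end):
        findings.append("outside_window")
    if workstation not in grant.workstations:
        findings.append("unapproved_workstation")
    if plc not in grant.plcs:
        findings.append("out_of_scope")
    return findings

# Illustrative grant and accesses only.
grant = Grant(
    contractor="contractor-a",
    workstations=frozenset({"ews-01"}),
    plcs=frozenset({"plc-17"}),
    start=datetime(2024, 1, 10, 8, 0),
    end=datetime(2024, 1, 10, 17, 0),
)
in_scope = access_findings(grant, "ews-01", "plc-17", datetime(2024, 1, 10, 9, 30))
flagged = access_findings(grant, "ews-02", "plc-03", datetime(2024, 1, 10, 22, 15))
```

An empty list means the access is within the agreement. Returning named reasons, not a single boolean, keeps the finding usable as evidence in a review.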

Cross-plant adversary-emulation cadence. The customer is moving to a quarterly adversary-emulation exercise on each plant, with a rotating thematic focus — supply-chain compromise, contractor abuse, ransomware on engineering workstations, configuration-push abuse. The exercises will use a shared TTP catalogue with plant-specific variations. The output of each quarter will be a measurable detection-coverage delta and a closed-loop set of refined detection patterns. The intent is to make adversary-emulation a routine, not a project.

The under-appreciated lesson from this deployment is that air-gapped does not mean "less capable, slower, harder to update". With a properly designed internal mirror, an in-cluster model and a content-signing chain of custody, an air-gapped industrial SOC can run at parity with its connected siblings on every operational metric that matters — detection latency, content freshness, agent assistance, audit readiness. The trade-off is operational discipline, not capability. For a customer whose plant networks should not, must not and now do not talk to the internet, that trade-off is exactly the right one.

Sample case study. Composite of representative deployments; customer details anonymised.
