Buying "AI SOC" in 2026 is harder than buying a SIEM was in 2016. Back then the product categories were honest — log collector, correlation engine, case manager. Today every vendor in adjacent territory has rebadged itself as an AI SOC, and the term covers at least three architectures that have almost nothing in common under the hood. We have helped roughly forty enterprise buyers run structured evaluations over the last eighteen months. This post is the scoring framework we lean on, distilled to something you can drop into a spreadsheet on a Monday morning.

We will frame the three contenders, give you seven evaluation axes that actually predict three-year outcomes, and explain why we have stopped recommending overlays for any greenfield programme. If you only remember one thing: overlays are tactical, graph-native is strategic. The rest is detail.

The three architectures, named honestly

It helps to ignore the marketing names and look at where the AI actually lives in the stack. Each architecture below has a real role; the mistake is buying one believing it is another.

1. AI overlays on the legacy stack

An overlay is software that sits beside your existing log pipeline and endpoint platform and adds language-model-driven triage, summarisation, or natural-language query. The substrate — the data store, the detection content, the index — is still the legacy SIEM. The overlay's strength is speed to value: it ships in weeks, your existing engineers keep their muscle memory, and your ingest contract does not change. Its weakness is structural. The model is reasoning over a flat event store. It cannot see the asset graph, the identity graph, or the blast radius. It does what it can with what it sees, and what it sees is a haystack.

2. Point AI-SOC products

Point products take ownership of triage, often pulling a copy of alerts from your existing tools into their own pipeline, applying their own models, and producing prioritised incidents. They are usually cloud-hosted, often opinionated about the data they accept, and almost always priced per analyst seat or per "investigation". They work well as a workforce-multiplier on a stable stack. They struggle when the underlying telemetry is noisy or when sovereignty constraints prevent log shipping. Crucially, the substrate is still external to the product — the AI is doing inference, but the ground truth lives elsewhere.

3. Graph-native unified platforms

Graph-native platforms collapse the substrate itself. Ingestion, detection, correlation, hunt, response and reporting all run against a single property-graph representation of the estate — assets, identities, processes, network flows, vulnerabilities, controls. The AI is not bolted on; it walks the graph the same way an analyst would. The trade is real: graph-native demands more architectural commitment upfront and more discipline about what you ingest. The pay-off is that retrospective hunts, blast-radius questions, and exposure-to-incident joins become single queries rather than ticket-bouncing exercises.

The litmus test. Ask the vendor: "Can I, from a single query, see every identity that touched a host with a critical exposure in the last 90 days, including hosts that were retired?" If the answer needs three tools, two exports and a Jupyter notebook, it is not graph-native. It is an overlay with a graph view.
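
As a rough illustration of what that single query has to do, here is a minimal Python sketch over in-memory records. The `Host` and `Touch` types, the field names and the sample data are all hypothetical stand-ins; a real platform would express this in its own graph query language. The point is only that retired hosts stay in the model, so the join still finds them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Host:
    name: str
    critical_exposure: bool
    retired: bool = False  # retired hosts are kept in the model, not deleted


@dataclass(frozen=True)
class Touch:
    identity: str
    host: str
    at: datetime


def identities_touching_exposed_hosts(hosts, touches, now, days=90):
    """Return identities that touched any critically exposed host in the window."""
    cutoff = now - timedelta(days=days)
    # Deliberately no filter on `retired`: decommissioned hosts still count.
    exposed = {h.name for h in hosts if h.critical_exposure}
    return sorted(
        {t.identity for t in touches if t.host in exposed and cutoff <= t.at <= now}
    )


# Hypothetical sample estate.
now = datetime(2026, 1, 31)
hosts = [
    Host("db-01", critical_exposure=True),
    Host("web-07", critical_exposure=False),
    Host("legacy-02", critical_exposure=True, retired=True),
]
touches = [
    Touch("alice", "db-01", now - timedelta(days=10)),
    Touch("bob", "web-07", now - timedelta(days=5)),
    Touch("carol", "legacy-02", now - timedelta(days=80)),
    Touch("dave", "db-01", now - timedelta(days=120)),  # outside the window
]
result = identities_touching_exposed_hosts(hosts, touches, now)
print(result)
```

In this sample, carol is returned only because the retired host is still present in the data model, which is the property the litmus question probes.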

Seven axes that actually predict three-year outcomes

Most RFPs evaluate features. Features change every quarter; architecture does not. We score vendors on seven axes — each scored 1 to 5 — and weight them by the buyer's specific constraints. Sovereignty-bound buyers (BFSI, defence, healthcare in regulated geographies) weight the last three heavily. Cloud-native digital businesses weight the first three.

Axis 1: Substrate depth

How much of the security data model does the vendor own? A vendor that owns ingestion, normalisation, the data lake, the graph and the detection runtime has structural advantages a re-seller of those layers does not. We score 5 when the vendor controls the full path from raw event to graph edge; 3 when they own the analytics layer but rely on a third-party lake; 1 when they are a thin UI over your existing SIEM.

Axis 2: Data ownership and locality

Where does your telemetry physically live, and who can subpoena the operator? This is no longer an India-only question — every jurisdiction with a serious data-protection law now asks it. Score 5 for a deployment topology that allows the entire control plane and data plane to run inside your tenancy, on hardware you can audit. Score 1 for "multi-tenant SaaS only, region X or Y". Anything in between needs careful reading of the contract.

Axis 3: Agent governance

AI agents that act on your behalf are now in production at enough buyers that the auditors have caught up. The questions to ask: are the agent's actions logged with the same immutability as analyst actions? Are there scoped blast-radius gates? Can the agent be rolled back? Is there a human-approval ladder for destructive actions? Score 5 for a model where every agent decision lands in an append-only audit log with replay; 1 for a chat box that issues API calls without a paper trail.
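
To make the "append-only audit log with replay" and "human-approval ladder" ideas concrete, here is a small Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: the hash-chained log, the `DESTRUCTIVE_ACTIONS` set, the action names and the `agent_act` helper are all hypothetical, and an in-memory list is only append-only by convention.

```python
import hashlib
import json

GENESIS = "0" * 64
DESTRUCTIVE_ACTIONS = {"isolate_host", "disable_account"}  # hypothetical action names


def _digest(body):
    # Stable serialisation so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Hash-chained log: each entry commits to the hash of the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, actor, action, target, approved_by=None):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"actor": actor, "action": action, "target": target,
                "approved_by": approved_by, "prev": prev}
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self):
        # Recompute every hash; any edited or reordered entry breaks the chain.
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

    def replay(self):
        # Ordered view of what the agent did, for audit or rollback planning.
        return [(e["actor"], e["action"], e["target"], e["approved_by"])
                for e in self._entries]


def agent_act(log, action, target, approved_by=None):
    """Log every decision; refuse destructive actions without a named approver."""
    if action in DESTRUCTIVE_ACTIONS and approved_by is None:
        log.append("agent", "denied:" + action, target)
        return False
    log.append("agent", action, target, approved_by)
    return True


log = AuditLog()
agent_act(log, "enrich_alert", "INC-0001")
agent_act(log, "isolate_host", "host-17")  # denied: no approver given
agent_act(log, "isolate_host", "host-17", approved_by="analyst-on-call")
print(log.verify(), log.replay())
```

Note that the denial is logged too. In a PoV it is worth asking the vendor whether refused and rolled-back agent actions leave the same trail as executed ones.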

Axis 4: Retrospective capability

Detections are written for what we know now. Three-quarters of the value of a real platform shows up six months later, when a new threat-intel report drops and you need to ask "did this happen to us, ever, anywhere?" Overlays usually answer this with a 24-hour hot tier and a 90-day cold tier you cannot search interactively. Graph-native platforms answer it with a single query against the full retained corpus. Score 5 when interactive retro-hunts over a full year of telemetry return in seconds; 1 when you need to re-hydrate cold storage to a separate cluster.

Axis 5: Total stack count

Count the distinct products the buyer must license to reach feature parity. An overlay sits on top of (typically) a SIEM, an EDR, a SOAR, a UEBA, a TIP and a case manager. A graph-native platform tends to consolidate four to six of those into one substrate. Stack count is not just a cost story — it is the single best predictor of mean-time-to-investigate, because every additional product adds a swivel-chair tax. Score by inverse count: 5 for one or two products, 1 for six or more.
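
The inverse-count rule can be written down as a small Python function for the spreadsheet-averse. The two endpoints come from the text above; the linear steps for three, four and five products are our assumption for this sketch, and the function name is illustrative.

```python
def stack_count_score(product_count: int) -> int:
    """Map the number of licensed products to a 1-5 score (higher is better)."""
    if product_count < 1:
        raise ValueError("product_count must be at least 1")
    if product_count <= 2:
        return 5  # one or two products
    if product_count >= 6:
        return 1  # six or more products
    # Assumed linear interpolation in between: 3 -> 4, 4 -> 3, 5 -> 2.
    return 7 - product_count


for n in range(1, 8):
    print(n, stack_count_score(n))
```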

Axis 6: Effective cost per GB

"Cost per GB" is a slippery number because vendors hide it behind seats, EPS, "analyst hours saved" and bundled bands. Reduce every quote to a single normalised figure: (annual list price + estimated egress + ops headcount delta) / annual ingest in GB. Include both hot and cold tiers. We routinely see overlays whose effective cost lands 4-7x above graph-native platforms once the underlying SIEM bill is counted in the numerator. Score by quartile against the buyer's anchor benchmark.

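The normalisation for Axis 6 is simple enough to sketch in Python. The function follows the formula as stated above; every figure in the example is made up purely for illustration (currency units are whatever the quote uses), and the parameter names are ours.

```python
def effective_cost_per_gb(annual_list_price, estimated_egress,
                          ops_headcount_delta, annual_ingest_gb):
    """(annual list price + estimated egress + ops headcount delta) / annual ingest in GB."""
    if annual_ingest_gb <= 0:
        raise ValueError("annual_ingest_gb must be positive")
    total_cost = annual_list_price + estimated_egress + ops_headcount_delta
    return total_cost / annual_ingest_gb


# Hypothetical overlay quote, first taken at face value...
overlay_alone = effective_cost_per_gb(
    annual_list_price=200_000,
    estimated_egress=50_000,
    ops_headcount_delta=150_000,
    annual_ingest_gb=200_000,
)

# ...then with the (hypothetical) bill for the SIEM it sits on added to the cost side.
overlay_with_siem = effective_cost_per_gb(
    annual_list_price=200_000 + 1_000_000,
    estimated_egress=50_000,
    ops_headcount_delta=150_000,
    annual_ingest_gb=200_000,
)

print(overlay_alone, overlay_with_siem)
```

With these invented inputs the figure moves from 2.0 to 7.0 per GB once the SIEM bill is included, which is why the quote has to be normalised before it is scored.
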
Axis 7: Sovereignty fit

Independent of where data lives, sovereignty also means: can you operate the platform if the vendor's cloud control plane is unreachable? Can you patch and update from a local mirror? Can you run the platform on hardware procured under your own tendering rules? Score 5 for a deployment that runs fully on-prem with offline corpus updates and no phone-home; 1 for a SaaS-only model with mandatory egress.

The scoring sheet

Below is the consolidated scoring table we share with CISOs, with the typical scores we observe per architecture. These are population averages across the buyers we have helped — your individual vendors will vary.

Axis                         | AI overlay | Point AI-SOC | Graph-native
Substrate depth              | 1-2        | 2-3          | 4-5
Data ownership / locality    | 2-3        | 1-2          | 4-5
Agent governance             | 2          | 2-3          | 4
Retrospective capability     | 1-2        | 2            | 5
Total stack count (inverse)  | 1          | 2            | 4-5
Effective cost per GB        | 1-2        | 2-3          | 4
Sovereignty fit              | 2          | 1-2          | 5

The numbers tell you the obvious story; the framework is in how you weight them. A pre-IPO SaaS company with no regulatory anchor will rationally weight Axes 1, 3 and 5 heavily, and may live happily with an overlay for two more years. A regulated bank with a sovereignty mandate will reach a different conclusion before lunch.
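
To show how the weighting changes the picture, here is a minimal Python sketch that turns the table into a single weighted score per architecture. Two assumptions are ours: each score range is collapsed to its midpoint (so 1-2 becomes 1.5), and the weights are the regulated-buyer weighting described under "How we would buy in 2026" (0.2 for sovereignty fit, retrospective capability and substrate depth, 0.1 for the rest). Swap in your own weights and vendor scores.

```python
WEIGHTS = {
    "substrate_depth": 0.2,
    "data_ownership": 0.1,
    "agent_governance": 0.1,
    "retrospective": 0.2,
    "stack_count": 0.1,
    "cost_per_gb": 0.1,
    "sovereignty_fit": 0.2,
}

# Typical scores from the table, with ranges replaced by midpoints (our assumption).
TYPICAL = {
    "AI overlay": {"substrate_depth": 1.5, "data_ownership": 2.5, "agent_governance": 2,
                   "retrospective": 1.5, "stack_count": 1, "cost_per_gb": 1.5,
                   "sovereignty_fit": 2},
    "Point AI-SOC": {"substrate_depth": 2.5, "data_ownership": 1.5, "agent_governance": 2.5,
                     "retrospective": 2, "stack_count": 2, "cost_per_gb": 2.5,
                     "sovereignty_fit": 1.5},
    "Graph-native": {"substrate_depth": 4.5, "data_ownership": 4.5, "agent_governance": 4,
                     "retrospective": 5, "stack_count": 4.5, "cost_per_gb": 4,
                     "sovereignty_fit": 5},
}


def weighted_score(scores, weights):
    """Weighted sum of 1-5 axis scores; weights must cover every axis and sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[axis] * scores[axis] for axis in weights)


totals = {arch: round(weighted_score(s, WEIGHTS), 2) for arch, s in TYPICAL.items()}
print(totals)
```

Under this weighting the midpoints work out to roughly 1.7 for the overlay, 2.05 for the point product and 4.6 for graph-native. A buyer with different constraints would change `WEIGHTS`, not the scores.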

Where the framework changes your shortlist

Three patterns show up across every evaluation we have run:

  • Overlays score well in years one and two, then collapse in year three. The model the buyer is using to value the overlay (incremental analyst productivity) gets dwarfed by the rising bill of the underlying SIEM and the unchanged stack count. Buyers who do not re-evaluate at year three end up paying twice for the same workflow.
  • Point AI-SOC products are excellent compression layers, terrible substrates. They concentrate the workflow but do not change what is below. If your SOC is already efficient and your underlying telemetry is clean, the gain is real. If your underlying telemetry is noisy, you have an expensive way to triage noise.
  • Graph-native platforms ask more of the buyer. Onboarding requires modelling discipline: which entities are first-class, what relationships matter, how identity is reconciled. The platforms we work with answer that with opinionated defaults so the modelling is not greenfield, but a buyer who skips the modelling conversation in week one will be unhappy in month three.

"We replaced two SIEMs, one UEBA and a TIP with one graph-native platform. The first quarter was harder than I expected. The second quarter we closed the SOC analyst headcount gap we had been carrying for two years." (Head of SOC, mid-cap private bank, January 2026)

How we would buy in 2026

If we were buying for a regulated Indian enterprise in 2026, we would build the shortlist around graph-native platforms first and use the overlays as fallbacks if the modelling work could not be sequenced. We would weight sovereignty fit, retrospective capability and substrate depth at 0.2 each, and the remaining four axes at 0.1 each, so the weights sum to 1.0. We would insist on a four-week PoV against the buyer's own telemetry, not a curated dataset, and we would judge the PoV on three concrete questions:

# 1. Blast-radius query
"Show me every identity, host, and downstream system reachable from
 ticket INC-4421, with the controls that should have stopped the
 lateral movement, ordered by data-classification."

# 2. Retrospective hunt
"Find every authentication from a residential ASN to a production
 service over the last 365 days, grouped by service criticality."

# 3. Closed-loop validation
"Show me the detection rule that fired on INC-4421, the unit-test
 that covers the underlying TTP, and the last time we exercised
 that test in a purple-team run."

If the vendor can answer those three in a single tool, you are looking at a graph-native platform. If they need three tools and an analyst, you are looking at an overlay.

Key takeaways

  • "AI SOC" is now a marketing umbrella over three architectures with very different three-year economics.
  • Score on substrate depth, data ownership, agent governance, retrospective capability, stack count, cost per GB, and sovereignty fit — features change every quarter, architecture does not.
  • Overlays buy you 12-18 months of triage relief at the cost of carrying the same stack. Use them as bridges, not destinations.
  • Point AI products compress workflow on top of clean telemetry. They do not fix dirty telemetry.
  • Graph-native platforms ask for modelling discipline in week one and pay it back in months three through thirty-six.
  • Insist on a PoV against your own data with at least one blast-radius, one retrospective and one closed-loop question.

For a deeper look at the substrate question, see our whitepaper on graph-native correlation, and the field write-up from a private bank that retired multiple correlation tools after a graph-native migration.