MTTD vs correlation debt: the metric your SIEM doesn't tell you about

Your mean time to detect looks healthy because alerts are firing. Correlation debt is the percentage of those alerts that needed cross-source context the platform couldn't supply — and it is the better predictor of breach cost.

Pull any SOC scorecard from the last five years and MTTD is on it. Mean time to detect: the headline metric, the one that goes up in the board pack, the one your platform vendor proudly puts a sub-minute number against. Most enterprises sit somewhere between 40 seconds and 3 minutes. Reasonable. Defensible. Looks great in a quarterly review.

And yet breach cost keeps climbing. Industry reports put the average breach cost in the 4-5 million USD band, and median dwell time for successful intrusions is still measured in weeks. If detection is so fast, why is the long tail still so long?

The answer is that MTTD measures the wrong thing: the time from event to alert. It tells you the platform was alive. What it doesn't tell you is whether that alert was useful, that is, whether an analyst could action it without context from another tool, another team, another tab.

The number you actually want is correlation debt. It is the percentage of alerts in a given window for which the platform could not, by itself, supply the cross-source context the analyst needed to triage. Said differently: how often did your SOC have to do the platform's job because the platform couldn't?

What correlation debt actually measures

Pick a representative week. Pull every triaged alert. For each one, look at the investigation trail. If the analyst's notes contain references to data that wasn't in the alert payload — "checked EDR for the parent process", "pulled identity log to confirm the SSO source", "asked network team for the egress IP" — that alert carried correlation debt. The platform raised an event but did not assemble the context.

The ratio that matters is:

Correlation debt = (alerts requiring out-of-platform context lookups) / (total triaged alerts), measured weekly, segmented by alert source.

A healthy SOC sits in the 5-15% range. A SOC carrying serious debt sits at 60-85%, which is roughly what we see in most pre-migration assessments. At the upper end, almost every meaningful alert is a stub that the analyst has to flesh out by hand. That work doesn't show up in MTTD because MTTD stops the clock when the alert appears.
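The ratio is simple enough to compute from a week of closed cases. What follows is a minimal sketch, not a reference implementation: the `TriagedAlert` record, its field names, and the sample data are all hypothetical, and it assumes each closed case already carries a yes/no flag for whether the analyst had to look outside the platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TriagedAlert:
    alert_id: str
    source: str                   # hypothetical labels, e.g. "endpoint", "identity"
    needed_outside_context: bool  # True if the analyst looked outside the platform


def correlation_debt(alerts):
    """Return (overall_ratio, {source: ratio}) for one window of triaged alerts."""
    totals = defaultdict(int)
    indebted = defaultdict(int)
    for alert in alerts:
        totals[alert.source] += 1
        if alert.needed_outside_context:
            indebted[alert.source] += 1
    by_source = {source: indebted[source] / totals[source] for source in totals}
    total = sum(totals.values())
    overall = sum(indebted.values()) / total if total else 0.0
    return overall, by_source


# Made-up week of triaged alerts, purely for illustration.
week = [
    TriagedAlert("a1", "endpoint", False),
    TriagedAlert("a2", "endpoint", False),
    TriagedAlert("a3", "identity", True),
    TriagedAlert("a4", "identity", True),
    TriagedAlert("a5", "network", True),
    TriagedAlert("a6", "network", False),
]

overall, by_source = correlation_debt(week)
print(f"overall: {overall:.0%}")
for source, ratio in sorted(by_source.items()):
    print(f"{source}: {ratio:.0%}")
```

Keeping the per-source breakdown next to the overall figure matters because an aggregate can look tolerable while one alert family carries nearly all of the debt.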

The dual-metric rule: publish MTTD and correlation debt on the same dashboard. If MTTD is sub-minute but correlation debt is above 50%, your fast alerts are arriving naked. Your real time-to-action is hidden in analyst notes.
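As a dashboard check, the rule reduces to one condition. This sketch takes "sub-minute" to mean under 60 seconds and uses the 50% threshold from the rule above; the function name is our own invention.

```python
def alerts_arrive_naked(mttd_seconds, correlation_debt):
    """True when detection is fast but most alerts still need outside context."""
    return mttd_seconds < 60 and correlation_debt > 0.50


# Illustrative inputs only: a fast SOC with heavy debt, and a fast SOC with little.
print(alerts_arrive_naked(45, 0.72))
print(alerts_arrive_naked(45, 0.10))
```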

Why MTTD alone is a vanity metric

MTTD has three structural problems that nobody talks about, because the number is convenient.

It rewards firing, not understanding

A correlation rule that says "alert when failed_logins > 10 in 60 seconds" will fire fast. It will also fire on benign automation, on misconfigured monitors, on the build pipeline. MTTD says: brilliant, sub-minute detection. Reality says: the analyst will close it as noise in another six minutes, having spent those six minutes pivoting through four tools to prove it's noise. The platform got credit for being fast. The analyst paid the cost.

It hides the analyst's enrichment tax

Every time an analyst opens a second tab, they are paying down correlation debt that the platform booked. Multiply that across 200 alerts a day and you have, depending on the SOC, between 4 and 11 person-hours per shift of work that looks like triage but is really data-assembly. MTTD never sees that time. It is invisible to the metric by design.

It cannot distinguish detection quality from detection volume

Two SOCs can both report 90-second MTTD. SOC A fires 1,200 alerts a day, of which 80 are true positives, and analysts close the rest with a copy-pasted "no impact" note. SOC B fires 95 alerts a day, of which 60 are true positives, each delivered with full path context. SOC B is dramatically more effective. MTTD does not show this. Correlation debt does — and the precision metric does too, but precision without a context measure misses the half that drives analyst fatigue.
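To put numbers on the comparison, here is the precision arithmetic for the two SOCs, using only the figures quoted above. Precision here means true positives divided by alerts fired; the helper name is ours.

```python
def precision(true_positives, total_alerts):
    """Share of fired alerts that turned out to be true positives."""
    return true_positives / total_alerts


soc_a = precision(80, 1200)   # SOC A: 1,200 alerts a day, 80 true positives
soc_b = precision(60, 95)     # SOC B: 95 alerts a day, 60 true positives

print(f"SOC A precision: {soc_a:.1%}")
print(f"SOC B precision: {soc_b:.1%}")
print(f"SOC A alerts closed as non-incidents per day: {1200 - 80}")
```

That works out to roughly 6.7% for SOC A against roughly 63.2% for SOC B, behind the same 90-second MTTD.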

Measuring it without retrofitting your stack

You do not need a graph platform to measure correlation debt. You need a discipline. Two weeks of effort buys you the baseline.

Instrument the case notes. Add a mandatory closure field with three checkboxes: "I needed data from another platform", "I needed data from another team", "the alert was actionable as-shipped". Treat the third as the inverse of debt.
Sample, don't audit everything. Pull 50 random alerts per week per analyst tier. You'll get a stable signal inside a month.
Segment by alert source. The endpoint detection family will probably look fine. The identity-and-network combinations will look terrible. That gap is where the migration ROI lives.
Re-baseline quarterly. Correlation debt drifts. New detections, new integrations, new attackers. A 12% debt ratio in March will not be 12% in June without active maintenance.
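The steps above can be sketched in a few lines. This is one possible shape under stated assumptions, not a prescribed schema: the `Closure` record and its field names are hypothetical, the tier labels are placeholders, and the demo data is synthetic. It follows the text in treating "actionable as-shipped" as the inverse of debt.

```python
import random
from dataclasses import dataclass


@dataclass
class Closure:
    alert_id: str
    analyst_tier: str             # placeholder labels, e.g. "L1", "L2"
    needed_other_platform: bool   # checkbox 1
    needed_other_team: bool       # checkbox 2
    actionable_as_shipped: bool   # checkbox 3

    def carries_debt(self):
        # Per the text: the third checkbox is the inverse of debt.
        return not self.actionable_as_shipped


def weekly_sample(closures, per_tier=50, seed=None):
    """Draw up to `per_tier` random closures for each analyst tier."""
    rng = random.Random(seed)
    by_tier = {}
    for closure in closures:
        by_tier.setdefault(closure.analyst_tier, []).append(closure)
    return {
        tier: rng.sample(items, min(per_tier, len(items)))
        for tier, items in by_tier.items()
    }


def debt_ratio(sample):
    """Share of sampled closures that carried correlation debt."""
    return sum(c.carries_debt() for c in sample) / len(sample) if sample else 0.0


# Synthetic closures for illustration: every third alert needed another platform.
closures = [
    Closure(
        alert_id=f"alert-{i}",
        analyst_tier="L1" if i % 2 else "L2",
        needed_other_platform=(i % 3 == 0),
        needed_other_team=False,
        actionable_as_shipped=(i % 3 != 0),
    )
    for i in range(300)
]

for tier, sample in sorted(weekly_sample(closures, seed=7).items()):
    print(tier, len(sample), f"{debt_ratio(sample):.0%}")
```

Passing a fixed `seed` makes a given week's sample reproducible, which helps when someone disputes the number later.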

The instrumentation does not require new tooling. Most SOAR platforms can hold those checkboxes. Most ticketing systems can be coerced into it. The hard part is cultural: getting the analyst to admit when an alert was unhelpful. Some lead analysts will read that as a critique of their work. It is not. It is a critique of the platform's work.

Why correlation debt predicts breach cost

Three reasons, in increasing order of unpleasantness.

One. Debt-heavy SOCs miss multi-stage attacks. If every alert is single-source, the kill chain doesn't stitch. An identity event in tool A, a lateral-movement event in tool B, and a data-staging event in tool C never become one incident — they become three closed tickets, often by three different analysts on three different shifts. The attacker walks between them.

Two. Debt-heavy SOCs fatigue out their seniors. The L2/L3 tier becomes a human correlation engine. When they leave — and they do, because the work is exhausting — the institutional knowledge of which tool to pivot to in which order leaves with them. The L1 escalation rate spikes. The MTTD doesn't move.

Three. Debt-heavy SOCs cannot do retrospective detection. If a new threat-intel hit lands today and you want to know whether your fleet was touched 60 days ago, the answer depends on whether the data can be re-correlated. In a debt-heavy platform, the answer is "we'd need a six-week project". By the time the project finishes, the regulator has already called. We've written separately about why retrospective detection is the quietly overlooked superpower — the connective tissue is correlation debt.

| Profile | MTTD | Correlation debt | Observed dwell of successful intrusions |
| --- | --- | --- | --- |
| Healthy graph-native SOC | 30-90s | 5-15% | Median 1-3 days |
| Mature legacy SIEM SOC | 30-120s | 40-60% | Median 8-21 days |
| Stitched-tool SOC (heavy debt) | 30-180s | 60-85% | Median 30-90+ days |

The numbers are illustrative, drawn from our own assessment data across roughly forty SOC deployments. The pattern is consistent: MTTD doesn't vary much across the three profiles. Correlation debt and dwell move together.

What to do about it

The instinctive response is to write more correlation rules in the existing SIEM. This works up to a point and then collapses. Each new rule pulls from more sources, query latency degrades, the rule engine times out, and the team ends up with a Rules Council that meets fortnightly to argue about precedence. We have watched this happen at least a dozen times. It is a stage of the SIEM life cycle, not a SOC failing.

The structural fix is to stop expressing correlation as joins-at-query-time and start expressing it as relationships-at-ingest-time. When an authentication event arrives, the identity it relates to, the device it came from, and the network path it traversed are linked as edges at the moment of ingest. Every subsequent detection traverses pre-existing relationships. Correlation stops being a separate query-time step because the joining was already done when the event was written.
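A toy version makes the difference concrete. The sketch below is an in-memory illustration of the idea only, not how any particular product implements it: the `IngestGraph` class, the relation names, and the event fields are all hypothetical, and a real substrate would need persistence, entity resolution, and retention handling.

```python
from collections import defaultdict


class IngestGraph:
    """Toy in-memory graph: edges are written when an event is ingested."""

    def __init__(self):
        self.edges = defaultdict(set)   # node -> {(relation, other_node)}

    def link(self, src, relation, dst):
        # Store both directions so any node can be a traversal starting point.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((f"inverse:{relation}", src))

    def ingest_auth_event(self, event):
        node = ("event", event["id"])
        self.link(node, "authenticated_as", ("identity", event["user"]))
        self.link(node, "came_from", ("device", event["device"]))
        self.link(node, "traversed", ("network_path", event["egress_ip"]))

    def context(self, node):
        """Everything already linked to a node; no query-time join needed."""
        return sorted(self.edges.get(node, ()))


graph = IngestGraph()
graph.ingest_auth_event(
    {"id": "evt-1", "user": "alice", "device": "laptop-42", "egress_ip": "203.0.113.7"}
)
graph.ingest_auth_event(
    {"id": "evt-2", "user": "alice", "device": "server-9", "egress_ip": "203.0.113.7"}
)

# At detection time, context is a lookup over existing edges.
print(graph.context(("identity", "alice")))
print(graph.context(("event", "evt-1")))
```

The point of the design is where the cost lands: linking is paid once per event at ingest, so a detection that needs identity, device, and network context reads edges instead of querying three tools.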

Honest caveat: moving to a substrate that links at ingest is not free. The migration takes between 8 and 16 weeks for a mid-sized enterprise. We document the discipline of it in the closed-loop detection engineering whitepaper. If you cannot commit to that, do the discipline part — instrument the debt metric — and at least know what number you are carrying.

One more practical note. When you publish correlation debt to leadership, do not show only the aggregate. Show it segmented by alert source pair: identity × endpoint, network × identity, cloud × identity, and so on. The pairs with high debt are the migration roadmap. The pairs with low debt are areas you don't need to touch. This stops the conversation from becoming "rip and replace" and turns it into "fix the seams".
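One way to produce that segmented view is sketched below, under assumptions of our own: each triaged alert is recorded as its originating source plus the list of other sources the analyst had to consult, pairs are treated as unordered, and each pair is reported as a share of all triaged alerts. The function name and sample data are hypothetical.

```python
from collections import Counter


def debt_by_source_pair(alerts):
    """alerts: list of (alert_source, [sources consulted outside the payload]).

    Returns {(source_a, source_b): share of all triaged alerts}, with each
    pair stored in alphabetical order so A x B and B x A count together.
    """
    if not alerts:
        return {}
    pair_counts = Counter()
    for alert_source, lookups in alerts:
        for other in set(lookups):
            if other != alert_source:
                pair_counts[tuple(sorted((alert_source, other)))] += 1
    return {pair: count / len(alerts) for pair, count in pair_counts.items()}


# Made-up triage records, purely for illustration.
alerts = [
    ("identity", ["endpoint"]),
    ("identity", ["endpoint", "network"]),
    ("network", ["identity"]),
    ("endpoint", []),
    ("cloud", ["identity"]),
    ("endpoint", []),
    ("identity", ["endpoint"]),
    ("endpoint", []),
]

report = debt_by_source_pair(alerts)
for pair, share in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pair[0]} x {pair[1]}: {share:.1%} of triaged alerts")
```

Sorting by share puts the worst seams at the top, which is the order leadership will want to read the roadmap in.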

A short recap, because the metric is the point

MTTD measures aliveness. Correlation debt measures usefulness. The first is necessary, the second is decisive. If you publish only the first, leadership will get the impression the SOC is healthier than it is. If you publish both, the conversation about platform spend becomes a conversation about real outcomes (analyst time, dwell time, breach blast radius) instead of about latency on a chart.

Key takeaways

MTTD measures event-to-alert latency. It says nothing about whether the alert was actionable.
Correlation debt is the percentage of triaged alerts that required out-of-platform context to action — measure it weekly, segmented by alert source pair.
Healthy: 5-15%. Mature legacy: 40-60%. Stitched-tool: 60-85%. The last group consistently sees longer real dwell despite similar MTTD.
Instrument with three closure checkboxes. No new tooling required. Two weeks of discipline buys you a baseline.
The structural fix is link-at-ingest, not join-at-query. Until you can commit to that, at least know the number you're carrying.

Next: how a four-person SOC can run Detection-as-Code without standing up a dedicated platform engineering team. Spoiler — the trick is to make the pipeline the platform's responsibility, not the SOC's.