Detection-as-Code without a dedicated platform team

A four-person SOC can run Detection-as-Code — pull requests, regression CI, retro replay before rollout — if the pipeline is the platform's responsibility, not the SOC's. Here is the concrete workflow.

Every detection engineering conference has the same talk. A senior engineer from a tech-forward unicorn walks through their Detection-as-Code pipeline: monorepo, GitOps, automated linting, mock data, replay harness, blue-green deploys of correlation rules. The audience nods. They go home. Nothing changes.

The reason nothing changes is that the talk requires you to have a platform team. Five engineers, a build budget, a release manager, a dedicated detection engineer, an SRE-style on-call rotation for the rule engine. If you have those people, you do not need the talk, because you have already solved the problem. If you don't have those people — and most SOCs in the 4-to-12-analyst band do not — the talk is irrelevant.

This post is for the SOCs that don't. Specifically: a 4-person SOC at a mid-sized enterprise, two L1, one L2, one lead, no dedicated detection engineer, no platform team, no SRE. They want to ship rules through pull requests and CI gates without becoming a software engineering org. It is possible, but only if the platform — not the SOC — owns the pipeline.

The principle: shift the heavy lifting to the platform

The reason platform teams exist in big shops is that the SIEM doesn't natively understand rules-as-files. Rules live in a UI. Someone wrote a sync tool that pushes YAML into the UI's API. Someone else wrote a linter for the YAML. Someone else wrote a CI job that runs the linter. Someone else wrote a replay harness that simulates events against a copy of the rule engine. That is the platform team's job: building the multi-piece scaffolding that turns an unfriendly product into a friendly one.

If the product is friendly to begin with, the scaffolding is the vendor's problem, not yours. Concretely, the platform should ship:

Detection content as version-controlled files in a known DSL (graph patterns, signatures, ML thresholds) — not as opaque UI configs.
A native CLI that validates, lints, and dry-runs content against a sample event corpus.
A managed replay environment. You point it at a date range; it streams historical data through the candidate rule and reports hits, false positives, and latency budget.
A safe-deploy mechanism — staging, canary, full — without the SOC having to operate a separate environment.
Webhooks into the SOC's existing source control. No bespoke runners.

If any one of these is missing, you are back to needing engineers. With all five in place, the SOC can run the whole loop with the people they already have, because the SOC's job becomes writing and reviewing content, not building and operating a release pipeline.

The buying signal: when evaluating a detection platform, ask to see the CI runbook. If the vendor walks you through their own GitHub Actions workflow and the diffs are small, you're fine. If they hand you a 40-page implementation guide for "setting up your detection pipeline", that guide is the work they didn't do.

The four-person workflow

Here is what a working week looks like for a 4-person SOC running Detection-as-Code. We have set this up in roughly fifteen deployments. The roles flex; the workflow doesn't.

Monday: the proposal

An L1 sees a recurring false positive — the badge system is generating "impossible travel" alerts because the badge clock drifts. They open a draft pull request in the detection repo. The change is a one-line addition to the exclusion list, plus a comment explaining the badge-system context. They do not need to know how the rule is deployed. They need to know how to write the change.

# detections/identity/impossible-travel.ngd
exclusions:
  source_ids:
    - identity-provider-prod
    - vpn-corp-mumbai
    - badge-system-blr          # added 2026-01-05 — clock drift
                                # tracked under ticket SOC-4421
threshold:
  velocity_kmh: 900

The PR triggers a CI job. The job is owned by the platform, not by the SOC. It does four things automatically: validates the syntax, replays the rule against the last 14 days of hot data, compares the result to the current production rule, and writes a comment on the PR showing the delta. If false positives drop and true positives are unaffected, the CI gate is green.
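The compare-and-comment step can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: `ReplayResult`, `delta_comment`, and `gate_is_green` are hypothetical names, and the sketch assumes the replay can label each hit as a true or false positive and that "unaffected" means the true-positive count is unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayResult:
    """Hypothetical summary of one rule version replayed over a window."""
    alerts: int           # total alerts raised during the replay window
    true_positives: int   # alerts that match labelled real incidents


def delta_comment(production: ReplayResult, candidate: ReplayResult, days: int) -> str:
    """Build the PR comment text describing the change in outcomes."""
    removed = production.alerts - candidate.alerts
    return (
        f"this change removes {removed} alerts in {days} days, "
        f"retains {candidate.true_positives} true positives"
    )


def gate_is_green(production: ReplayResult, candidate: ReplayResult) -> bool:
    """Green when false positives drop and true positives are unaffected."""
    fp_before = production.alerts - production.true_positives
    fp_after = candidate.alerts - candidate.true_positives
    same_tp = candidate.true_positives == production.true_positives
    return fp_after < fp_before and same_tp


# Illustrative numbers only: 316 production alerts is an assumed figure.
production = ReplayResult(alerts=316, true_positives=4)
candidate = ReplayResult(alerts=4, true_positives=4)
print(delta_comment(production, candidate, days=14))
print("gate green:", gate_is_green(production, candidate))
```

The point of the design is that the reviewer only ever sees the one-line summary and the gate colour; the rule text is secondary.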

Wednesday: the review

The lead reads the PR. They look at the replay delta, not at the rule. The delta is the thing that matters: "this change removes 312 alerts in 14 days, retains 4 true positives". They either approve or ask for the replay window to be extended. They are not reviewing code; they are reviewing outcomes.

If the PR is a brand-new detection rather than an exclusion, the gate is stiffer. The CI is configured to require:

A 14-day replay against hot data that stays under a maximum predicted noise rate (configurable, usually 50/day per rule for an L1-eligible alert).
A 90-day replay against warm data, returning aggregate hit counts. This catches "your rule fires on month-end batch jobs" before production does.
An adversary-emulation step: the rule has to fire against at least one entry in the platform's emulation library that matches the documented technique. No emulation hit, no merge. This is the part that catches "I wrote a rule but I don't actually know what it detects."
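The three requirements above could be evaluated along these lines. This is a sketch under stated assumptions: `NewRuleEvidence` and `new_rule_failures` are hypothetical names, the predicted noise rate is approximated as 14-day replay hits divided by 14, and the 90-day replay is treated as a report that only has to exist, since the text sets no threshold for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NewRuleEvidence:
    """Hypothetical bundle of CI results for a brand-new detection."""
    hot_hits_14d: int               # hits from the 14-day hot replay
    warm_hits_90d: Optional[int]    # aggregate hits from the 90-day warm replay, None if not run
    emulation_hits: int             # matching emulation-library entries that fired


def new_rule_failures(evidence: NewRuleEvidence, max_noise_per_day: float = 50.0) -> List[str]:
    """Return the reasons the merge is blocked; an empty list means the gate passes."""
    failures = []
    predicted_noise = evidence.hot_hits_14d / 14  # assumed estimate of alerts per day
    if predicted_noise > max_noise_per_day:
        failures.append(
            f"predicted noise {predicted_noise:.1f}/day exceeds {max_noise_per_day:.0f}/day"
        )
    if evidence.warm_hits_90d is None:
        failures.append("90-day warm replay has not run")
    if evidence.emulation_hits < 1:
        failures.append("no emulation hit, no merge")
    return failures


print(new_rule_failures(NewRuleEvidence(hot_hits_14d=280, warm_hits_90d=1200, emulation_hits=1)))
print(new_rule_failures(NewRuleEvidence(hot_hits_14d=1400, warm_hits_90d=None, emulation_hits=0)))
```

Returning a list of reasons rather than a single boolean keeps the PR comment actionable: the author sees every blocked requirement at once.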

Thursday: the canary

Merge to main triggers a canary deploy. The new rule runs alongside the current production rule for 48 hours. Both write to the same alert queue, but the canary's alerts are tagged so they can be told apart. If the canary's observed alert rate is within 25% of the replay prediction, the platform auto-promotes the rule to full deployment. If it diverges further, the deployment pauses and a comment appears on the originating PR.

The SOC does not run the canary infrastructure. The canary is a platform feature. The SOC sees a Slack message: "rule X promoted" or "rule X paused, predicted 4/day, observed 47/day, investigate".
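The promote-or-pause decision reduces to a small comparison. The sketch below is an assumption about how such a check might look, not the platform's code: `canary_decision` is a hypothetical name, and it assumes the 25% tolerance is measured relative to the predicted rate.

```python
def canary_decision(predicted_per_day: float, observed_per_day: float,
                    tolerance: float = 0.25) -> str:
    """Return 'promote' when the observed rate is within tolerance of the prediction."""
    if predicted_per_day <= 0:
        # No baseline to compare against: promote only if the canary is also silent.
        return "promote" if observed_per_day == 0 else "pause"
    deviation = abs(observed_per_day - predicted_per_day) / predicted_per_day
    return "promote" if deviation <= tolerance else "pause"


# The paused example from the Slack message: predicted 4/day, observed 47/day.
print(canary_decision(predicted_per_day=4, observed_per_day=47))
print(canary_decision(predicted_per_day=4, observed_per_day=4.5))
```

Note that the check is symmetric: a canary that fires far less than predicted also pauses, which is useful because a silent rule can mean the replay corpus and live traffic differ.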

Friday: the retro

One hour, four people. They walk through the PRs that merged this week. The discussion is short because the data is already in the PRs. Anything that needed manual intervention becomes a note: "the badge-clock case took two iterations; we should write a generic time-skew exclusion template". That note becomes next week's first PR. The loop closes.

What goes wrong, and what to do about it

This workflow is not theoretical. We have watched it succeed and fail. The failure modes cluster around four patterns.

Failure mode	Symptom	Fix
Replay is too slow	PR-to-merge takes 4+ hours; reviewers context-switch	Push the replay onto the platform's cold-tier replay engine. If the platform can only replay against hot, the loop is broken; raise it with the vendor.
Reviewers review code, not deltas	PRs accumulate; nitpick comments on rule syntax	Train the lead to read the CI comment first. If the delta is good, syntax bike-shedding is wasted time.
Emulation library is stale	New rule passes emulation but misses real adversary variants	Subscribe to a refresh feed; treat emulation content as code too — it gets PRs.
Canary runs too short	Rules promote during off-hours, fire heavily during business hours	Set canary windows to cover at least one full business cycle (a 48h window typically spans a weekday plus either a weekend day or a deploy window).

The first one is the killer. If your replay is slow, the whole loop collapses, because the L1 will not wait two hours for CI to come back; they will start working on something else, the PR will go stale, and after a month the team will quietly stop using the workflow. Replay speed is the single most important pipeline characteristic to evaluate. We benchmark replays at 1 hour of historical traffic per 30 seconds of wall-clock — anything slower than that and the loop starts feeling expensive.
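To see what that benchmark implies for a given replay window, the arithmetic can be written out. This is a back-of-the-envelope sketch: `replay_wall_clock_hours` and its `parallel_streams` parameter are hypothetical, and it assumes replay time scales linearly with the window and divides evenly across parallel partitions, neither of which the benchmark itself states.

```python
def replay_wall_clock_hours(window_days: int,
                            seconds_per_replayed_hour: float = 30.0,
                            parallel_streams: int = 1) -> float:
    """Estimate wall-clock hours to replay a window at the benchmark rate."""
    replayed_hours = window_days * 24
    total_seconds = replayed_hours * seconds_per_replayed_hour / parallel_streams
    return total_seconds / 3600


for days, streams in [(14, 1), (14, 4), (90, 1)]:
    hours = replay_wall_clock_hours(days, parallel_streams=streams)
    print(f"{days}-day window, {streams} stream(s): {hours:.1f} h")
```

Under these assumptions a single sequential pass over 14 days comes to roughly 2.8 hours, so it is worth asking the vendor whether the benchmark rate is per stream and how far the replay parallelises.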

The smell test: if your team has stopped using the PR flow and is editing rules directly in a UI "just for today", the pipeline is too slow or the gates are too strict. Find which one, and fix it before the workaround becomes permanent.

Why a small SOC gets more value than a big one

This is the counterintuitive part. Detection-as-Code is usually pitched as a maturity model: you graduate to it when you are big enough. We disagree. A small SOC gets more value per analyst hour than a large one, for the simple reason that a small SOC cannot afford to debug a misbehaving rule at 3 a.m.

In a 4-person SOC, every alert burst caused by a bad rule lands directly on the on-call. There is no spare shift to absorb it and no "let's circle back tomorrow", because tomorrow is the same person. The cost of a 1% noise increase is a measurable percentage of someone's sleep. Detection-as-Code with regression gates is what lets a small team ship changes without that cost. The big SOC can absorb a bad rule; the small SOC cannot.

The other underrated benefit is institutional memory. When a 4-person SOC turns over a single L1, they have lost 25% of their knowledge base. If detection logic lives in PR descriptions, reviewer comments, and replay deltas — i.e. as written prose tied to data — the new L1 can read the history and catch up. If detection logic lives in a UI somebody clicked, the new L1 inherits a haunted forest. We've watched both scenarios. The repo-with-history is dramatically kinder to the team.

The minimum viable setup

If you want to start tomorrow, here is the smallest version that works:

A single repo. detections/, exclusions/, tests/ — three directories.
The platform's CLI installed on a CI runner you already pay for.
Three CI gates: lint, 14-day hot replay, emulation-hit. That's it for v1.
One Slack channel for CI output. Read it during the daily standup.
A standing 60-minute Friday retro on PRs merged that week.

That setup gets a 4-person SOC 80% of the value. The remaining 20% — long-tail replay against cold data, automated tuning suggestions, drift detection on emulation coverage — is what the closed-loop detection engineering whitepaper goes into in depth. For most teams, 80% is plenty for the first six months.

If you are running an MSSP with multi-tenant detection content, this gets more complicated. We've sketched the tenant-isolation pattern in the India MSSP case study — short version: each tenant has a fork that auto-merges from the upstream content repo, with a per-tenant exclusion overlay.
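The per-tenant exclusion overlay can be pictured as a merge that only ever adds exclusions on top of the upstream rule. The sketch below is an illustration under assumptions, not the pattern from the case study: `apply_tenant_overlay` is a hypothetical name, rules are modelled as plain dictionaries shaped like the impossible-travel example earlier, and overlays are assumed to touch only the `exclusions` section.

```python
import copy


def apply_tenant_overlay(upstream_rule: dict, overlay: dict) -> dict:
    """Return a copy of the upstream rule with the tenant's exclusions appended."""
    merged = copy.deepcopy(upstream_rule)  # never mutate the shared upstream content
    exclusions = merged.setdefault("exclusions", {})
    for key, extra_items in overlay.get("exclusions", {}).items():
        existing = exclusions.setdefault(key, [])
        for item in extra_items:
            if item not in existing:  # keep the list free of duplicates
                existing.append(item)
    return merged


upstream = {
    "exclusions": {"source_ids": ["identity-provider-prod", "vpn-corp-mumbai"]},
    "threshold": {"velocity_kmh": 900},
}
tenant_overlay = {"exclusions": {"source_ids": ["badge-system-blr", "vpn-corp-mumbai"]}}

print(apply_tenant_overlay(upstream, tenant_overlay)["exclusions"]["source_ids"])
```

Keeping the overlay additive means an upstream update can be merged into every tenant fork without touching tenant-specific tuning.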

Key takeaways

Detection-as-Code does not require a platform team if the platform itself ships the pipeline scaffolding — versioned content, native CLI, managed replay, canary deploys, source-control webhooks.
The SOC's job is to write content and review deltas, not to build runners. If you find yourself building runners, the platform is unfriendly.
Replay speed is the single most important pipeline characteristic. 1 hour of traffic per 30s of wall-clock or it gets abandoned.
Small SOCs benefit more, not less. The cost of a bad rule lands directly on the on-call, so regression gates protect sleep, not just SLAs.
Minimum viable setup: one repo, three CI gates, Slack output, weekly retro. Add complexity only when you've outgrown the simple version.

Next time we go regulatory, not technical: the DPDP Act 72-hour notification clock and how to actually operate it as a SOC runbook.

