In late 2009, Google's security team discovered that attackers had spent weeks inside its network before any detection fired. Post-incident forensics revealed that the intrusion techniques (spear-phishing, lateral movement via Windows shares, exfiltration over encrypted HTTP) were all known attacker behaviors. No detection rule existed for any of them. The Aurora campaign, targeting at least 34 companies including Adobe and Intel, became the clearest early demonstration that detection engineering had to be systematically coupled to offensive knowledge, not just reactive to past breaches.
Google's longer-term response included the founding of Project Zero in 2014, explicitly tasked with finding and publishing vulnerabilities before attackers could exploit them. The underlying principle: offensive research must feed back into defensive posture, continuously and methodically.
A penetration test produces a report. In most organizations, that report enters a ticketing system, gets triaged by severity, and drives patch cycles. This is valuable, but it misses the detection dimension entirely. The attacker's technique is documented; the defensive rule that would catch it is never written.
Detection engineering feedback is the disciplined process of taking each exploited technique from a pentest and asking: what observable evidence did this leave, and how do we write a detection for it? Without this step, the next attacker who uses the same technique will succeed as thoroughly as the pentesters did.
The MITRE ATT&CK framework, first publicly released in 2015, was built directly to solve this problem. ATT&CK is a living registry of adversary techniques observed in real intrusions, organized so that both red teams and detection engineers share a common vocabulary. When a pentest documents "T1059.001: PowerShell execution," a detection engineer can immediately query: what SIEM rules cover this technique, and what did the pentest find that bypassed them?
When the Shadow Brokers published NSA Equation Group tools in 2016 and 2017, defenders who had conducted internal pentests using similar SMB exploitation techniques were dramatically better positioned. Organizations with mature feedback loops had already written Sigma rules for EternalBlue-style lateral movement. Those without them scrambled for weeks. The gap between the two populations was not technical sophistication; it was whether pentest findings had been systematically converted into detection logic.
A mature feedback loop has four components. First: technique capture. Each pentest finding is tagged to an ATT&CK technique ID. Not just "lateral movement" but specifically T1021.002 (SMB/Windows Admin Shares) or T1021.006 (Windows Remote Management). Precision at the sub-technique level is what enables precise detection.
Second: evidence mapping. For each technique that succeeded, the pentester documents what artifacts were generated: event log IDs, network flow signatures, process tree anomalies, registry keys touched. This is the raw material from which detection rules are constructed. AI tools now assist this step significantly by querying large corpora of documented technique evidence and surfacing relevant telemetry fields.
Third: coverage gap analysis. The detection team audits whether existing rules would have fired on the documented artifacts. If a password spray ran and Windows Event ID 4625 (failed logon) spiked with no alert, that is a documented gap. If Mimikatz ran and no LSASS access alert fired, that is a gap. The list of gaps becomes a prioritized work queue for the detection team.
Fourth: rule authoring and validation. Detection rules are written, tested against the pentest evidence (or in a replay environment), and deployed. Critically, the rules are validated by the red team in a subsequent exercise, confirming they actually fire and that the attacker cannot trivially bypass them.
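The first three components can be sketched as a small data model. This is a minimal illustration, not a prescribed schema: the `Finding` class, the `coverage_gaps` helper, and the rule names in `rule_inventory` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One exploited technique from a pentest report (hypothetical schema)."""
    technique_id: str                              # ATT&CK sub-technique ID
    description: str
    artifacts: list = field(default_factory=list)  # evidence the technique left

def coverage_gaps(findings, rule_inventory):
    """Return findings whose technique has no detection rule (the work queue).

    rule_inventory maps a technique ID to the names of rules claiming to cover it.
    """
    return [f for f in findings if not rule_inventory.get(f.technique_id)]

findings = [
    Finding("T1021.002", "PsExec to file server",
            ["Security 4624 logon type 3", "Sysmon 1: PSEXESVC.exe"]),
    Finding("T1003.001", "Mimikatz LSASS dump", ["Sysmon 10: lsass.exe access"]),
    Finding("T1059.001", "Encoded PowerShell", ["Sysmon 1: -EncodedCommand"]),
]
rule_inventory = {"T1059.001": ["ps_encoded_command"]}  # assumed existing rule

gaps = coverage_gaps(findings, rule_inventory)
print([g.technique_id for g in gaps])  # ['T1021.002', 'T1003.001']
```

Tagging at the sub-technique level is what makes the lookup meaningful: a rule inventory keyed on "lateral movement" could not distinguish a covered SMB path from an uncovered WinRM one.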
Large language models accelerate evidence mapping substantially. Given a technique ID and a target environment (Windows Server 2022, Splunk SIEM), an AI assistant can enumerate likely telemetry sources, draft Sigma rule skeletons, and flag known bypass patterns that might blind a naive first-draft rule. The human engineer provides context and judgment; the AI compresses research time from hours to minutes.
Your red team has completed an engagement and documented three successful attack paths: (1) credential spray against VPN using usernames from LinkedIn, (2) lateral movement via PsExec to three servers, (3) data staged to a cloud storage bucket via rclone. Your SIEM is Splunk with Windows event forwarding and Zeek network logs.
Use the AI assistant to identify ATT&CK technique IDs for each path, enumerate the specific evidence artifacts each technique should generate, and draft coverage gap questions to bring to your detection team.
FIN7, also known as Carbanak, stole over $1 billion from financial institutions across more than 40 countries between 2014 and 2018. Their tradecraft was methodical: spear-phishing with weaponized Office documents, PowerShell for post-exploitation, BITS jobs for persistence, and slow-and-low exfiltration over months. Every one of these techniques had documented detection paths. The FBI's 2018 indictments detailed how FIN7's command-and-control patterns, PowerShell invocations, and scheduled task creation all left event log traces that a properly tuned SIEM would have caught.
The gap was not detection capability: most victim organizations had Splunk or comparable tools. The gap was that no one had written the rules. Detection engineering was treated as a project rather than a continuous discipline. FIN7 exploited the absence of systematic rule development for four years.
Sigma is an open-source detection rule format maintained by the SigmaHQ project. A Sigma rule is YAML-structured, vendor-agnostic, and can be compiled into native queries for Splunk SPL, Elastic DSL, Microsoft Sentinel KQL, Chronicle YARA-L, and a dozen other platforms. Write once, deploy everywhere, with platform-specific tuning.
A Sigma rule has four essential components: title and description (human-readable context), logsource (the category of log the rule applies to, e.g., Windows process creation, network connection), detection logic (the field-value conditions that trigger the rule), and falsepositives (known benign scenarios that match the rule). The last component is critical and frequently underdeveloped, leading to alert fatigue that causes SOC analysts to disable rules entirely.
Elastic Security Labs publishes detection rules derived directly from malware analysis and incident response. Their process mirrors what AI-assisted detection engineering enables: given observed process behavior (e.g., a parent process spawning cmd.exe with encoded arguments), analysts enumerate relevant fields (CommandLine, ParentImage, EncodedCommand), draft detection logic, and enumerate false-positive scenarios. Since 2021, Elastic has open-sourced over 1,400 rules via GitHub, all following this structure. The documented false-positive reasoning in these rules (often two or three sentences) is where the most engineering judgment lives.
Given a pentest artifact (say, a PowerShell invocation observed during the engagement with the exact command line documented), an AI assistant can draft a Sigma rule skeleton in under a minute. The workflow proceeds as follows.
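To make the four components concrete, here is a Sigma rule expressed as a Python dictionary (Sigma itself is written in YAML; the dictionary mirrors that structure), together with a small lint check. The rule content and the `lint` helper are illustrative assumptions, not an official SigmaHQ rule or tool.

```python
# Keys the sketch treats as mandatory, plus a check on falsepositives.
REQUIRED_KEYS = ("title", "logsource", "detection")

sigma_rule = {
    "title": "Encoded PowerShell Command Line",
    "description": "Detects powershell.exe launched with -EncodedCommand.",
    "logsource": {"product": "windows", "category": "process_creation"},
    "detection": {
        "selection": {
            "Image|endswith": "\\powershell.exe",
            "CommandLine|contains": "-EncodedCommand",
        },
        "condition": "selection",
    },
    "falsepositives": [
        "Software deployment tools that wrap scripts in encoded commands",
    ],
    "level": "medium",
}

def lint(rule):
    """Return a list of problems; an empty list means the skeleton is complete."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in rule]
    if "condition" not in rule.get("detection", {}):
        problems.append("detection has no condition")
    if not rule.get("falsepositives"):
        problems.append("falsepositives is empty; document known benign matches")
    return problems

print(lint(sigma_rule))  # []
```

Flagging an empty `falsepositives` list at lint time is one way to keep the most frequently neglected component from being skipped.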
Input: the artifact. Provide the AI with the exact command string, process tree, or network flow observed. "cmd.exe /c powershell.exe -EncodedCommand [base64 string] -ExecutionPolicy Bypass -NonInteractive" is sufficient context for a useful first draft.
AI output: rule skeleton. The model will propose a logsource (Windows process_creation), detection fields (CommandLine contains '-EncodedCommand', Image endswith 'powershell.exe', ParentImage endswith 'cmd.exe'), and a condition. It will often suggest relevant Sysmon Event IDs (Event ID 1 for process creation) and Windows Security Event IDs (4688 if process audit logging is enabled).
Human step: false positive analysis. This is where the engineer's environment knowledge is irreplaceable. Does your organization use SCCM, which invokes PowerShell with encoded commands for software deployment? Does your monitoring stack itself use PowerShell tasks? The AI can enumerate common false positive scenarios from public knowledge, but only the engineer knows whether those scenarios occur in this environment.
Human step: threshold and tuning. A rule that fires on every encoded PowerShell invocation will generate thousands of alerts daily in a large enterprise. The engineer adds conditions: parent process is not SCCM's ccmexec.exe, execution time is outside maintenance windows, or the invocation includes additional suspicious flags like -WindowStyle Hidden.
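The draft-then-tune step can be sketched as a selection plus an exclusion filter, evaluated against process-creation events. This is a simplified stand-in for what a SIEM does with a compiled rule; the event dictionaries and the `rule_fires` function are hypothetical, and the matching is case-insensitive, which is an assumption about the target backend.

```python
def ends_with(value, suffix):
    """Case-insensitive suffix match."""
    return value.lower().endswith(suffix.lower())

def contains(value, needle):
    """Case-insensitive substring match."""
    return needle.lower() in value.lower()

def rule_fires(event):
    """Selection: encoded PowerShell. Filter: parent is SCCM's ccmexec.exe."""
    selection = (
        ends_with(event.get("Image", ""), r"\powershell.exe")
        and contains(event.get("CommandLine", ""), "-EncodedCommand")
    )
    filter_sccm = ends_with(event.get("ParentImage", ""), r"\ccmexec.exe")
    return selection and not filter_sccm

POWERSHELL = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"

pentest_event = {
    "Image": POWERSHELL,
    "ParentImage": r"C:\Windows\System32\cmd.exe",
    "CommandLine": "powershell.exe -EncodedCommand SQBFAFgA "
                   "-ExecutionPolicy Bypass -NonInteractive",
}
sccm_event = {
    "Image": POWERSHELL,
    "ParentImage": r"C:\Windows\CCM\CcmExec.exe",
    "CommandLine": "powershell.exe -NoProfile -EncodedCommand SQBFAFgA",
}

print(rule_fires(pentest_event))  # True
print(rule_fires(sccm_event))     # False
```

Each exclusion narrows the rule, so it is worth recording why a filter was added; a filter that outlives the tool it was written for becomes a blind spot.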
AI-generated Sigma rules consistently require human review of three things: (1) field name accuracy for the specific SIEM platform, since Splunk's field names differ from Elastic's; (2) false positive reasoning specific to the target environment; and (3) rule condition logic for edge cases (case sensitivity, substring vs. exact match). Treating AI output as a first draft, not a finished product, is the operational norm among mature detection teams.
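The field-name review in item (1) can be illustrated with a small translation step. The sketch below maps Sysmon-style Sigma field names to Elastic Common Schema names while preserving value modifiers; `translate_selection` is a hypothetical helper, and a real deployment would more likely rely on a Sigma backend's own field-mapping pipeline.

```python
# Sysmon-style field name -> Elastic Common Schema (ECS) field name.
ECS_FIELDS = {
    "Image": "process.executable",
    "CommandLine": "process.command_line",
    "ParentImage": "process.parent.executable",
}

def translate_selection(selection, field_map):
    """Rename fields in a Sigma selection, keeping modifiers such as |endswith."""
    translated = {}
    for key, value in selection.items():
        name, _, modifiers = key.partition("|")
        if name not in field_map:
            # Fail loudly: an unmapped field would silently never match.
            raise KeyError(f"no mapping for field {name!r}")
        new_key = field_map[name] + ("|" + modifiers if modifiers else "")
        translated[new_key] = value
    return translated

selection = {
    "Image|endswith": r"\powershell.exe",
    "CommandLine|contains": "-EncodedCommand",
}
print(translate_selection(selection, ECS_FIELDS))
```

Raising on an unmapped field is a deliberate choice here: a rule that references a field the platform does not populate produces no errors and no alerts, which is the silent failure mode reviewers are trying to catch.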
A detection rule is hypothetical until it fires on real evidence. The gold standard for validation is replay testing: executing the exact pentest technique in a staging environment and confirming the rule fires. Teams that lack a replay environment can validate against the pentest's collected log data if that data was preserved and ingested into the SIEM during or after the engagement.
Atomic Red Team, an open-source project maintained by Red Canary, provides scripted, individual technique executions mapped to ATT&CK IDs. After a pentest identifies T1059.001 as an exploited technique, a detection engineer can run the corresponding Atomic tests (for example, the Mimikatz test or one of the encoded-command tests) and observe whether the drafted Sigma rule fires. This closes the loop: pentest finding → artifact documentation → rule draft → validation test → deployed rule.
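Where pentest logs were preserved, the validation step can be approximated offline by running the rule over recorded events. The harness below is a minimal sketch under that assumption; `validate`, the sample rule, and the event data are hypothetical, and a real check would query the SIEM rather than Python dictionaries.

```python
def validate(rule, replay_events, baseline_events):
    """Run a rule over replayed attack events and ordinary baseline events."""
    missed = [e for e in replay_events if not rule(e)]
    noisy = [e for e in baseline_events if rule(e)]
    return {
        "fires_on_all_replays": not missed,
        "missed": missed,
        "false_positives": noisy,
    }

def encoded_powershell(event):
    """First-draft rule: matches only the full -EncodedCommand flag."""
    cmd = event.get("CommandLine", "").lower()
    return "powershell" in cmd and "-encodedcommand" in cmd

replay = [
    {"CommandLine": "powershell.exe -EncodedCommand SQBFAFgA"},
    {"CommandLine": "powershell.exe -enc SQBFAFgA"},  # abbreviated flag
]
baseline = [{"CommandLine": r"powershell.exe -File C:\Scripts\backup.ps1"}]

result = validate(encoded_powershell, replay, baseline)
print(result["fires_on_all_replays"])  # False
print(len(result["missed"]))           # 1
```

The miss on the abbreviated `-enc` form is the kind of result that sends the rule back for another draft before deployment.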
During a recent engagement, your team observed: a PowerShell process spawning from WScript.exe executing an encoded command, followed by a network connection to a non-standard port (8443) on an external IP. Your SIEM is Microsoft Sentinel with Sysmon event forwarding. Process creation data includes CommandLine, ParentImage, and ParentCommandLine fields.
Use the AI assistant to draft a Sigma rule for the PowerShell spawning behavior, then work through false positive analysis, threshold tuning, and a second rule for the anomalous outbound connection.
The SolarWinds Orion compromise, carried out by UNC2452 (Cozy Bear / APT29) and discovered in December 2020, delivered a trojanized update to approximately 18,000 organizations including multiple US federal agencies. The attackers maintained access for nine months (in some environments, over a year) before detection. FireEye, which discovered the intrusion, had conducted internal red team exercises and had detection logic for many common techniques. What SolarWinds exposed was the gap in supply-chain-specific detection: legitimate software update mechanisms as delivery vehicles, SAML token forgery for cloud identity abuse, and living-off-the-land binaries that blended with expected behavior.
The subsequent industry response included CISA's Emergency Directive 21-01 and a wave of purple team exercises specifically targeting supply chain and identity attack paths. Organizations that had conducted structured validation exercises (confirming their SIEM rules actually fired on Golden SAML token forgery and anomalous service principal activity) detected the compromise artifacts faster when the forensic data was reviewed retroactively. Those without validated detection rules found nothing in their logs, even in retrospect.
A purple team exercise is not a red team engagement where the blue team watches. It is a structured, collaborative session where red team operators execute specific techniques while detection engineers observe, in real time, whether their detection logic fires. The red team's job in a purple team is to execute techniques with full transparency (sharing exact command strings, execution timing, and expected artifacts) so that detection validation is unambiguous.
The contrast with a traditional red team engagement is significant. In a red team, the measure of success is whether the blue team detects the attackers. In a purple team, the measure of success is whether the detection rules work. These are different objectives. Purple teaming is fundamentally a detection engineering validation mechanism, not an assessment of blue team capability.
CISA's SILENTSHIELD program, operational since 2022, provides no-cost red team assessments to critical infrastructure operators. A distinguishing feature is the post-engagement purple team phase: CISA operators work directly with agency detection teams to replay each successful technique and confirm whether detection rules would have fired. CISA's 2024 red team assessment report documented that in one assessed organization, 15 of 17 successful techniques generated no SIEM alerts, not because the logs were absent, but because no rules existed for them. The purple team phase resulted in 15 new detection rules written and validated within two weeks of the engagement.
AI assistance has changed three aspects of purple team execution. First: technique variant generation. When a detection rule successfully catches a baseline technique execution, the red team needs to test variants: encoded commands, alternate execution paths, LOLBin substitutions. AI tools can rapidly enumerate known evasion variants for a given technique, providing the red team with a systematic test battery rather than ad-hoc improvisation.
Second: real-time rule refinement. When a rule fails to fire, the detection engineer needs to understand why. Was the field name wrong? Did the attacker's command use a substring the rule's condition missed? AI assistants can analyze the gap between the rule's condition logic and the actual observed artifact, proposing targeted amendments in real time. This compresses the iterate-and-retest cycle from hours to minutes.
Third: documentation automation. Purple team exercises produce significant documentation: which techniques were tested, which rules fired, which were tuned, which new rules were written. AI tools can generate structured exercise reports from session notes, mapping each technique to ATT&CK, documenting rule changes, and producing the before/after coverage assessment that justifies the exercise investment.
VECTR (from Security Risk Advisors, open-source since 2017) is the standard platform for tracking purple team results. Each technique execution is logged as a test case: ATT&CK technique ID, exact execution parameters, expected artifact, actual alert outcome (fired / not fired / fired with wrong context). AI assistants can import VECTR export data and produce prioritized gap remediation roadmaps, ranking undetected techniques by ATT&CK prevalence data and threat intelligence relevance to the organization's sector.
One-time purple team exercises have limited value if detection rules decay: systems change, log sources are modified, and rule conditions break silently when field names change across SIEM upgrades. Mature programs schedule quarterly purple team exercises targeting high-priority ATT&CK technique families, with AI-assisted tracking of the coverage trend over time.
The FBI and CISA's Joint Cybersecurity Advisories, released roughly monthly, document techniques observed in current campaigns. Each advisory is a prioritized purple team input: run Atomic Red Team tests for every technique in the advisory, confirm detection coverage, file gaps as remediation tickets. AI assistants can parse advisory text, extract technique IDs, and generate the corresponding Atomic test execution plan automatically, reducing advisory-to-validation time from weeks to days.
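The test-case record and the ranking step can be sketched as follows. The `PurpleTestCase` fields mirror the four items listed above but are not VECTR's actual export schema, and the prevalence scores are made-up numbers for illustration.

```python
from dataclasses import dataclass

@dataclass
class PurpleTestCase:
    """One technique execution from a purple team session (hypothetical schema)."""
    technique_id: str
    command: str
    expected_artifact: str
    outcome: str  # "fired", "not_fired", or "wrong_context"

def remediation_queue(cases, prevalence):
    """Cases that did not cleanly fire, most prevalent technique first."""
    open_cases = [c for c in cases if c.outcome != "fired"]
    return sorted(open_cases,
                  key=lambda c: prevalence.get(c.technique_id, 0.0),
                  reverse=True)

cases = [
    PurpleTestCase("T1055.001", "DLL injection into notepad.exe",
                   "Sysmon 8 CreateRemoteThread", "wrong_context"),
    PurpleTestCase("T1021.002", r"PsExec.exe \\srv01 cmd.exe",
                   "System 7045 service install", "fired"),
    PurpleTestCase("T1059.001", "powershell.exe -enc SQBFAFgA",
                   "Sysmon 1 with encoded flag", "not_fired"),
]
prevalence = {"T1059.001": 0.9, "T1055.001": 0.4}  # assumed scores, 0 to 1

queue = remediation_queue(cases, prevalence)
print([c.technique_id for c in queue])  # ['T1059.001', 'T1055.001']
```

Treating "fired with wrong context" as an open item keeps alerts that fire but mislead the analyst on the remediation list.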
Your team has drafted a Sigma rule for T1059.001 (PowerShell) that detects the baseline encoded command pattern (-EncodedCommand flag). Before deploying, you want to run a purple team validation covering known bypass variants. You also need to prepare a purple team test battery for T1055.001 (Process Injection: Dynamic-link Library Injection) following a recent engagement where this technique was used successfully.
Use the AI assistant to enumerate PowerShell detection bypass variants, identify which Sigma rule conditions would catch each variant and which would not, and generate a structured VECTR-compatible test case list for both techniques.
Mandiant's annual M-Trends report has tracked attacker dwell time (the median number of days an attacker remains in a network before detection) since 2012. In 2012, median dwell time was 416 days. By 2022, it had fallen to 16 days for organizations with internal detection capabilities. The organizations driving this improvement shared a common characteristic: they had converted detection engineering from a project activity into an operational function with continuous improvement metrics, regular cadence, and feedback integration from both incident response and offensive exercises.
The organizations that remained at multi-hundred-day dwell times were not technologically unsophisticated. They had SIEMs, they had threat intelligence feeds, they conducted annual pentests. What they lacked was the closed feedback loop: pentest findings not converted to rules, IR findings not converted to rules, threat intel not converted to rules. Their detection posture was static while attackers adapted.
MITRE's Detection Engineering Maturity Model (DEMM), published alongside ATT&CK Navigator updates, defines five maturity levels. Level 1, reactive: rules written only after incidents. Level 2, pentest-driven: rules written from engagement findings but without systematic coverage tracking. Level 3, coverage-aware: ATT&CK Navigator heatmaps maintained, coverage gaps tracked as work items. Level 4, continuously validated: purple team exercises on a defined cadence, rule quality metrics tracked, coverage trend measured over time. Level 5, threat-informed: coverage priorities driven by current adversary activity, TI feeds integrated into detection backlog, AI-assisted technique variant enumeration standard practice.
Most organizations that conduct annual pentests operate at Level 2. The jump to Level 3 requires one process change: converting the pentest report's ATT&CK technique list into ATT&CK Navigator annotations, then auditing current detection coverage against those annotations. AI tools can automate this annotation step in minutes.
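The annotation step amounts to producing a Navigator layer file from the pentest's technique list. The sketch below builds a minimal layer as JSON; the `techniqueID`, `score`, and `comment` keys follow the Navigator layer format as commonly used, but the exact set of required keys (for example a `versions` block) depends on the Navigator release, so treat this as an assumption to check against the layer specification. `navigator_layer` is a hypothetical helper.

```python
import json

def navigator_layer(name, pentest_techniques, covered):
    """Build a minimal ATT&CK Navigator layer: score 1 = covered, 0 = gap."""
    techniques = [
        {
            "techniqueID": tid,
            "score": 1 if tid in covered else 0,
            "comment": ("validated rule exists" if tid in covered
                        else "pentest finding, no rule"),
        }
        for tid in sorted(pentest_techniques)
    ]
    return {"name": name, "domain": "enterprise-attack", "techniques": techniques}

layer = navigator_layer(
    "Q3 pentest coverage",
    pentest_techniques={"T1059.001", "T1021.002", "T1003.001"},
    covered={"T1059.001"},
)
print(json.dumps(layer, indent=2))
```

Scoring gaps as 0 and covered techniques as 1 lets the Navigator's gradient colouring turn the layer into the heatmap described at Level 3.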
Microsoft's Detection and Response Team (DART), which responds to hundreds of major incidents annually, publishes its detection logic in the Microsoft Sentinel GitHub repository. The team's documented practice includes post-incident rule authoring for every novel technique observed, structured ATT&CK coverage reviews after each major campaign (Hafnium Exchange exploitation in 2021, DEV-0537 / LAPSUS$ in 2022, Midnight Blizzard in 2023 and 2024), and AI-assisted parsing of incident telemetry to identify artifact patterns that existing rules missed. The team publishes an average of 4 to 6 new rules per major campaign, typically within 72 hours of campaign attribution.
Detection programs that improve over time measure two core metrics. ATT&CK coverage percentage: the fraction of technique IDs in the organization's threat model for which at least one validated detection rule exists. This metric, tracked monthly, shows whether the program is keeping pace with technique documentation. A static or declining coverage percentage despite ongoing investment signals that rules are decaying (broken by system changes) as fast as new ones are written.
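The coverage percentage is a simple ratio, sketched below with hypothetical rule names and two monthly snapshots. A technique counts as covered only if it has at least one validated rule, so a rule that breaks drops its technique back out of the numerator.

```python
def coverage_percentage(threat_model, validated_rules):
    """Percent of threat-model techniques with at least one validated rule."""
    techniques = set(threat_model)
    if not techniques:
        return 0.0
    covered = {t for t in techniques if validated_rules.get(t)}
    return round(100.0 * len(covered) / len(techniques), 1)

threat_model = ["T1566.001", "T1059.001", "T1021.002", "T1003.001",
                "T1053.005", "T1197", "T1078", "T1567.002"]

january = {
    "T1059.001": ["ps_encoded"],
    "T1003.001": ["lsass_access"],
    "T1078": ["vpn_spray"],
}
# February: one new rule written, but the T1078 rule failed re-validation.
february = {**january, "T1021.002": ["psexec_service"], "T1078": []}

print(coverage_percentage(threat_model, january))   # 37.5
print(coverage_percentage(threat_model, february))  # 37.5
```

The unchanged figure despite a newly written rule is the decay signal described above: one rule gained, one lost.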
Mean time to detect (MTTD) for known techniques: measured during purple team exercises and tracked over time. If the baseline PowerShell detection rule fired in 40 seconds in Q1 and fires in 40 seconds in Q3, the rule is stable. If it stopped firing (often discovered only during the Q3 exercise), something changed in the environment (Sysmon version update changed field names, log forwarding pipeline dropped events) and the gap went unnoticed.
AI tools add a third emerging metric: variant coverage ratio. For each ATT&CK technique with a detection rule, what fraction of its documented execution variants does the rule catch? A rule that catches the baseline but misses five of seven documented variants has low variant coverage. AI can maintain this ratio automatically by querying technique documentation and assessing rule conditions against the variant list.
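A variant coverage ratio can be computed by running a rule over a list of known variants. The sketch below uses a naive first-draft PowerShell rule and five command-line variants; the variant list is a small illustrative sample, not a complete catalogue, and `baseline_rule` stands in for a compiled detection.

```python
def baseline_rule(command_line):
    """Naive first draft: PowerShell plus the full -EncodedCommand flag."""
    cmd = command_line.lower()
    return "powershell" in cmd and "-encodedcommand" in cmd

# Illustrative variants of the same behaviour (not exhaustive).
variants = {
    "full flag": "powershell.exe -EncodedCommand SQBFAFgA",
    "abbreviated -enc": "powershell.exe -enc SQBFAFgA",
    "abbreviated -e": "powershell.exe -e SQBFAFgA",
    "mixed case": "powershell.exe -eNcOdEdCoMmAnD SQBFAFgA",
    "pwsh binary": "pwsh.exe -EncodedCommand SQBFAFgA",
}

def variant_coverage(rule, variants):
    """Return (ratio caught, names of variants the rule missed)."""
    missed = [name for name, cmd in variants.items() if not rule(cmd)]
    ratio = (len(variants) - len(missed)) / len(variants)
    return ratio, missed

ratio, missed = variant_coverage(baseline_rule, variants)
print(ratio)   # 0.4
print(missed)  # ['abbreviated -enc', 'abbreviated -e', 'pwsh binary']
```

The list of missed variants is more useful than the ratio alone, since it doubles as the test battery for the next rule revision.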
CISA and FBI Joint Cybersecurity Advisories, ISAC threat reports, and vendor-published campaign analyses all contain ATT&CK technique lists for current adversary groups. An AI assistant can parse these documents, extract technique IDs, query the organization's ATT&CK Navigator coverage map, and produce a prioritized gap list in minutes. This converts threat intelligence from a reading exercise into a detection backlog input, a process that previously required a full-time analyst working for days.
The complete continuous improvement process has five steps, each with an AI acceleration point. Step 1, technique identification: ATT&CK tagging of pentest findings, IR findings, and threat intel. AI parses reports and extracts technique IDs. Step 2, coverage audit: Navigator annotation + gap list. AI compares technique list to current rule inventory. Step 3, rule authoring: Sigma drafts from artifact documentation. AI generates first drafts for human review. Step 4, validation: Atomic Red Team execution + purple team confirmation. AI enumerates variants for test battery. Step 5, rule maintenance: scheduled re-validation after system changes. AI flags rules whose logsource conditions may have broken due to environment changes.
Organizations at Detection Engineering Maturity Level 4 run this cycle quarterly for high-priority techniques and annually for their full ATT&CK coverage map. AI tooling at each step compresses the total cycle time from months to weeks, making the difference between a program that keeps pace with adversary adaptation and one that perpetually lags.
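The extraction step does not strictly need a language model when advisories cite technique IDs explicitly; a regular expression is a reasonable first pass. The advisory text below is invented for illustration, and `covered` stands in for whatever the organization's coverage map reports.

```python
import re

# Matches technique IDs such as T1078 and sub-techniques such as T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")

def extract_techniques(text):
    """Return the unique ATT&CK technique IDs found in free text, sorted."""
    return sorted(set(TECHNIQUE_ID.findall(text)))

advisory = (
    "The actors used spearphishing attachments [T1566.001] and PowerShell "
    "[T1059.001]. Lateral movement relied on SMB admin shares (T1021.002) "
    "and valid accounts (T1078). PowerShell (T1059.001) was also used for "
    "discovery."
)
covered = {"T1059.001"}  # assumed current coverage

found = extract_techniques(advisory)
gaps = [t for t in found if t not in covered]
print(found)  # ['T1021.002', 'T1059.001', 'T1078', 'T1566.001']
print(gaps)   # ['T1021.002', 'T1078', 'T1566.001']
```

Advisories that describe behaviour without citing IDs are where an AI assistant adds more than a regex can, since mapping prose to techniques requires interpretation.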
You are the detection engineering lead for a mid-sized financial services firm. Your ATT&CK Navigator heatmap shows 38% technique coverage. A recent pentest identified 9 ATT&CK techniques across the Initial Access, Execution, Persistence, and Lateral Movement tactics. A new CISA advisory has just been released for a threat group targeting financial sector firms, listing 12 ATT&CK technique IDs, 7 of which overlap with your pentest findings. Your SIEM was upgraded last quarter, and the field names for two Sysmon event IDs changed.
Use the AI assistant to build a prioritized remediation roadmap: which gaps to close first, what rule re-validation is needed post-SIEM upgrade, and how to structure a 90-day sprint to improve coverage from 38% to 55%.