Session 1 of 8

Network Pentesting Foundations Refresher

Scoping, ROE, and the modern internal-vs-external network test before adding AI

⏱ 90 minutes

Learning Objectives

Articulate the critical components of a network pentest scope document and what gaps in scope create AI-specific risks
Distinguish internal, external, and hybrid engagement models and how each changes AI tool selection
Explain how rules of engagement constrain automated and AI-assisted testing phases
Identify the evidence quality requirements that must be preserved even when AI accelerates data collection

Session Overview

This opening session grounds students in the foundational methodology before any AI tooling is introduced. The goal is to expose the gaps that typically exist between how network pentests are scoped and how AI tools actually behave — so practitioners understand from the start that AI augments methodology rather than replacing it.

Walk through a realistic scope document, highlighting the clauses that matter most when automated or AI-assisted tooling enters the picture: rate limiting, discovery exclusions, data handling requirements, and evidence retention obligations. Spend time on the difference between a well-scoped internal test and a vague "network assessment" — students should leave able to push back on scopes that would make AI use legally or methodologically risky.

Key Teaching Points

Scope defines legality, not just courtesy. AI tools can generate traffic patterns and query volumes that exceed what the client authorized without the tester realizing it. Every AI-assisted phase needs an explicit scope check before execution.
ROE clauses that predate AI are commonly silent on LLM-assisted analysis. Clients who approved "automated scanning" often did not contemplate sending banner strings and config snippets to an external LLM. Data handling must be addressed explicitly.
The internal/external distinction changes the attack surface, and AI changes how each is tested. External tests benefit from AI-assisted OSINT correlation; internal tests benefit more from AI reasoning over network topology data and AD graphs.
Evidence quality is non-negotiable regardless of speed. AI can help you find more, faster — but every finding still requires reproducible evidence. Establish evidence standards before the first scan runs.
The modern network is hybrid by default. Students must stop thinking in perimeter terms. A "network test" now routinely touches on-prem AD, Azure AD Connect, cloud workloads, and VPN split-tunneling — AI tooling must be scoped for each layer.
Kick-off meetings are where AI constraints get negotiated. Brief students on how to raise AI tool use with clients — framing it as methodology transparency, not a red flag.
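
The "explicit scope check before execution" point can be made concrete with a small gate that runs before any automated or AI-assisted phase. This is a minimal sketch, not a prescribed tool: the CIDR ranges, the exclusion list, and the function name `check_targets` are all hypothetical, and real values would come from the signed scope document.

```python
import ipaddress

# Hypothetical scope data; real ranges come from the signed scope document.
IN_SCOPE = ["10.20.0.0/16", "192.0.2.0/24"]
EXCLUDED = ["10.20.30.0/24"]  # e.g. a production segment carved out of scope


def check_targets(targets, in_scope=IN_SCOPE, excluded=EXCLUDED):
    """Split target IPs into (allowed, blocked) lists based on the scope ranges."""
    scope_nets = [ipaddress.ip_network(n) for n in in_scope]
    excluded_nets = [ipaddress.ip_network(n) for n in excluded]
    allowed, blocked = [], []
    for target in targets:
        addr = ipaddress.ip_address(target)
        in_range = any(addr in net for net in scope_nets)
        is_excluded = any(addr in net for net in excluded_nets)
        if in_range and not is_excluded:
            allowed.append(target)
        else:
            blocked.append(target)
    return allowed, blocked


allowed, blocked = check_targets(["10.20.1.5", "10.20.30.7", "203.0.113.9"])
print("allowed:", allowed)
print("blocked:", blocked)
```

Exclusions take priority over inclusions here, so a host inside an excluded subnet is blocked even when its parent range is in scope. Only the `allowed` list would be passed on to a scanner or an AI-driven tool.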

Discussion Prompts

Your client's scope says "no automated scanners against the production payment environment." You want to use an LLM to analyze banner data collected manually. Does that violate scope? How do you find out?
How would you write a rules-of-engagement clause that explicitly addresses AI-assisted analysis — what should it cover and what should it exclude?
A colleague argues that AI tools just speed up analysis, so the same ROE applies. Where does that reasoning break down?
What evidence would you need to preserve to prove a finding was reached through AI-assisted reasoning rather than direct exploitation?

Instructor Notes

Open with a brief show-of-hands poll: who has added any AI tool — even ChatGPT — to a network pentest in the last year? Use this to calibrate where the room is. If most hands go up, you can move faster through the methodology refresher and spend more time on the AI-specific scope gaps. Bring a sanitized scope document from a real engagement (redacted) and walk through it line by line — abstract examples land poorly with practitioners. Emphasize that this session's goal is to surface the questions they should be asking before any AI tool runs, not to answer all of them yet.

Timing Guide

Introduction (10 min): Course overview, student calibration poll, session goals

Core Content (45 min): Scope anatomy, ROE review, internal vs. external models, AI-specific gaps

Discussion (25 min): Scope document walkthrough + group discussion prompts

Wrap-up (10 min): Key takeaways, preview of Session 2, questions

Session 2 of 8

AI-Assisted Service Identification

From port scan to confident protocol identification using LLMs and embedding-based matching

⏱ 90 minutes

Learning Objectives

Explain how LLMs can interpret ambiguous service banners that static signature databases fail to match
Describe embedding-based similarity matching and its role in identifying non-standard or obfuscated services
Apply a structured prompt design process for reliable AI-assisted service classification
Identify failure modes — hallucinated service versions, stale training data — and build verification steps into the workflow

Session Overview

Service identification has always been part science, part art. Nmap's service version detection and banner grabbing work well against standard deployments but fail against custom banners, non-default ports, and obfuscated services that defenders increasingly use. This session teaches practitioners to extend their identification pipeline with LLMs that can reason over ambiguous banner text and match partial signatures to known services.

The practical focus is on building a workflow: scan, collect banners, feed banners to a structured LLM prompt, receive a classification with confidence, verify independently. Students should leave with a template prompt they can adapt immediately and a clear mental model of where LLMs outperform static tools and where they introduce risk that must be managed.
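
The classify-then-verify step of that workflow might look like the sketch below. It builds a prompt that asks for a fixed set of fields and validates the reply before anything downstream uses it. The field names follow the schema discussed in this session; the function names and the sample reply are hypothetical, and no real LLM API is called. The reply string is hand-written for illustration.

```python
import json

REQUIRED_FIELDS = {"service", "version", "confidence", "caveats"}


def build_prompt(banner):
    """Build a classification prompt that asks for a fixed JSON schema."""
    return (
        "Classify the network service that produced this banner.\n"
        "Respond with one JSON object with exactly these keys: "
        "service (string), version (string or null), "
        "confidence (number from 0 to 1), caveats (list of strings).\n"
        "Use null for version if the banner does not state it.\n"
        f"Banner: {banner!r}"
    )


def parse_classification(raw_reply):
    """Parse a model reply; raise ValueError if it does not match the schema."""
    data = json.loads(raw_reply)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != REQUIRED_FIELDS:
        raise ValueError("reply does not contain exactly the required fields")
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= conf <= 1:
        raise ValueError("confidence must be between 0 and 1")
    # Every classification is a hypothesis until a human verifies it.
    data["verified"] = False
    return data


prompt = build_prompt("SSH-2.0-OpenSSH_for_Windows_7.7")
# Hand-written stand-in for a model reply (an assumption, not real output).
sample_reply = (
    '{"service": "OpenSSH", "version": "7.7", "confidence": 0.9, '
    '"caveats": ["Windows port; patch level not stated in banner"]}'
)
result = parse_classification(sample_reply)
print(result["service"], result["version"], result["verified"])
```

The `verified` flag defaults to false so that a classification cannot be mistaken for a confirmed finding; it would only be flipped after an independent check.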

Key Teaching Points

Banners are natural language and LLMs are exceptionally good at natural language. A banner like "SSH-2.0-OpenSSH_for_Windows_7.7" carries version, OS, and deployment context that an LLM can parse and cross-reference far faster than a human reviewing hundreds of hosts.
Embedding-based matching closes the gap for zero-signature services. When a banner matches nothing in nmap-service-probes, embedding similarity can surface near-matches from known service behavior databases — teach students to build a simple retrieval pipeline.
Prompt structure determines reliability. Unstructured queries produce unstructured, unreliable output. Students should use a consistent schema: banner input → requested fields (service, version, confidence, caveats) → output format. Structured output modes in modern APIs reduce parsing ambiguity, but the returned fields should still be validated before use.
LLMs hallucinate version numbers. This is the highest-risk failure mode. An LLM confidently asserting a specific CVE-relevant version from a vague banner is a hypothesis, not a finding. Every AI-suggested version needs direct verification — banner confirmation, HTTP header extraction, or a safe probe response.
AI identification accelerates triage, not confirmation. The workflow is: AI generates a high-confidence shortlist → human verifier confirms each entry before it goes into the findings list. Never let AI output flow directly into a report without a human gate.
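
To illustrate the retrieval idea behind embedding-based matching, the sketch below ranks known banners by cosine similarity to an unknown one. It is deliberately simplified: character-trigram counts stand in for a real embedding model, and the `KNOWN` reference banners are hypothetical examples rather than entries from any actual database. A production pipeline would swap `trigram_vector` for model-generated embeddings.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Toy stand-in for an embedding: counts of lowercase character trigrams."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference banners for known services.
KNOWN = {
    "OpenSSH": "SSH-2.0-OpenSSH_8.9p1",
    "vsftpd": "220 (vsFTPd 3.0.5)",
    "Postfix": "220 mail.example.com ESMTP Postfix",
}


def nearest_services(banner, known=KNOWN, top_k=2):
    """Return the top_k (score, service) pairs most similar to the banner."""
    query = trigram_vector(banner)
    scored = [(cosine(query, trigram_vector(ref)), name) for name, ref in known.items()]
    return sorted(scored, reverse=True)[:top_k]


matches = nearest_services("SSH-2.0-OpenSSH_for_Windows_7.7")
print(matches[0])
```

A near-match is only a lead for the human verifier; the similarity score says nothing about the version actually running.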

Discussion Prompts

You receive a banner that reads "Micro Focus Server 8.1." You send it to an LLM, which identifies it as a known vulnerable Micro Focus NetIQ version. What do you do before treating this as a confirmed finding?
How would you design a prompt to minimize hallucinated version numbers while still extracting useful service metadata?
A client has 4,000 hosts in scope. Walk through how you would structure an AI-assisted triage pipeline to surface the highest-priority targets within the first hour of scanning.
What happens when an LLM correctly identifies a service but that service is running on a host that was supposed to be excluded from scope? How does AI change the accidental out-of-scope discovery problem?

Instructor Notes

Live demonstration is high-value here. If the room has internet access, show a real banner string being fed to a structured LLM prompt — use a deliberately ambiguous banner to illustrate both the power and the hallucination risk. If you cannot demo live, prepare 3–4 banner examples with LLM output screenshots and walk through the verification steps manually. Emphasize that this workflow does not require a custom model — commercial API access to a capable frontier model is sufficient. Allocate time for students to draft their own prompt template; even a rough draft gives them something to take back and refine.

Timing Guide

Introduction (8 min): Recap Session 1, frame the service ID problem, session goals

Core Content (50 min): Banner parsing theory, embedding matching, prompt design, failure modes

Discussion (22 min): Live demo or screenshot walkthrough + discussion prompts

Wrap-up (10 min): Prompt template exercise recap, key takeaways, preview Session 3

Session 3 of 8

Vulnerability Mapping at Scale

Correlating versions and configs against CVE/exploit data with AI without trusting it blindly

⏱ 90 minutes

Learning Objectives

Build an AI-assisted pipeline that correlates identified service versions against CVE databases and public exploit repositories
Apply criticality scoring criteria that account for AI confidence levels, not just CVSS scores
Recognize when AI-reported vulnerability associations are stale, incorrect, or out of training window
Design verification checkpoints that prevent unvalidated AI output from reaching client deliverables

Session Overview

Once services are identified, the next question is which ones are exploitable and in what order they should be pursued. Manually correlating hundreds of service/version pairs against NVD, ExploitDB, and vendor advisories takes hours. AI can compress that into minutes — but the stakes of an error are high. A false positive wastes client time and damages credibility; a false negative means a critical vulnerability goes unexamined.

This session teaches a structured pipeline: structured service list in → AI-assisted CVE correlation → confidence-weighted triage output → mandatory human verification per entry → prioritized attack queue. The emphasis is on building the verification layer, not just the AI query layer. Students will also cover the specific failure modes unique to vulnerability correlation: training cutoff gaps, version range ambiguity, and CPE matching errors.
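
The triage and verification stages of that pipeline can be sketched as a small data structure with a hard gate. Everything here is illustrative: the `Candidate` fields, the choice to rank by CVSS multiplied by AI confidence, and the placeholder CVE identifiers are assumptions, not a scoring standard.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    host: str
    cve: str
    cvss: float            # base score taken from the CVE data source
    ai_confidence: float   # 0..1, as reported by the model
    verified_by: str = ""  # tester who reproduced it; empty means unverified


def triage(candidates):
    """Order candidates for verification by CVSS weighted by AI confidence."""
    return sorted(candidates, key=lambda c: c.cvss * c.ai_confidence, reverse=True)


def reportable(candidates):
    """Only candidates with a named human verifier may reach the report."""
    return [c for c in candidates if c.verified_by.strip()]


# Placeholder identifiers and scores, not real CVE data.
queue = triage([
    Candidate("10.20.1.5", "CVE-0000-0001", cvss=9.8, ai_confidence=0.4),
    Candidate("10.20.1.9", "CVE-0000-0002", cvss=7.5, ai_confidence=0.9),
])
queue[0].verified_by = "tester1"  # set only after manual reproduction
print([c.cve for c in queue])
print([c.cve for c in reportable(queue)])
```

Weighting by confidence pushes well-supported candidates to the front of the verification queue, while the `reportable` filter keeps unverified entries out of deliverables regardless of score.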

Key Teaching Points

LLMs have training cutoffs; CVEs do not stop at that date. Any vulnerability disclosed after the model's training cutoff will not be surfaced. Students must supplement AI correlation with a live NVD API query or a current scanner for recent CVEs.
CPE string construction is where AI errors cluster. The Common Platform Enumeration format is precise; LLMs frequently introduce vendor name variations or version range syntax errors that cause mismatches. Validate CPE strings programmatically before querying NVD.
AI is excellent at exploit-chain reasoning, not just individual CVEs. A powerful use case is asking an LLM to reason about which combination of lower-severity vulnerabilities on the same host creates a critical-severity attack chain — something static scanners cannot do.
CVSS alone is not enough; exploitability in context matters. Teach students to prompt AI for contextual exploitability factors: is there a public PoC, does the network segment make this reachable, does the client have compensating controls? AI can reason over these factors quickly.
Every vulnerability in the report needs a human who verified it. AI generates the candidate list; a human closes the loop. Implement a sign-off field in the tracking sheet that cannot be left blank.
Config-based vulnerabilities need a different pipeline. AI can read and interpret configuration files (Apache, nginx, IIS, firewall rulesets) for misconfiguration patterns more reliably than version-based CVE correlation — this is often a higher-value use case.
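
Programmatic CPE validation can start with a structural check like the one below, run on every AI-generated string before it is used in a query. This is a rough sketch that assumes the CPE 2.3 formatted-string layout (the `cpe:2.3:` prefix followed by eleven attribute fields); it does not implement the full specification grammar, and the function name is hypothetical.

```python
import re

VALID_PARTS = {"a", "o", "h", "*", "-"}  # application, OS, hardware, ANY, NA


def cpe23_problems(cpe):
    """Return a list of structural problems; an empty list means none found."""
    fields = re.split(r"(?<!\\):", cpe)  # split on colons that are not escaped
    problems = []
    if fields[:2] != ["cpe", "2.3"]:
        problems.append("must start with 'cpe:2.3:'")
    if len(fields) != 13:
        problems.append(f"expected 13 colon-separated fields, got {len(fields)}")
    elif fields[2] not in VALID_PARTS:
        problems.append(f"invalid part value {fields[2]!r}")
    if any(f == "" for f in fields):
        problems.append("empty field")
    if any(" " in f for f in fields):
        problems.append("unescaped space in a field")
    return problems


good = "cpe:2.3:a:apache:http_server:2.4.49:*:*:*:*:*:*:*"
bad = "cpe:2.3:a:Apache Software:http_server:2.4.49"
print(cpe23_problems(good))
print(cpe23_problems(bad))
```

A string that passes this check can still name a vendor or product that does not exist in the dictionary, so a lookup against the official CPE data remains necessary.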

Discussion Prompts

An AI pipeline flags Apache 2.4.49 on 47 hosts as critical (CVE-2021-41773). How do you verify all 47 before the end of the day without losing the efficiency gain AI provided?
The LLM does not mention a critical CVE you know was published two months ago. What does this tell you and what do you do about it in your workflow?
A client's web server is running a patched version of a vulnerable package, but their load balancer config strips security headers. How would you prompt an LLM to identify this kind of compensating-control-nullified finding?
How do you communicate to a client that AI was used in vulnerability correlation without undermining their confidence in the findings?

Instructor Notes

The training cutoff issue is the most important practical point in this session — hammer it early and return to it during discussion. A useful exercise: give students a list of five CVEs and ask them to guess which ones an LLM would and would not know about based on its stated training cutoff. This builds the habit of cutoff-awareness. For the config-based vulnerability point, a short demo reading an nginx.conf for security header gaps is fast and immediately recognizable to most practitioners. Remind students that the pipeline described here is not hypothetical — it can be built in an afternoon with Python, a CVE API key, and commercial LLM access.

Timing Guide

Introduction (8 min): Recap Session 2, frame the scale problem, session goals

Core Content (50 min): Pipeline design, CPE errors, exploit-chain reasoning, config analysis

Discussion (22 min): Scenario walkthroughs + discussion prompts

Wrap-up (10 min): Key takeaways, verification checklist review, preview Session 4

Session 4 of 8

Active Directory Attack Paths

Using AI to reason over BloodHound output, ACL chains, and Kerberos relationships

⏱ 90 minutes

Learning Objectives

Explain how to export BloodHound graph data in a format suitable for LLM reasoning
Use structured AI prompts to identify non-obvious attack paths through ACL chains and group nesting
Describe Kerberos delegation types and how AI can surface misconfigured delegation relationships at scale
Prioritize attack paths by exploitability and business impact using AI-assisted reasoning, not just shortest-path algorithms

Session Overview

Active Directory remains the crown jewel target in most internal network engagements, and BloodHound has transformed how practitioners visualize attack paths. The problem is that BloodHound's shortest-path algorithm optimizes for graph hops, not real-world exploitability. An AI reasoning layer on top of BloodHound output can identify attack paths that are longer in hops but vastly more reliable or stealthy — and can explain them in language that maps directly to report findings.

This session covers the mechanics of extracting BloodHound data (Cypher query outputs, JSON exports) and feeding it to an LLM in structured form, prompt strategies for eliciting attack path reasoning rather than raw data summarization, and how to handle the size constraints that arise when an AD environment has tens of thousands of objects. Kerberos delegation misconfigurations get dedicated attention as one of the highest-value AI-assisted discovery areas.

Key Teaching Points

BloodHound's Cypher output is LLM-readable with light preprocessing. Export shortest-path and custom Cypher query results as JSON; strip display properties and send only the relationship graph. Most frontier LLMs can reason over AD graphs with hundreds of nodes.
AI excels at multi-hop ACL chain analysis. A WriteDACL edge from Group A to Group B to a Domain Admin account may be invisible in a simple BloodHound query but obvious to an LLM walking the full relationship graph. Prompt specifically for "non-obvious privilege escalation chains involving ACL relationships."
Kerberos delegation is complex enough that AI genuinely helps. Unconstrained, constrained, and resource-based constrained delegation each have distinct exploitation requirements. AI can categorize every delegation-enabled account and generate prioritized exploitation notes per type faster than manual review.
AI can reason about business context, not just technical hops. If you tell the LLM which groups map to which business functions, it can identify attack paths that lead to high-business-impact targets — finance, HR, executive — not just Domain Admin.
Chunking is necessary for large environments. For AD environments with more than a few thousand objects, send the graph in topological layers: start from the current compromise position, expand one hop, analyze, then expand further. Avoid sending the full BloodHound dataset in one prompt.
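
The hop-by-hop chunking strategy can be sketched as a breadth-first expansion over an edge list. The example assumes the graph has already been reduced to (source, relationship, target) triples during preprocessing; the node names and the function `layers_from` are hypothetical, and the relationship labels are only sample values.

```python
from collections import defaultdict

# Hypothetical preprocessed edges: (source, relationship, target).
EDGES = [
    ("USER_A", "MemberOf", "GROUP_HELPDESK"),
    ("GROUP_HELPDESK", "GenericWrite", "SVC_BACKUP"),
    ("SVC_BACKUP", "WriteDacl", "GROUP_ADMINS"),
    ("USER_B", "MemberOf", "GROUP_HR"),
]


def layers_from(start, edges, max_hops):
    """Yield (hop, edges) chunks reachable from start, one hop per chunk."""
    outgoing = defaultdict(list)
    for src, rel, dst in edges:
        outgoing[src].append((src, rel, dst))
    seen, frontier = {start}, [start]
    for hop in range(1, max_hops + 1):
        chunk = [edge for node in frontier for edge in outgoing.get(node, [])]
        if not chunk:
            return
        yield hop, chunk
        # Next frontier: newly reached nodes only, de-duplicated in order.
        frontier = list(dict.fromkeys(dst for _, _, dst in chunk if dst not in seen))
        seen.update(frontier)


layers = list(layers_from("USER_A", EDGES, max_hops=5))
for hop, chunk in layers:
    print(hop, chunk)
```

Each yielded chunk would become one prompt, with a short summary of earlier layers carried forward. Edges unreachable from the compromise position, such as the `USER_B` membership here, are never sent.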

Discussion Prompts

BloodHound shows 14 shortest paths to Domain Admin. You send the graph to an LLM and it highlights a 22-hop path it considers more reliable. How do you evaluate whether the LLM's reasoning is correct before pursuing that path?
A client AD environment has 85,000 objects. How do you decide what to send to the LLM, in what order, and how do you avoid missing critical paths by chunking?
How would you write a prompt that asks an LLM to distinguish between attack paths that require user interaction and those that are fully automated?
Your AI-assisted analysis finds a path through a service account configured with unconstrained delegation. The account is used by a business-critical application. How does that context affect how you report and remediate the finding?

Instructor Notes

If your training environment has an AD lab with BloodHound data loaded (e.g., a synthetic domain populated with BadBlood or a similar tool), this session benefits enormously from a live demo. Export a Cypher query result, walk through the preprocessing step, and show the LLM reasoning in real time. If not, prepare a sanitized JSON snippet from a real engagement and walk through it. The Kerberos delegation section is where most practitioners learn something new regardless of experience level, so slow down here. Students often underestimate how much of their current manual AD review time can be handed to an LLM without sacrificing accuracy.

Timing Guide

Introduction (8 min): Recap Session 3, BloodHound baseline, session goals

Core Content (52 min): BloodHound export, ACL chain prompting, Kerberos delegation, chunking strategy

Discussion (20 min): Graph reasoning exercise + discussion prompts

Wrap-up (10 min): Key takeaways, AD attack path summary template, preview Session 5

Session 5 of 8

Lateral Movement and Persistence Planning

Planning quiet, observable, defender-friendly movement through hybrid environments

⏱ 90 minutes

Learning Objectives

Use AI to generate stealthy lateral movement plans that account for network segmentation, EDR coverage, and logging density
Explain how to incorporate threat intelligence about defender tooling into AI-assisted movement planning
Design AI-assisted persistence strategies that are scoped to engagement objectives rather than maximized for attacker advantage
Apply the principle of defender-friendly testing: using AI to model what defenders would see, not just what attackers would do

Session Overview

Lateral movement planning is where the gap between "I found vulnerabilities" and "I demonstrated impact" gets bridged. Traditional planning relies on individual practitioner knowledge of available techniques, the target environment, and what defenders are likely to detect. AI changes this by enabling rapid simulation of movement options, detection likelihood, and impact chains — but it requires careful framing to avoid generating a plan optimized for stealth against the client's defenders rather than for safely demonstrating attacker capability.

This session teaches the discipline of defender-aware movement planning: using AI to reason about what Sysmon, CrowdStrike, or Defender for Endpoint would log at each step, so the test surfaces real detection gaps rather than simply evading them. Persistence planning is covered in the context of engagement objectives — planting a minimally capable implant to simulate dwell time, rather than a maximally capable backdoor.

Key Teaching Points

AI can reason about detection surfaces if you provide the right context. Tell the LLM what EDR platform is deployed, what log forwarding is configured, and what SIEM rules you know are active — it can then estimate detection likelihood per technique and recommend less-detectable alternatives.
MITRE ATT&CK technique selection is a natural AI reasoning task. Given a starting position and an objective, an LLM can enumerate applicable techniques from ATT&CK, score them against the environment context, and generate a prioritized movement plan in minutes rather than hours.
Stealth and evidence quality are in tension. Maximally stealthy movement leaves less evidence for the pentest report. Teach students to make deliberate choices: move stealthily where detection gaps are the finding, move noisily where demonstrating detection capability is the finding.
Persistence mechanisms need to be scoped to the test, not to real attacker objectives. An AI-planned persistence beacon should be limited in capability: limited egress, no data exfiltration, easy cleanup. Use AI to reason about the simplest persistence mechanism that satisfies the test objective.
Hybrid environments require AI to hold multiple network contexts simultaneously. On-prem movement, cloud pivots, and identity transitions all happen in sequence. LLMs can often keep this context consistent across a planning conversation more reliably than a tester working from memory, so leverage that explicitly.
Always plan the cleanup before the implant. AI can generate a cleanup checklist from a persistence plan — run this before deployment, not after, to ensure nothing survives engagement closeout.
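
The "cleanup before the implant" rule can be enforced with a plan structure that records what each step leaves behind. The sketch below is one possible shape, whether the plan is drafted by hand or with AI help. The `PlanStep` fields, host names, and artifact descriptions are hypothetical, and treating a step with no declared artifacts as a blocker is an assumption made here to force an explicit review.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    host: str
    action: str
    artifacts: list = field(default_factory=list)  # what the step leaves behind


def cleanup_checklist(plan):
    """Derive removal tasks in reverse order of creation."""
    tasks = []
    for step in reversed(plan):
        for artifact in reversed(step.artifacts):
            tasks.append(f"[ ] {step.host}: remove {artifact}")
    return tasks


def undeclared_steps(plan):
    """Steps with no declared artifacts need review before deployment."""
    return [step.action for step in plan if not step.artifacts]


# Hypothetical persistence plan for illustration only.
plan = [
    PlanStep("WS-014", "drop test beacon", ["C:\\Temp\\beacon-test.exe"]),
    PlanStep("WS-014", "register scheduled task", ["scheduled task 'UpdaterTest'"]),
    PlanStep("FS-02", "create local test account"),
]
checklist = cleanup_checklist(plan)
print("\n".join(checklist))
print("needs review:", undeclared_steps(plan))
```

Reversing the order means the scheduled task is removed before the binary it launches, which mirrors how the artifacts depend on each other.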

Discussion Prompts

You have compromised a mid-tier workstation and want to reach a file server in a different VLAN. The client runs CrowdStrike Falcon. How would you prompt an AI to generate a movement plan that prioritizes detection gap demonstration over stealth?
A colleague uses AI to generate a persistence implant with keylogging capability because "it proves deeper access." What's wrong with this approach and how should persistence scope be constrained?
How does knowing the client's SIEM rule inventory change the value of AI-assisted movement planning? What if you do not have that information?
After a two-week engagement, you need to verify complete cleanup. How would you use AI to audit your own movement artifacts and confirm nothing was left behind?

Instructor Notes

This session often surfaces an ethical tension that is worth airing deliberately: using AI to plan maximally stealthy movement feels natural to practitioners but can undermine the test's value and the client relationship. Frame it explicitly: the goal is defender improvement, not attacker glory. The cleanup checklist exercise works well as a quick table activity: give groups a hypothetical persistence plan and ask them to enumerate every artifact AI should track for cleanup. This usually reveals artifacts students would have forgotten if working manually, which makes the value of AI concrete.

Timing Guide

Introduction (8 min): Recap Session 4, frame the movement planning problem, session goals

Core Content (48 min): Detection-aware planning, ATT&CK reasoning, persistence scoping, cleanup planning

Discussion (24 min): Cleanup checklist group exercise + discussion prompts

Wrap-up (10 min): Key takeaways, movement plan template, preview Session 6

Session 6 of 8

Cloud and Hybrid Pivots

When the network test touches AWS / Azure / GCP — IAM, identity bridges, and trust boundaries

⏱ 90 minutes

Learning Objectives

Identify the trust boundaries and identity bridges that create lateral movement paths between on-prem networks and cloud environments
Use AI to reason over cloud IAM policy documents and role assignments to identify over-privileged principals
Apply AI-assisted analysis to Azure AD Connect, AWS IAM Identity Center, and GCP Workload Identity Federation configurations
Determine when a network engagement legitimately reaches into cloud scope and how to handle that boundary in the ROE

Session Overview

Most "network" engagements now touch cloud infrastructure — whether through Azure AD Connect syncing on-prem identities to Entra ID, AWS EC2 instances accessible via VPN, or GCP workloads reachable from corporate networks. The identity and trust fabric that bridges these environments is where AI-assisted analysis provides some of its highest value, because the policy documents that define these trust relationships are dense, machine-readable text that LLMs handle well.

This session covers the three most common pivot scenarios: on-prem AD to Azure via Azure AD Connect, on-prem to AWS via IAM roles with EC2 instance profiles, and GCP workload identity federation. For each, students learn how to collect the relevant policy artifacts, structure AI analysis queries, and identify the specific misconfigurations that enable pivots — over-privileged sync accounts, excessive trust policies, and misconfigured federated identity providers.

Key Teaching Points

Azure AD Connect is one of the highest-value pivot targets in hybrid environments. The MSOL sync account typically has extensive on-prem AD privileges; compromise of the Azure AD Connect server often yields domain-level access. AI can parse the configuration export to identify what's being synced and what the sync account can do.
AWS IAM policy documents are JSON and LLMs read JSON natively. Feed a role's policy document to an LLM and ask for a privilege escalation path analysis — it can identify overly broad resource ARNs, missing condition keys, and PassRole chains more reliably than most practitioners can manually.
Cloud scope creep is a legal risk, not just a technical one. If network reconnaissance reveals a cloud pivot opportunity, the tester must confirm cloud scope before pursuing it. AI does not change this obligation — document the discovery and pause for scope confirmation.
GCP Workload Identity Federation is newer and less well-understood by practitioners. LLMs trained on current documentation can explain federation trust chains and identify misconfigurations faster than most practitioners can locate and parse the relevant GCP documentation.
Identity bridge misconfigurations are frequently out of patch scope. Unlike CVE-based findings, IAM misconfigurations require policy changes, not patches. AI-generated remediation language for policy documents is a direct deliverable accelerator.
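
A deterministic pre-check of a policy document is a useful companion to LLM analysis, because it gives the tester something to compare the model's answer against. The sketch below flags a few of the patterns named above. The sample policy, bucket name, and function names are invented for illustration, and the three checks are a small assumed subset rather than a complete privilege escalation analysis.

```python
import json

# Invented sample policy for illustration; not from any real account.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": ["iam:PassRole", "ec2:RunInstances"],
     "Resource": "*"}
  ]
}
"""


def as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def flag_statements(policy):
    """Return (statement_index, issue) pairs for a few risky patterns."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wildcard_resource = "*" in resources
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((index, "wildcard action"))
        if wildcard_resource and any(a.lower() == "iam:passrole" for a in actions):
            findings.append((index, "iam:PassRole on Resource '*'"))
        if wildcard_resource and "Condition" not in stmt:
            findings.append((index, "Resource '*' with no Condition"))
    return findings


findings = flag_statements(json.loads(POLICY_JSON))
for index, issue in findings:
    print(f"statement {index}: {issue}")
```

If the LLM's analysis misses a statement that this check flags, that is a signal to re-prompt or review the policy by hand.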

Discussion Prompts

During an internal network test, you discover an Azure AD Connect server. Your scope says "internal network only." What do you do, and what information do you document while you wait for scope clarification?
You extract an AWS IAM policy JSON for a role attached to a compromised EC2 instance. How would you structure an AI prompt to identify the most exploitable privilege escalation path from that role?
A client argues that their cloud environment is "separate" from the network test scope. How do you explain the identity bridge attack surface in terms a non-technical stakeholder can act on?
How does AI-assisted cloud IAM analysis change the skill requirements for network pentesters? What do practitioners need to know that they might not currently?

Instructor Notes

Many network-specialist practitioners are intimidated by cloud IAM; the barrier is the volume and density of policy documents, not conceptual complexity. Demonstrating that an LLM can parse an AWS IAM policy and explain it in plain language is often a turning point for this audience. Prepare a realistic but sanitized IAM policy document (create one, don't use a real client's) and walk through the AI analysis live. For Azure AD Connect, a diagram of the trust relationship between on-prem and Entra ID is more valuable than a configuration dump: draw it on a whiteboard and explain which nodes an AI analysis targets.

Timing Guide

Introduction (8 min): Recap Session 5, frame hybrid identity surfaces, session goals

Core Content (52 min): Azure AD Connect, AWS IAM analysis, GCP federation, scope obligations

Discussion (20 min): IAM policy exercise + discussion prompts

Wrap-up (10 min): Key takeaways, cloud pivot checklist, preview Session 7

Session 7 of 8

Detection Engineering Feedback

Turning your test artifacts into detections the blue team can use after engagement closeout

⏱ 90 minutes

Learning Objectives

Convert pentest artifacts — commands run, network traffic patterns, host behaviors — into detection logic using AI-assisted translation
Write Sigma rules from pentest activity logs with AI assistance and validate them against real log samples
Explain how to structure detection feedback for blue teams who were not present during the engagement
Identify the artifacts that have the highest detection engineering value and prioritize their documentation during testing

Session Overview

Pentests generate an enormous amount of behavioral data that blue teams almost never see — the exact commands run, the precise network patterns of lateral movement, the sequence of authentication events that preceded a privilege escalation. AI can translate this operational data into detection logic at a speed that makes including it in the engagement deliverable realistic for the first time.

This session teaches practitioners to think of detection artifact generation as an ongoing activity during the test, not a post-engagement afterthought. Students learn which artifacts to deliberately preserve (command logs, network captures at key events, authentication sequences), how to structure AI prompts that produce usable Sigma and KQL detection drafts from those artifacts, and how to validate those drafts before handing them to a blue team.

Key Teaching Points

Detection artifacts should be collected during the test, not reconstructed from memory. Every command executed, every lateral move made, every privilege escalation attempted should produce a timestamped log entry that can later be translated to a detection. Build this habit before the test starts.
Sigma is the right output format — it's portable and human-readable. AI can draft Sigma rules from behavioral descriptions; practitioners review and tune; blue teams import into whatever SIEM they use. The AI drafts the rule; the practitioner validates it; the blue team owns it.
LLMs are good at translating attacker behavior into defender language. Describe what you did in operational terms and ask the LLM to describe what a defender would see — it produces the log patterns, event IDs, and behavioral indicators that Sigma rules are built from.
False positive analysis is part of detection validation. Before handing a draft Sigma rule to a blue team, prompt the LLM to enumerate common legitimate process behaviors that would trigger the same pattern. This prevents detection debt and builds trust with the blue team.
Detection feedback is a differentiator, not just a deliverable checkbox. Clients who receive working detection drafts alongside vulnerability findings have a concrete, immediate action to take. This is one of the highest-value things AI enables in network engagements.
Some test activity intentionally exercises techniques the client already detects. Confirming that those detections work is a separate deliverable from generating new ones. AI can help verify by predicting what the detection should log during your test activity.
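
The habit of logging every action as it happens can be supported with a very small helper that writes one JSON line per action. This is a sketch under assumptions: the field names, operator and host values, and the use of an in-memory buffer in place of an append-mode file are all illustrative, and the command text is left as a placeholder.

```python
import io
import json
from datetime import datetime, timezone


def make_entry(operator, host, command, technique_id, note=""):
    """Build one timestamped record of an action taken during the test."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "host": host,
        "command": command,
        "technique_id": technique_id,  # ATT&CK technique ID, if known
        "note": note,
    }


def write_entry(handle, entry):
    """Append the entry as one JSON line to any writable text handle."""
    handle.write(json.dumps(entry, sort_keys=True) + "\n")


log = io.StringIO()  # stand-in for a file opened in append mode
write_entry(log, make_entry(
    "tester1", "WS-014", "<command as executed>",
    "T1550.002", "pass-the-hash lateral move toward FS-02",
))
first = json.loads(log.getvalue().splitlines()[0])
print(first["timestamp_utc"], first["technique_id"])
```

One record per line keeps the log easy to filter by host or technique later, and each record can be pasted into a prompt as the behavioral description for a Sigma draft.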

Discussion Prompts

You ran a Pass-the-Hash lateral movement during the engagement. Walk through how you would prompt an AI to produce a Sigma rule that detects this behavior without triggering on normal NTLM authentication.
A blue team says they already have detections for the techniques you used. How do you use your engagement data to verify whether their detections would actually have fired?
How do you handle detection feedback for a technique the client has explicitly asked you not to disclose to the blue team (a red team scenario)? Does AI assistance change the equation?
What should a pentest engagement log look like to maximize detection engineering value? What would you add to your standard practice to produce better detection artifacts?
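
One possible starting point for the first prompt is to assemble the LLM prompt from log entries rather than from memory. The Python sketch below is hypothetical throughout: the entry fields, the prompt wording, and the sample hosts are assumptions, and no LLM client is called, because which model may receive engagement data depends on the data handling terms in the scope.

```python
def build_detection_prompt(entries, environment_notes):
    """Turn engagement log entries into a prompt asking for the defender's view.

    The entry keys and the wording are illustrative assumptions.
    """
    lines = [
        "You are assisting a penetration tester with detection feedback.",
        "For the activity below, describe what a defender would observe:",
        "log sources, event IDs, and field values. Then draft a Sigma rule",
        "and list legitimate behaviors that would match the same pattern.",
        "",
        "Environment: " + environment_notes,
        "",
        "Activity:",
    ]
    for e in entries:
        lines.append(
            "- {timestamp_utc} {technique}: {source_host} -> {target_host}: "
            "{description}".format(**e)
        )
    return "\n".join(lines)


# Hypothetical, already-sanitized entries (placeholder host names, no client data).
sample_entries = [
    {
        "timestamp_utc": "2024-01-15T14:03:22+00:00",
        "technique": "T1550.002",
        "source_host": "HOST-A",
        "target_host": "HOST-B",
        "description": "NTLM network logon using a captured hash",
    },
    {
        "timestamp_utc": "2024-01-15T14:05:10+00:00",
        "technique": "T1550.002",
        "source_host": "HOST-A",
        "target_host": "HOST-C",
        "description": "NTLM network logon using the same hash",
    },
]

prompt = build_detection_prompt(sample_entries, "Windows domain, security event logs forwarded to a SIEM")
print(prompt)
```

Asking for matching legitimate behaviors in the same prompt makes the false positive question part of the first draft instead of an afterthought.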

Instructor Notes

The Sigma rule drafting exercise is the highlight of this session and should not be rushed. Give students a realistic activity log (even a synthetic one) and walk through the AI-assisted Sigma drafting process together. If time allows, have them attempt a false positive analysis on their own draft rule — this is usually humbling in a productive way and reinforces the validation step. Emphasize that practitioners do not need to be detection engineers to do this — they need to understand the behavior they produced, describe it accurately, and recognize a plausible detection when they see one.
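
For the false positive part of the exercise, a sketch like the following can make the point concrete. It is Python rather than Sigma YAML, holds a draft rule in a Sigma-like dict, and evaluates it with a deliberately simplified matcher that supports only exact field matches and the single condition `selection and not filter`. The rule content and the synthetic events are assumptions for illustration and are not a vetted detection.

```python
# Draft rule in a Sigma-like layout, held as a dict so no YAML parser is needed.
# Field values are illustrative and must be validated against real telemetry.
DRAFT_RULE = {
    "title": "Draft: possible Pass-the-Hash network logon",
    "status": "experimental",
    "logsource": {"product": "windows", "service": "security"},
    "detection": {
        "selection": {
            "EventID": 4624,  # successful logon
            "LogonType": 3,   # network logon
            "AuthenticationPackageName": "NTLM",
        },
        "filter": {"TargetUserName": "ANONYMOUS LOGON"},
        "condition": "selection and not filter",
    },
}


def matches(event, criteria):
    """True when every field in criteria equals the event's value."""
    return all(event.get(k) == v for k, v in criteria.items())


def rule_fires(event, rule=DRAFT_RULE):
    """Evaluate only the 'selection and not filter' condition used above."""
    det = rule["detection"]
    return matches(event, det["selection"]) and not matches(event, det["filter"])


# Synthetic events: one from the test activity, three legitimate ones.
SAMPLE_EVENTS = [
    ("attack: PtH from tester host",
     {"EventID": 4624, "LogonType": 3, "AuthenticationPackageName": "NTLM",
      "TargetUserName": "svc_backup"}),
    ("benign: legacy printer using NTLM",
     {"EventID": 4624, "LogonType": 3, "AuthenticationPackageName": "NTLM",
      "TargetUserName": "scan_svc"}),
    ("benign: Kerberos file share logon",
     {"EventID": 4624, "LogonType": 3, "AuthenticationPackageName": "Kerberos",
      "TargetUserName": "jdoe"}),
    ("benign: anonymous NTLM logon",
     {"EventID": 4624, "LogonType": 3, "AuthenticationPackageName": "NTLM",
      "TargetUserName": "ANONYMOUS LOGON"}),
]

false_positives = [
    label for label, ev in SAMPLE_EVENTS
    if label.startswith("benign") and rule_fires(ev)
]
print(false_positives)
```

The draft fires on the attack event but also on the legacy NTLM logon, which is the kind of result students should expect from a first draft and the reason the validation step exists.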

Timing Guide

Introduction — 8 min: Recap Session 6, frame the detection feedback problem, session goals

Core Content — 45 min: Artifact collection habits, Sigma drafting process, false positive analysis, KQL overview

Discussion — 27 min: Sigma drafting exercise + discussion prompts

Wrap-up — 10 min: Key takeaways, detection feedback template, preview Session 8

Session 8 of 8

Network Pentest Reporting and Remediation Tracking

Reports that survive procurement scrutiny — methodology, evidence, repro, and AI-tool transparency

⏱ 90 minutes

Learning Objectives

Structure a network pentest report that accurately discloses AI tool use without undermining finding credibility
Write reproducible finding descriptions that include enough context for a developer or sysadmin to remediate without follow-up calls
Use AI to generate first-draft finding narratives and remediation guidance, then edit to practitioner standard
Design a remediation tracking artifact that integrates with client ticketing systems and tracks AI-assisted finding origins

Session Overview

The report is the only artifact most clients ever see from an engagement, and it must do three things simultaneously: prove the findings are real, explain how to fix them, and survive review by procurement, legal, and executive stakeholders who may not be technically sophisticated. AI can dramatically accelerate report drafting — but it also introduces disclosure obligations and quality risks that practitioners must manage deliberately.

This final session covers the full reporting workflow: evidence organization, finding narrative drafting with AI, remediation guidance generation, executive summary construction, and the AI transparency section that is increasingly expected by enterprise clients and government frameworks. Remediation tracking is covered as an engagement extension — how to build a tracker that survives the gap between report delivery and client remediation verification.

Key Teaching Points

AI transparency in reports is becoming a client expectation, not a voluntary disclosure. Many enterprise security programs and government frameworks now ask whether AI was used in testing. A clear methodology section covering AI tool use protects both the practitioner and the client.
AI-drafted findings must be edited by the practitioner who ran the test. AI can produce fluent finding narratives but it cannot know what the practitioner actually observed, what the exact reproduction steps were, or what compensating controls were present. The practitioner's edit pass is what makes the finding defensible.
Reproducible evidence means: what was run, what was returned, how it was verified. For every finding, the report should include the exact command or query, the exact output, and the verification step that confirmed the finding was real rather than an AI hypothesis.
Remediation guidance is where AI adds substantial value at low safety risk. Generating remediation steps for a confirmed finding is a low-risk AI task: the finding is already verified, so the practitioner only needs to check that the guidance is correct, actionable, and suited to the client's environment. AI is well suited to this.
Executive summaries written with AI tend to be generic. Train students to feed AI the specific findings list and business context, then demand a summary that references the actual findings rather than generic penetration testing language. Multiple revision passes are usually necessary.
Remediation tracking should be structured for client ticketing systems from day one. A tracker that maps each finding to a Jira ticket template or ServiceNow record type is far more likely to drive actual remediation than a spreadsheet with no workflow integration.
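
The evidence and tracking points above can be sketched together in Python. The `Finding` fields, the label names, and the project key are hypothetical. The payload follows the general `fields` layout used when creating a Jira issue, with a plain-text description, which not every Jira API version accepts; treat the shape as an assumption to check against the client's instance. Nothing is sent over the network.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One verified finding; field names are illustrative assumptions."""
    finding_id: str
    title: str
    severity: str
    command: str         # what was run
    output_excerpt: str  # what was returned
    verification: str    # how it was confirmed as real
    remediation: str
    ai_assisted: bool = False  # whether an AI tool contributed to discovery


def to_ticket_payload(finding, project_key, issue_type="Task"):
    """Map a finding to a Jira-style issue payload without sending it."""
    labels = ["pentest", "severity-" + finding.severity.lower()]
    if finding.ai_assisted:
        labels.append("ai-assisted-origin")
    description = "\n".join([
        "Finding ID: " + finding.finding_id,
        "What was run: " + finding.command,
        "What was returned: " + finding.output_excerpt,
        "How it was verified: " + finding.verification,
        "Remediation: " + finding.remediation,
    ])
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": "[" + finding.severity + "] " + finding.title,
            "description": description,
            "issuetype": {"name": issue_type},
            "labels": labels,
        }
    }


# Hypothetical finding with placeholders where real evidence would go.
example = Finding(
    finding_id="NET-007",
    title="Kerberoastable service account with weak password",
    severity="High",
    command="<exact command as executed>",
    output_excerpt="<exact output as returned>",
    verification="<manual step that confirmed the finding>",
    remediation="<practitioner-edited remediation guidance>",
    ai_assisted=True,
)
payload = to_ticket_payload(example, project_key="SEC")
print(payload["fields"]["summary"], payload["fields"]["labels"])
```

Carrying the AI-assisted origin as a label keeps that provenance visible in the client's ticketing system without changing how the ticket is worked.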

Discussion Prompts

A client's legal team asks you to remove the AI methodology section from your report because they are concerned it will be used against them in a regulatory review. How do you respond?
You used AI to identify a critical finding, but the LLM's initial reasoning contained an error you caught during verification. The final finding is correct and verified. Do you disclose the AI error in the report? Why or why not?
Walk through how you would use AI to generate remediation guidance for a Kerberoasting finding. What context would you provide, what would you ask for, and what would you edit before including it in the report?
Six weeks after delivery, the client says they have remediated everything. How do you structure a remediation verification engagement, and how does AI assist in designing the verification test plan?

Instructor Notes

End the course on the transparency theme — it ties together the entire arc from scoping through reporting. Practitioners who are transparent about AI use, rigorous about verification, and clear in their documentation will be more trusted, not less, as AI becomes standard in the industry. The executive summary exercise (ask students to AI-draft one and then critique it against a quality bar) is a useful closer — it surfaces the editing skill gap honestly and gives students a concrete action to take back to their next engagement. Leave 10 minutes at the end for open Q&A and course retrospective.
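
One way to give the executive summary exercise a concrete quality bar is a crude check that the draft mentions each actual finding. The Python sketch below is a rough proxy only: the finding titles, the keywords, and both draft summaries are invented for illustration, and keyword matching cannot judge whether a summary is accurate or well written.

```python
def missing_findings(summary, findings):
    """Return titles of findings whose keywords never appear in the summary."""
    text = summary.lower()
    return [
        title for title, keywords in findings.items()
        if not any(k.lower() in text for k in keywords)
    ]


# Hypothetical findings list with keywords a specific summary would contain.
FINDINGS = {
    "Kerberoastable service accounts": ["kerberoast"],
    "SMB signing not required": ["smb signing"],
    "Local admin password reuse": ["password reuse", "local administrator"],
}

generic_draft = (
    "The assessment identified several vulnerabilities of varying severity. "
    "We recommend improving the overall security posture and applying "
    "patches regularly."
)

revised_draft = (
    "Testers obtained domain administrator access by Kerberoasting two "
    "service accounts, then moved laterally using local administrator "
    "password reuse. Disabled SMB signing on file servers enabled relay attacks."
)

print(missing_findings(generic_draft, FINDINGS))
print(missing_findings(revised_draft, FINDINGS))
```

A draft that leaves every finding unmentioned is a quick signal that the AI was not given the findings list and business context, or ignored them.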

Timing Guide

Introduction — 8 min: Recap Session 7, frame the reporting problem, session goals

Core Content — 45 min: AI transparency, finding narrative drafting, remediation guidance, executive summary

Discussion — 22 min: Executive summary exercise + discussion prompts

Wrap-up — 15 min: Course retrospective, key takeaways across all 8 sessions, open Q&A