Reconnaissance in the AI Era
Learning Objectives
- Contrast traditional OSINT workflows with AI-augmented reconnaissance in terms of speed, coverage, and practitioner skill requirements
- Identify the specific capabilities LLMs and agentic tools add to the recon phase and where they introduce new failure modes
- Articulate the ethical and legal boundaries governing AI-assisted recon in authorized engagements
- Describe the shift from manual analyst judgment to hybrid human-AI recon pipelines and its implications for engagement quality
Session Overview
Reconnaissance has always been the phase that separates comprehensive penetration tests from checkbox exercises. Before you can exploit anything, you need to understand the target: its infrastructure, its people, its technology choices, its suppliers, and its digital seams. Skilled recon analysts have always been rare and expensive. AI changes that equation: an LLM-augmented recon pipeline can cover in an hour ground that might take a skilled analyst days, and agentic tooling can run with little supervision across dozens of targets in parallel.
But speed and coverage come with new failure modes. LLMs hallucinate. Automated pipelines generate false positives. Agentic tools can cross scope boundaries if improperly constrained. And the practitioner who delegates too much to AI may not catch errors that an experienced analyst would have spotted immediately. This opening session frames the entire course: AI is a force multiplier for skilled recon practitioners, not a replacement for judgment.
Key Teaching Points
- Traditional recon was bottlenecked on analyst time. Classic OSINT (Maltego graphs, hand-built dork queries, manual LinkedIn enumeration) requires skilled practitioners who are expensive, scarce, and prone to fatigue. As a result, recon is consistently underfunded relative to its strategic value in engagements. AI changes the economics fundamentally.
- LLMs excel at aggregation, synthesis, and query generation. Give an LLM a target organization's name and it can generate hundreds of relevant search dorks, draft entity relationship maps from public filings, synthesize job posting signals into tech-stack inferences, and produce structured OSINT summaries faster than any analyst. The value is in breadth and synthesis, not in novel data access.
- Agentic tooling enables autonomous multi-step recon pipelines. Custom agents built on LLM APIs can execute sequences such as: enumerate subdomains, check each for open ports, fingerprint services, check CVE databases for matches, and produce a prioritized list, all without practitioner intervention between steps. The practitioner's job shifts from executing to configuring, scoping, and reviewing.
- Hallucination is a critical failure mode in AI-assisted recon. LLMs confidently generate plausible-but-false information — fake employee names, nonexistent endpoints, fabricated CVEs, invented technology choices. Any AI-generated recon finding must be independently verified before it informs a test plan. Treating LLM output as ground truth is a reliability failure that can misdirect the entire engagement.
- Scope control is harder with agentic tools. A human analyst who accidentally navigates to an out-of-scope asset recognizes it and stops. An autonomous agent following a chain of DNS enumeration may traverse into subsidiary infrastructure, partner networks, or cloud environments not covered by the rules of engagement. Scope guardrails must be explicit and technically enforced, not assumed.
- Authorized-use frameworks have not caught up with AI-assisted recon capabilities. Most rules-of-engagement templates and bug bounty scopes were written before agentic recon existed. Practitioners should clarify with clients whether AI-automated querying, rate-limited API calls at scale, and LLM-assisted gathering of social engineering intelligence fall within scope, because the legal and reputational consequences of ambiguity fall on the tester.
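The scope-control point can be made concrete. The following is a minimal sketch, not a feature of any particular agent framework: it assumes scope is expressed as a list of in-scope apex domains plus explicit exclusions, and that the agent calls a hypothetical gate() helper before acting on any discovered host. Hosts that fail the check are returned for human review rather than silently dropped.

```python
from typing import Iterable, List, Tuple


def normalize(host: str) -> str:
    # Lower-case and drop a trailing dot so "WWW.Example.com." matches "www.example.com"
    return host.strip().lower().rstrip(".")


def matches_domain(host: str, domain: str) -> bool:
    # True for the domain itself or any subdomain, matched on a label boundary,
    # so "notexample.com" does NOT match "example.com"
    host, domain = normalize(host), normalize(domain)
    return host == domain or host.endswith("." + domain)


def in_scope(host: str, allowed: Iterable[str], excluded: Iterable[str] = ()) -> bool:
    # Exclusions take precedence: a carved-out subsidiary under an in-scope
    # apex domain is still rejected
    if any(matches_domain(host, d) for d in excluded):
        return False
    return any(matches_domain(host, d) for d in allowed)


def gate(candidates: Iterable[str], allowed: Iterable[str],
         excluded: Iterable[str] = ()) -> Tuple[List[str], List[str]]:
    # Hypothetical helper: split agent-discovered hosts into those the agent may
    # act on and those that must be logged and escalated to a human
    allowed, excluded = tuple(allowed), tuple(excluded)
    approved: List[str] = []
    rejected: List[str] = []
    for host in candidates:
        target = approved if in_scope(host, allowed, excluded) else rejected
        target.append(normalize(host))
    return approved, rejected


if __name__ == "__main__":
    ok, escalate = gate(
        ["API.example.com.", "vpn.partner-example.net",
         "hr.subsidiary.example.com", "notexample.com"],
        allowed=["example.com"],
        excluded=["subsidiary.example.com"],
    )
    print(ok)        # ['api.example.com']
    print(escalate)  # ['vpn.partner-example.net', 'hr.subsidiary.example.com', 'notexample.com']
```

A production guardrail would also need to handle IP ranges and CNAME targets, since an in-scope name can point at third-party infrastructure, and it belongs in the tool layer rather than the prompt so that model output cannot talk the agent past it.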
Discussion Prompts
- You are scoping a penetration test for a mid-sized bank. The rules of engagement say "passive reconnaissance only." Does using an LLM to aggregate and synthesize publicly available data constitute "active" reconnaissance? How would you clarify this with the client?
- An agentic recon tool discovers a subdomain belonging to a third-party supplier that is outside the defined scope. The subdomain has a critical SQL injection vulnerability visible from unauthenticated access. What do you do?
- A junior team member argues that LLM-assisted recon is just "googling faster" and therefore requires no additional authorization considerations. How would you respond?
- What skills become more valuable for OSINT practitioners in a world where AI handles most of the mechanical data aggregation work?
Open with a live demonstration if possible: feed a target organization's name (use your own organization or a publicly known one that has consented) into a well-prompted LLM and show how much structured recon output emerges in under two minutes. The contrast with how long that would take manually is the session's most powerful moment. Make the ethical framing explicit early: everything in this course assumes authorized testing under a signed scope agreement. The hallucination failure mode deserves more time than instructors usually give it — practitioners who trust AI output without verification create real engagement quality problems.
Timing Guide
Passive OSINT with LLMs
Learning Objectives
- Design LLM-assisted OSINT workflows that maximize information yield while generating no direct network traffic to the target
- Apply advanced search operator generation techniques using LLMs to produce comprehensive dork query sets
- Synthesize organizational intelligence from passive data sources including job postings, press releases, LinkedIn, and public filings
- Identify the detection footprint of common OSINT tools and compare it to LLM-assisted passive approaches
Session Overview
Passive reconnaissance, meaning intelligence gathering that generates no traffic to the target's infrastructure, is the gold standard for early-phase recon. It is also where LLMs provide their clearest advantage over traditional methods. Given the collected material, or tool access to retrieve it, an LLM can process, synthesize, and structure information from dozens of passive sources at once: search engine results, cached pages, social media, public registries, court records, SEC filings, job boards, GitHub commits, and news archives. The result is a comprehensive organizational picture built entirely from data that is already public.
This session walks through a structured passive OSINT methodology using LLMs at each stage. Emphasize to students that "passive" does not mean "trivial" — the intelligence value of well-executed passive OSINT often exceeds active scanning, and the risk of tipping off the target is effectively zero when done correctly.
Key Teaching Points
- LLMs generate search dorks quickly and in depth. A well-prompted LLM can produce hundreds of targeted Google, Bing, or Shodan queries for a given organization in seconds, combining site operators, filetype restrictions, inurl patterns, and intitle combinations that would take an analyst hours to construct manually. Engines differ in which operators they support, so generated queries need a quick sanity check before use. The quality of the dork set directly determines the quality of passive discovery.
- Job postings are gold-standard tech stack intelligence. Organizations reveal their technology choices, version preferences, vendor relationships, security tool deployments, and cloud provider selections in job postings, often in more detail than in their public architecture documentation. LLMs can aggregate and synthesize job postings collected over time to build a high-confidence technology picture.
- LinkedIn enumeration via LLM-assisted synthesis avoids direct scraping risks. Rather than scraping LinkedIn directly (which violates terms of service and may trigger detection), LLMs can synthesize employee intelligence from search engine cached results, public profiles, conference speaker bios, and GitHub contributions. The result is a partial but attribution-safe org chart.
- GitHub and code repository leakage is frequently underestimated. Developer commits to public repositories routinely expose internal hostnames, API endpoints, AWS account IDs, email addresses, credential fragments, and architectural comments. LLMs with code analysis capability can triage hundreds of repository search results and surface the highest-value leakage quickly.
- Certificate transparency logs reveal much of the subdomain topology. Certificate Transparency (CT) logs, searchable through services such as crt.sh and Censys, provide a historical record of publicly trusted TLS certificates issued for a domain, including subdomains, internal services that were briefly internet-exposed, and staging environments. Hosts covered only by wildcard certificates or by certificates from private CAs will not appear by name. Querying these services is entirely passive and produces no traffic to the target. LLMs can help cluster and analyze CT log output at scale.
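CT log output is a good candidate for light preprocessing before it goes to an LLM. The sketch below assumes you have already saved a crt.sh-style JSON export (a list of records whose name_value field holds one or more newline-separated names); it makes no network requests. It deduplicates names, records wildcard entries under their base name (a simplification), and groups hosts by a small, hypothetical keyword list so that staging and development names surface first.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical keyword list; tune it per engagement
ENV_HINTS = ("dev", "test", "stage", "staging", "uat", "qa", "internal", "vpn", "admin")


def hostnames_from_ct_json(raw: str, apex: str) -> Set[str]:
    """Collect unique hostnames under `apex` from a saved CT-log JSON export.

    Assumes a crt.sh-style list of records with a newline-separated "name_value" field.
    """
    apex = apex.lower().rstrip(".")
    names: Set[str] = set()
    for record in json.loads(raw):
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # simplification: record "*.x" as "x"
            if name == apex or name.endswith("." + apex):
                names.add(name)
    return names


def cluster_by_env(names: Set[str], apex: str) -> Dict[str, List[str]]:
    """Group hostnames by the first environment keyword found in their labels."""
    apex = apex.lower().rstrip(".")
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(names):
        prefix = name[: -len(apex)].rstrip(".")  # e.g. "api.staging"
        labels = prefix.replace("-", ".").split(".") if prefix else []
        hint = next((h for h in ENV_HINTS if h in labels), "other")
        clusters[hint].append(name)
    return dict(clusters)


if __name__ == "__main__":
    sample = json.dumps([
        {"name_value": "www.example.com\n*.example.com"},
        {"name_value": "dev-portal.example.com"},
        {"name_value": "api.staging.example.com"},
        {"name_value": "example.org"},
    ])
    print(cluster_by_env(hostnames_from_ct_json(sample, "example.com"), "example.com"))
    # {'staging': ['api.staging.example.com'], 'dev': ['dev-portal.example.com'],
    #  'other': ['example.com', 'www.example.com']}
```

Doing this deterministic cleanup first keeps the LLM's context focused on interpretation (which clusters look like forgotten staging systems) rather than on deduplication it may do inconsistently.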
Discussion Prompts
- You discover through passive OSINT that a target organization's CISO recently posted a LinkedIn comment mentioning they are "migrating off Splunk." How does this change your recon priorities, and how confident are you in this signal?
- A GitHub repository committed by a developer at your target contains what appears to be a hardcoded AWS access key. The repository is public and the commit is three years old. What are your next steps, and what do you document?
- How do you validate passive OSINT findings without crossing into active reconnaissance? Where is the line between validation and active probing?
- What is the most valuable piece of passive intelligence you have collected in a real engagement, and what made it valuable?
This session works best with a live demonstration using a consented organization (your training company, a client who has given explicit permission, or a purpose-built lab target). Walk through a dork generation prompt live and show the quality of output. The GitHub leakage discussion almost always produces war stories from experienced students — allow time for these because they are often the most educational content in the room. Remind students at the start that "passive" has a precise technical meaning (no direct target traffic) and that this distinction matters for authorization and scope documentation.
Timing Guide
Attack-Surface Mapping at Scale
Learning Objectives
- Build an AI-assisted attack surface enumeration pipeline that integrates passive and active discovery sources
- Apply LLM-assisted triage to prioritize discovered assets by attack potential
- Identify cloud asset exposure patterns including misconfigured S3 buckets, exposed cloud functions, and unsecured APIs
- Recognize shadow IT indicators and use them to extend the known attack surface beyond official asset inventory
Session Overview
Attack surface mapping is where passive intelligence synthesis meets active enumeration. The goal is a comprehensive inventory of every internet-accessible asset attributable to the target — subdomains, IP ranges, cloud storage, API endpoints, SaaS tenants, and shadow IT — prioritized by attack potential. For large enterprises with thousands of internet-facing assets, this is a problem of scale that AI tools address directly.
Modern organizations have attack surfaces that no single person or team fully understands. Mergers and acquisitions leave orphaned infrastructure. Developers spin up cloud resources outside IT governance. Third-party integrations expose internal services through partner APIs. AI-assisted attack surface mapping excels at exactly this kind of comprehensive enumeration — covering ground that human analysts would miss due to time constraints or organizational blind spots.
Key Teaching Points
- Subdomain enumeration is the foundation of attack surface mapping. Combining passive CT log data (crt.sh, Censys), DNS brute-forcing wordlists, and permutation generation with LLMs produces dramatically more complete subdomain lists than any single technique. LLMs can generate organization-specific permutations — incorporating known product names, office locations, and technology vocabulary — that generic wordlists miss.
- Cloud asset enumeration requires provider-specific knowledge that LLMs encode well. Each cloud provider (AWS, Azure, GCP) has predictable URL patterns for storage buckets, API gateways, container registries, and function endpoints. Given an organization's name and known cloud provider, an LLM can generate long lists of plausible cloud asset URLs, dramatically accelerating permutation-based cloud asset discovery.
- Shodan and Censys data becomes far more usable with LLM-assisted triage. A large Shodan search returns hundreds or thousands of results with raw banner data. An LLM can process this output and surface the highest-priority findings — unusual ports, known-vulnerable service versions, default credentials in banners, misconfigured admin panels — far faster than manual review.
- Shadow IT is discovered through signal analysis, not direct inventory. Employees use unsanctioned SaaS tools, personal cloud storage, and unofficial collaboration platforms that don't appear in any asset inventory. Job postings, employee resumes and profiles that list tools, GitHub references, and professional social media content reveal shadow IT choices. LLMs can aggregate these signals across hundreds of data points into a probable shadow IT map.
- AI-assisted correlation connects isolated findings into attack chains. A subdomain, an open port, a leaked credential fragment, and a technology version signal are each low-value individually. LLMs are effective at connecting these dots — suggesting which combinations of findings create exploitable attack paths — transforming isolated data points into strategic intelligence.
- Continuous mapping is more valuable than point-in-time snapshots. Attack surfaces change daily: new services launch, old ones go offline, configurations change. AI-assisted pipelines that run continuously and alert on new surface additions or significant changes provide far more value than a one-time mapping exercise, which starts going stale almost as soon as it is finished.
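At its core, continuous mapping is a diff between successive snapshots. The sketch below is one minimal way to express that, assuming each snapshot is a hypothetical dictionary mapping hostname to the set of open ports observed in a run; the data shape is illustrative and not that of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Snapshot = Dict[str, Set[int]]  # hostname -> open ports observed in one run


@dataclass
class SurfaceDiff:
    new_hosts: List[str] = field(default_factory=list)
    removed_hosts: List[str] = field(default_factory=list)
    # hostname -> (newly opened ports, newly closed ports)
    changed: Dict[str, Tuple[List[int], List[int]]] = field(default_factory=dict)


def diff_snapshots(old: Snapshot, new: Snapshot) -> SurfaceDiff:
    diff = SurfaceDiff(
        new_hosts=sorted(new.keys() - old.keys()),
        removed_hosts=sorted(old.keys() - new.keys()),
    )
    # Only hosts present in both runs can have changed ports
    for host in sorted(old.keys() & new.keys()):
        opened, closed = new[host] - old[host], old[host] - new[host]
        if opened or closed:
            diff.changed[host] = (sorted(opened), sorted(closed))
    return diff


if __name__ == "__main__":
    monday = {"www.example.com": {80, 443}, "legacy.example.com": {21}}
    tuesday = {"www.example.com": {443, 8443}, "staging.example.com": {443}}
    print(diff_snapshots(monday, tuesday))
```

Newly appearing hosts should go back through the scope check before any active testing, since growth in the surface is exactly where out-of-scope assets tend to show up.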
Discussion Prompts
- Your attack surface mapping reveals 47 subdomains for a target, but the signed scope agreement lists only 12 domains. What is your process for handling the discrepancy before proceeding?
- An AI triage of Shodan results surfaces an RDP port open on a server that appears to belong to a third-party vendor managing the target's HR system. How do you handle this finding?
- A target organization uses a combination of AWS, Azure, and an on-premises data center. How does your attack surface mapping strategy differ across these three environments?
This session pairs well with a lab exercise if your training environment supports it: even a simplified attack surface mapping exercise against a deliberately exposed lab target gives students hands-on experience with the pipeline. If running purely as a lecture, use a case study from a real bug bounty disclosure (HackerOne and Bugcrowd publish detailed write-ups that include reconnaissance methodology). The cloud asset enumeration section benefits from having specific URL pattern examples ready for AWS, Azure, and GCP; students are often surprised by how predictable cloud naming conventions are. Reinforce the scope verification step repeatedly; it is one of the most commonly skipped steps in real engagements.
Timing Guide
Email and Identity Harvesting
Learning Objectives
- Build an LLM-assisted identity harvesting pipeline that aggregates staff intelligence from multiple passive sources
- Apply email format inference and validation techniques to produce high-confidence contact lists
- Use LLM-assisted pivoting to extend initial identity discoveries into organizational hierarchy mapping
- Integrate credential exposure data from breach databases into recon findings with appropriate handling protocols
Session Overview
Identity harvesting — building a list of valid staff email addresses, their organizational roles, and their credential exposure history — sits at the intersection of technical and social engineering reconnaissance. It is among the most practically valuable recon outputs because it directly enables phishing campaigns, password spraying attacks, business email compromise pretexting, and social engineering scenarios that human operators execute in later engagement phases.
LLMs accelerate every step of identity harvesting: inferring email format from small samples, generating candidate email lists at scale, pivoting from named employees to their colleagues and reports, and structuring disparate identity fragments from multiple sources into coherent profiles. The de-duplication and confidence-scoring challenges that made large-scale identity harvesting impractical for manual analysts are well-suited to AI-assisted processing.
Key Teaching Points
- Email format inference from small samples is usually reliable. Given three to five known valid email addresses from a target organization (discoverable through LinkedIn, conference speaker pages, or press releases), an LLM can usually identify the organization's email format (firstname.lastname@, f.lastname@, firstnamelastname@) and generate candidate addresses for any named employee. Exceptions such as numbered duplicates, legacy formats, and acquired-company domains mean candidates still need validation.
- LinkedIn provides much of an organizational hierarchy for free. Public LinkedIn profiles reveal job titles, team memberships, tenure, and project affiliations from which reporting relationships can often be inferred, and an LLM can synthesize all of this into an org chart without any access to the target's internal systems. Focus harvesting effort on IT, security, finance, and executive personnel, who are highest-value for social engineering.
- Hunter.io and similar tools are starting points, not endpoints. Commercial email discovery tools provide a useful initial list but miss employees who are not publicly prominent. LLM-assisted synthesis from conference talks, academic papers, community forums, and social media significantly extends coverage beyond what commercial tools return.
- Credential exposure data from HIBP and breach databases must be handled carefully. Have I Been Pwned (HIBP) and commercial threat intelligence platforms provide breach exposure data for email addresses that is highly relevant to password spraying and credential stuffing attack planning. This data must be handled under strict protocols: documented in the engagement record, never used for unauthorized access, and disclosed to the client as a finding rather than exploited offensively.
- De-duplication and confidence scoring are where LLMs add structural value. An identity harvesting pipeline across ten sources generates hundreds of data points about hundreds of individuals — many duplicated, inconsistent, or outdated. LLMs excel at fuzzy matching, resolving naming inconsistencies (Dave vs. David), identifying role changes, and assigning confidence scores to harvested data based on source quality and corroboration.
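The de-duplication step can be illustrated without an LLM at all, which also gives a baseline for checking what the model produces. The sketch below assumes records arrive as hypothetical (name, source, role) tuples, uses a tiny illustrative nickname table to resolve cases like Dave vs. David, and scores confidence naively by how many independent sources agree. Real pipelines need far larger nickname data, handling of initials and name changes, and a better scoring model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Illustrative only; real nickname tables are much larger
NICKNAMES = {"dave": "david", "bob": "robert", "liz": "elizabeth",
             "beth": "elizabeth", "mike": "michael"}

Record = Tuple[str, str, Optional[str]]  # (full name, source, role or None)


def canonical_key(full_name: str) -> Tuple[str, str]:
    # "Dave Smith" and "David Smith" both map to ("david", "smith")
    parts = full_name.lower().replace(".", " ").split()
    first, last = parts[0], parts[-1]
    return NICKNAMES.get(first, first), last


def merge_records(records: Iterable[Record]) -> Dict[Tuple[str, str], dict]:
    groups: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"names": set(), "sources": set(), "roles": set()})
    for name, source, role in records:
        group = groups[canonical_key(name)]
        group["names"].add(name)
        group["sources"].add(source)
        if role:
            group["roles"].add(role)
    merged = {}
    for key, group in groups.items():
        # Naive score: three or more independent sources counts as full confidence
        confidence = round(min(1.0, len(group["sources"]) / 3), 2)
        merged[key] = {**group, "confidence": confidence,
                       # conflicting titles may mean a role change or a bad match
                       "role_conflict": len(group["roles"]) > 1}
    return merged
```

Flagging role conflicts instead of resolving them automatically leaves the judgment about stale versus current titles with the analyst.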
Discussion Prompts
- Your identity harvesting reveals that the target's CFO's credentials appeared in a 2023 breach of a password manager service. This is within scope of the engagement. How do you document this finding, and what do you recommend to the client?
- LinkedIn has blocked automated scraping, but you can manually view profiles. At what point does systematic manual LinkedIn enumeration become ethically distinct from automated scraping?
- You have harvested 200 valid email addresses for the target. The engagement scope authorizes phishing simulation. How do you decide which employees to target first, and how do you use recon data to personalize the phishing pretext?
- A target employee's public GitHub profile reveals they use the same username across GitHub, Twitter, and several technical forums. What additional intelligence can you derive from this cross-platform identity?
This session frequently raises questions about the ethics of harvesting individuals' personal information, even in an authorized engagement. Take these seriously: explain that identity harvesting is performed only when explicitly authorized by a signed scope agreement, that all harvested data is handled under the engagement's data classification requirements, and that individual employees are not the adversarial target (the organization's security posture is). Credential exposure data handling is an area where many practitioners are sloppy, so be explicit about the handling protocol. Students who have done red team engagements will have useful stories about identity pivoting; encourage them to share in general terms without exposing client specifics.
Timing Guide
Tech-Stack Fingerprinting with AI
Learning Objectives
- Apply LLM-assisted tech-stack inference from indirect signals including job postings, error messages, and HTTP headers
- Use embedding-based similarity search to match observed behavioral signatures against known technology fingerprints
- Map inferred technology choices to known CVE exposure and attack surface implications
- Combine active fingerprinting data with passive inference to build a high-confidence technology inventory
Session Overview
Knowing what technology a target runs — and specifically which versions — is the bridge between reconnaissance and vulnerability assessment. Traditional tech-stack fingerprinting relied on HTTP response headers, error page signatures, and timing analysis. AI expands the fingerprinting surface dramatically: indirect signals like job postings, GitHub dependency files, conference talk slide decks, and developer community posts all encode technology choices that can be synthesized into a high-confidence stack picture without making a single request to the target.
This session covers both the passive inference layer (LLM-assisted synthesis of indirect signals) and the active fingerprinting layer (using LLMs to analyze HTTP responses, JavaScript bundles, and API behavior at scale). The combination produces a technology inventory accurate enough to drive CVE-targeted attack planning — which is the operational goal of this phase.
Key Teaching Points
- Job postings encode version-level technology specificity. A job posting requiring "hands-on experience with Kubernetes 1.28+" or "familiarity with the CrowdStrike Falcon API" reveals specific technology choices and version ranges that an attacker can map directly to CVE databases. LLMs can parse hundreds of historical job postings to build a timeline of technology adoption and likely current version ranges.
- HTTP headers are a rich fingerprinting surface that LLMs can analyze at scale. Server, X-Powered-By, and Via headers, cookie names in Set-Cookie headers, Content-Security-Policy values, and CORS configuration patterns collectively fingerprint web frameworks, CDN providers, WAF products, and load balancer configurations. An LLM can analyze header sets from hundreds of subdomains and produce a consolidated technology inventory in seconds.
- JavaScript bundle analysis reveals frontend framework and version. Modern web applications ship JavaScript bundles that include framework version strings, dependency package names, and internal routing patterns. LLMs with code analysis capability can process minified JavaScript to extract these signals — identifying React versions, state management libraries, API client implementations, and authentication flows.
- Embedding-based similarity search enables behavioral fingerprinting at scale. By converting observed API response structures, error message formats, and authentication challenge patterns into vector embeddings and comparing them against a library of known technology signatures, practitioners can often identify likely technology matches even when version strings are suppressed. In exactly those cases, this is more robust than string matching against known signatures.
- CVE mapping should be automated against the inferred stack. Once a technology inventory is assembled, it should be cross-referenced automatically against NVD and vendor advisories, with an LLM helping to normalize product names and summarize advisories rather than recalling CVEs from memory, to produce a prioritized list of potentially applicable CVEs. Filtering by version range, component, and exploitability score surfaces the highest-value targets for subsequent vulnerability assessment phases.
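Whether an advisory applies usually comes down to a version-range check, which is better done deterministically than by asking a model to recall it. The sketch below assumes a hypothetical inventory mapping product to (inferred version, inference confidence) and advisories with hypothetical IDs, product names, and introduced/fixed ranges; none of them describe real vulnerabilities. Every match is labeled unverified, consistent with treating CVE mapping as hypothesis generation. Version parsing is deliberately simplistic: real products use schemes (pre-release tags, distribution backports) that need proper handling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def version_key(v: str, width: int = 4) -> Tuple[int, ...]:
    # "2.5.10" -> (2, 5, 10, 0); suffixes such as "-rc1" are dropped (simplification)
    parts = [int(p) for p in v.split("-")[0].split(".")]
    return tuple(parts + [0] * (width - len(parts)))


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # hypothetical identifier
    product: str
    introduced: str   # first affected version (inclusive)
    fixed: str        # first fixed version (exclusive)


def candidate_matches(inventory: Dict[str, Tuple[str, float]],
                      advisories: List[Advisory]) -> List[dict]:
    hypotheses = []
    for adv in advisories:
        if adv.product not in inventory:
            continue
        version, confidence = inventory[adv.product]
        if version_key(adv.introduced) <= version_key(version) < version_key(adv.fixed):
            hypotheses.append({
                "advisory": adv.advisory_id,
                "product": adv.product,
                "inferred_version": version,
                # carry the inference confidence forward; it is not a CVSS score
                "inference_confidence": confidence,
                "status": "unverified",
            })
    return hypotheses


if __name__ == "__main__":
    inventory = {"framework-a": ("2.5.10", 0.6), "proxy-b": ("1.25.3", 0.9)}
    advisories = [
        Advisory("EXAMPLE-0001", "framework-a", "2.3.5", "2.5.13"),
        Advisory("EXAMPLE-0002", "proxy-b", "1.20.0", "1.22.1"),
    ]
    for h in candidate_matches(inventory, advisories):
        print(h)
```

Carrying the inference confidence alongside each match lets a low-confidence version guess produce a correspondingly tentative finding, which is how the hand-off to the exploitation phase should read.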
Discussion Prompts
- Your passive tech-stack analysis suggests the target is running a version of Apache Struts with a known remote code execution CVE. Your active fingerprinting cannot confirm the version. How do you present this finding, and what do you recommend as the next step?
- A target's website returns generic "404 Not Found" pages with no framework signatures in headers. What alternative fingerprinting approaches would you use, and how confident would you be in the results?
- Job posting analysis reveals a target hired a "Splunk SIEM Engineer" 18 months ago and recently posted a "Microsoft Sentinel Engineer" role. What does this technology transition signal about their current monitoring posture?
- How should tech-stack intelligence from recon be handed off to the exploitation phase of an engagement? What format and level of confidence qualification do you include?
The JavaScript bundle analysis point often surprises students who associate fingerprinting primarily with server-side signals. If your training environment allows, run Wappalyzer or a custom LLM-assisted header analysis against a consented target and walk through the output live. The embedding-based similarity search section is the most technically advanced topic in this session — calibrate depth to audience sophistication. For non-specialist audiences, the key message is: "AI can recognize technology by its behavioral patterns even when explicit version strings are hidden." The CVE mapping step should be presented as hypothesis generation, not confirmed findings — emphasize the need for validation.
Timing Guide
From Recon to Target List
Learning Objectives
- Apply a structured triage framework to convert raw recon data into a prioritized target list
- Use LLMs to score and rank findings by attack potential, ease of exploitation, and business impact
- Design an engagement plan that allocates testing time according to recon-derived risk intelligence
- Communicate recon-to-plan rationale to clients in terms they can evaluate and act on
Session Overview
Recon generates data. Successful engagements generate findings that matter. The transition from raw intelligence to a scoped, prioritized test plan is one of the most judgment-intensive steps in a penetration test, and one where AI-assisted triage can add substantial value by processing large volumes of recon data quickly and surfacing the highest-leverage targets for human analyst review.
This session covers the triage workflow: ingesting recon output from all prior phases, applying scoring criteria to assess attack potential, validating the top findings manually, and constructing an engagement plan that allocates limited testing time to maximum effect. The session also covers how to communicate this plan to clients — a step that is frequently underinvested but critical for client confidence and post-engagement relationship quality.
Key Teaching Points
- Scoring criteria must be defined before triage begins. An LLM triage prompt needs explicit scoring criteria to produce consistent, useful results: exploitability (how straightforward is the attack path?), blast radius (how many systems or users are affected if compromised?), detection difficulty (how hard would exploitation be for defenders to detect?), and business impact (what is the downstream consequence of a successful compromise?). Defining these criteria before generating output reduces anchoring bias.
- Asset clusters, not individual findings, should drive prioritization. Individual findings — an open port here, a leaked credential there — rarely tell the full story. The highest-value targets in a complex engagement are usually clusters of correlated findings that, taken together, create a compelling attack path. LLMs excel at this correlation step: given a corpus of recon findings, they can identify which combinations create the most credible attack chains.
- Time allocation should reflect risk-adjusted return on testing effort. A penetration test has finite time. An LLM-assisted engagement plan should explicitly allocate testing hours by priority tier — high-confidence, high-impact findings get the most time, speculative findings get validated only if time allows. This allocation should be documented and shared with the client so they understand what is and isn't covered.
- Recon gaps are as important as recon findings. An honest engagement plan identifies what recon was unable to determine — segments that couldn't be enumerated, assets that couldn't be fingerprinted, employees whose roles remain unknown. These gaps inform the scope limitations section of the final report and prevent clients from drawing false confidence from "no critical findings" when significant blind spots exist.
- Client-facing engagement plans require a different register than technical triage output. The prioritized target list produced by LLM triage is a technical working document. What clients need is a narrative engagement plan that explains in business terms why specific systems and attack paths are prioritized, what the testing approach will be, and what outcomes they should expect. LLMs can assist with this translation step as well.
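The scoring and time-allocation steps can be sketched deterministically, which gives LLM triage output a fixed rubric to be checked against. The weights below are hypothetical and fixed before any finding is scored. Each criterion is assumed to be rated 1 to 5, with detection difficulty rated higher when exploitation is harder to detect. Hours are split proportionally across the top-ranked findings using a largest-remainder rule so that the total always matches the testing budget.

```python
from typing import Dict, List, Tuple

# Hypothetical weights, fixed BEFORE any findings are scored; they sum to 1.0
WEIGHTS = {"exploitability": 0.35, "blast_radius": 0.25,
           "detection_difficulty": 0.15, "business_impact": 0.25}


def score(finding: Dict) -> float:
    # A missing rating raises KeyError instead of silently defaulting,
    # which keeps every finding on the same rubric
    return round(sum(finding[c] * w for c, w in WEIGHTS.items()), 2)


def allocate_hours(findings: List[Dict], total_hours: int,
                   top_n: int = 5) -> List[Tuple[str, float, int]]:
    ranked = sorted(findings, key=score, reverse=True)[:top_n]
    total_score = sum(score(f) for f in ranked)
    raw = [total_hours * score(f) / total_score for f in ranked]
    hours = [int(h) for h in raw]
    # Largest remainder: leftover hours go to the biggest fractional parts
    leftovers = total_hours - sum(hours)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - hours[i], reverse=True)
    for i in by_remainder[:leftovers]:
        hours[i] += 1
    return [(f["id"], score(f), h) for f, h in zip(ranked, hours)]


if __name__ == "__main__":
    findings = [
        {"id": "F1", "exploitability": 5, "blast_radius": 4, "detection_difficulty": 3, "business_impact": 5},
        {"id": "F2", "exploitability": 3, "blast_radius": 3, "detection_difficulty": 2, "business_impact": 4},
        {"id": "F3", "exploitability": 1, "blast_radius": 2, "detection_difficulty": 1, "business_impact": 2},
    ]
    for row in allocate_hours(findings, total_hours=40, top_n=2):
        print(row)
```

The output is a starting point for the documented allocation shared with the client, not a substitute for the prioritization judgment this session is about; anything below the cut (F3 here) belongs in the plan as explicitly uncovered.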
Discussion Prompts
- Your triage produces 140 findings from five days of recon. You have 15 days of testing time. Walk through how you would allocate that time, and what criteria you use to decide what gets dropped.
- A client's primary concern is ransomware risk. Your recon has identified both a high-confidence VPN vulnerability and a lower-confidence supply chain exposure. How does the client's stated concern affect your prioritization?
- You present your engagement plan to the client's IT director and they disagree with your prioritization — they want you to focus on a system you rated as low priority. How do you handle this?
This is a decision-making session as much as a technical session — the skills being developed are analytical judgment and communication, not just tool proficiency. Use a case study with a realistic volume of recon findings and work through the triage process as a group, letting students debate prioritization choices before revealing what the actual engagement focused on. The client communication section is frequently underdeveloped in security training — spend real time on it. Students who can translate technical triage output into business-relevant engagement plans are substantially more valuable than those who cannot.
Timing Guide
Operational Security and Detection Risk
Learning Objectives
- Identify the detection footprint of AI-assisted recon activities from the perspective of a target's SOC
- Apply rate limiting, timing jitter, and infrastructure rotation strategies to minimize detection probability
- Evaluate attribution risk for AI-assisted recon activities and implement appropriate operational security controls
- Distinguish between recon activities that are inherently undetectable and those that require active noise management
Session Overview
AI-assisted recon is fast, which is exactly the problem from an operational security perspective. Traditional recon spread over days by human analysts tends to blend into normal background traffic. AI-assisted pipelines that enumerate thousands of assets in hours generate query patterns that are highly anomalous and increasingly detectable by modern threat intelligence systems, WAFs, and bot detection services. Speed cuts both ways.
This session teaches students to think like the blue team: what does their recon activity look like from the SOC's perspective, what telemetry sources expose them, and what operational security controls reduce detection probability without sacrificing coverage. Crucially, the goal is not to be invisible but to be unattributed and unactionable — a blue team that detects anomalous traffic but cannot attribute it or produce a useful incident report has been effectively defeated.
Key Teaching Points
- Rate and timing are the primary detection signals for active recon. A human browsing a website might visit five pages in a minute. An automated scanner might request 5,000 pages in a minute. Even without signature-based detection, this rate anomaly is visible in access logs and triggers WAF rate limiting, CDN bot detection, and SIEM alerting. Rate limiting with random jitter (distributing requests according to a realistic human timing distribution) substantially reduces this signal.
- The source IP address is the primary attribution risk. Every active request carries a source IP address that the target's infrastructure logs. Using a single egress IP for an entire recon campaign creates a single point of attribution. Rotating across residential proxy networks, cloud egress IPs from multiple providers, and Tor exit nodes (for purely passive activities) distributes the attribution surface, at the cost of added complexity and some performance reduction.
- User-agent and TLS fingerprint consistency matter for sophisticated targets. Advanced WAF and bot detection systems correlate HTTP request patterns (user-agent strings, accept headers, TLS cipher suite order) to identify automated traffic even when rate and IP distribution look legitimate. Tools like curl have distinctive TLS fingerprints. Shaping your recon tooling to match browser TLS fingerprints reduces this signal.
- Passive recon generates no direct detection signal. CT log queries, HIBP lookups, Shodan searches, and LinkedIn synthesis over cached search results send no traffic to the target's infrastructure and are therefore invisible to target-side monitoring, although the third-party services themselves may log your queries. Maximizing the passive phase before transitioning to active enumeration is the single most effective OPSEC decision available.
- Distinguishing recon from exploitation in detection terms matters for scoped engagements. Some clients run their SOC as part of a purple team exercise, and in that context recon detection is an explicit test objective. Know your engagement type and whether intentional detection is in scope, because OPSEC controls that make sense in a red team exercise may be counterproductive in a purple team scenario where detection capability is being validated.
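From the SOC's side, the rate signal is easy to surface even without signatures. The following is a minimal Python sketch, not the logic of any WAF or SIEM product: it assumes access-log lines have already been parsed into (timestamp, source IP) pairs, buckets them into fixed one-minute windows, and flags any source IP whose count in a window exceeds a threshold. The function name `flag_rate_anomalies` and the default threshold of 60 requests per minute are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical threshold: requests per source IP per one-minute window.
# Real WAF/SIEM rules use vendor- or site-specific values.
DEFAULT_THRESHOLD = 60


def flag_rate_anomalies(
    entries: Iterable[Tuple[datetime, str]],
    threshold: int = DEFAULT_THRESHOLD,
) -> List[Tuple[str, datetime, int]]:
    """Return (source_ip, window_start, count) for every one-minute window
    in which a single source IP made more than `threshold` requests."""
    counts = Counter()
    for ts, ip in entries:
        # Truncate the timestamp to the start of its minute (fixed window).
        window = ts.replace(second=0, microsecond=0)
        counts[(ip, window)] += 1
    flagged = [
        (ip, window, n) for (ip, window), n in counts.items() if n > threshold
    ]
    # Noisiest sources first, as an analyst's triage view would show them.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, 0)
    # A human-like source: 5 requests spread across one minute.
    human = [(base.replace(second=s), "198.51.100.7") for s in (3, 15, 28, 41, 55)]
    # A scanner-like source: 300 requests inside the same minute.
    scanner = [(base.replace(second=i % 60), "203.0.113.9") for i in range(300)]
    for ip, window, n in flag_rate_anomalies(human + scanner):
        print(ip, window.isoformat(), n)  # prints: 203.0.113.9 2024-01-01T12:00:00 300
```

Fixed windows keep the sketch simple; a sliding window would also catch bursts that straddle a minute boundary. A per-IP counter like this is also the kind of control that spreading traffic across many source addresses blunts, which is why more sophisticated detection correlates user-agent and TLS fingerprints across IPs as well.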
Discussion Prompts
- A target's SOC detects your subdomain enumeration activity on day two of a 15-day engagement and contacts your client to report a potential intrusion. Walk through how you respond and what this means for the remainder of the engagement.
- You are running a red team engagement that the blue team has not been told about. A blue team member notices an anomalous pattern in DNS query logs that corresponds to your recon. Do you back off, change technique, or continue? What factors drive that decision?
- Your client asks you to run recon as "stealthily as possible." How do you define "stealthy" in contractual terms, and what level of detection is acceptable under a "stealth" constraint?
- AI-assisted recon tools will become more common and their signatures will become known to security vendors. How do you expect the detection arms race to evolve over the next three years?
This session rewards blue team perspective-taking. If you have access to blue team practitioners — as co-instructors, guest speakers, or within the student group — bring them into this session explicitly. The most useful exercise is asking students to describe their planned recon approach and then having the group analyze what that looks like in a SIEM — what log sources fire, what correlations trigger, what an analyst would see. Students with red team experience sometimes underestimate how visible their "quiet" recon is; concrete SIEM examples are a reality check. Reinforce that OPSEC is not about perfection — it's about raising the cost of attribution above the blue team's willingness to investigate.
Timing Guide
Reporting Recon Findings
Learning Objectives
- Structure a recon findings report that distinguishes methodology, evidence, confidence levels, and recommendations
- Apply LLM-assisted drafting to accelerate report production without sacrificing technical accuracy
- Communicate AI-augmented recon methodology to clients in terms that address likely skepticism about AI-generated findings
- Design recon documentation that enables blue teams to validate findings and implement targeted remediation
Session Overview
The quality of a penetration test is ultimately judged by the quality of the report — not by the sophistication of the methodology or the number of findings. A comprehensive recon phase that produces a poorly documented report delivers negligible value to the client. Conversely, a methodical recon phase documented in a way that blue teams can act on — reproducing each finding, understanding the attack path it enables, and implementing targeted remediation — can fundamentally improve an organization's security posture.
AI-assisted recon introduces reporting challenges that don't exist in traditional engagements: clients may be skeptical of AI-generated findings, LLM outputs require explicit confidence qualification, and the methodology documentation must explain AI-assisted workflows to clients who may have limited familiarity with them. This final session addresses these challenges directly, sending practitioners out with a practical reporting framework for AI-augmented engagements.
Key Teaching Points
- Every finding must include a reproducibility path. A blue team reviewer should be able to follow the steps documented in the finding and arrive at the same conclusion independently. For AI-assisted findings, this means documenting the specific queries, tools, and data sources used — not just "LLM analysis revealed..." Reproducibility is the evidence standard for security findings.
- Confidence levels must be explicitly stated for AI-assisted findings. Findings derived from LLM synthesis of indirect signals carry different confidence levels than findings from direct observation. Document this explicitly: "High confidence (confirmed via three independent sources)" vs. "Medium confidence (inferred from job posting analysis, not directly observed)." Clients must be able to distinguish these when prioritizing remediation.
- LLMs can accelerate report drafting without replacing analyst judgment. LLMs are effective at generating first drafts of finding descriptions, remediation recommendations, and executive summaries from structured notes. The practitioner's role is to review, correct, and inject the contextual judgment that LLMs cannot provide, not to accept LLM draft output uncritically. Clients who discover inaccuracies in AI-generated text will reasonably doubt whether the underlying findings are accurate.
- Attack path narratives are more actionable than finding lists. A report that lists 47 individual findings is less actionable than one that organizes findings into five attack paths, each documented as a narrative chain from initial access to impact. Attack path narratives help blue teams understand which findings must be remediated together to close a path — not just which individual issues exist.
- The methodology section must explain the AI-augmented workflow without overstating its role. Some clients will be excited about AI-assisted recon; others will be skeptical or concerned about accuracy. Write a methodology section that accurately describes what tools and LLMs were used, what human review and validation steps were applied, and what the confidence implications are. Transparency here builds trust; opacity creates post-report challenges.
- Blue team remediation guidance should be specific enough to implement. "Reduce your attack surface" is not actionable. "Remove the S3 bucket at [specific URL], which is publicly readable and contains configuration files referencing internal hostnames" is actionable. Every recon finding should come with a specific, implementable remediation step that the blue team can validate after implementation.
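Several of these requirements can be checked mechanically before a draft reaches human review. The sketch below is a hypothetical Python model, not a standard reporting schema: the `Finding` fields, the three-level `Confidence` scale, the `validate` and `group_by_attack_path` helpers, and the phrase list for vague remediation are all assumptions. The rule that high confidence requires direct observation and at least three independent sources is modeled on the example wording above, not on any published standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Finding:
    title: str
    attack_path: str               # label for the attack path this finding belongs to
    confidence: Confidence
    sources: List[str]             # independent data sources supporting the finding
    directly_observed: bool        # False when inferred (e.g. from job postings)
    reproduction_steps: List[str]  # exact queries/tools a reviewer can rerun
    remediation: str


# Hypothetical examples of remediation text too vague to implement.
VAGUE_REMEDIATION = {"reduce your attack surface", "improve security posture"}


def validate(finding: Finding) -> List[str]:
    """Return the reporting problems in a finding; empty means it passes."""
    problems = []
    if not finding.reproduction_steps:
        problems.append("missing reproducibility path")
    if finding.confidence is Confidence.HIGH:
        # Assumed rule: high confidence requires direct observation
        # and at least three distinct supporting sources.
        if not finding.directly_observed:
            problems.append("inferred finding rated high confidence")
        if len(set(finding.sources)) < 3:
            problems.append("high confidence with fewer than three sources")
    remediation = finding.remediation.strip().rstrip(".").lower()
    if not remediation or remediation in VAGUE_REMEDIATION:
        problems.append("remediation is not specific enough to implement")
    return problems


def group_by_attack_path(findings: List[Finding]) -> Dict[str, List[str]]:
    """Group finding titles by attack path so related fixes are reviewed together."""
    paths: Dict[str, List[str]] = defaultdict(list)
    for f in findings:
        paths[f.attack_path].append(f.title)
    return dict(paths)


if __name__ == "__main__":
    bucket = Finding(
        title="Publicly readable S3 bucket",
        attack_path="exposed-config",
        confidence=Confidence.HIGH,
        sources=["anonymous HTTP GET", "bucket listing", "hostname in CT logs"],
        directly_observed=True,
        reproduction_steps=["Send an unauthenticated GET to the bucket URL"],
        remediation="Remove public read access from the bucket at [specific URL].",
    )
    stack = Finding(
        title="Inferred internal CI platform",
        attack_path="exposed-config",
        confidence=Confidence.HIGH,
        sources=["job postings"],
        directly_observed=False,
        reproduction_steps=[],
        remediation="Reduce your attack surface.",
    )
    for f in (bucket, stack):
        print(f.title, validate(f))
    print(group_by_attack_path([bucket, stack]))
```

A check like this only catches omissions; whether the reproduction steps actually work and whether the remediation is correct still depend on the analyst review the report is built around.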
Discussion Prompts
- A client's security team pushes back on an attack-path finding, saying the individual components are "low severity" and questions whether the LLM-assisted correlation is accurate. How do you respond, and how did your documentation approach create or prevent this situation?
- You used LLM-assisted drafting to write 80% of a 60-page recon report. A colleague says this undermines the report's credibility. Do you agree, and how would you disclose AI assistance in the report itself?
- A blue team lead asks if they can have access to your "raw recon data" — the full corpus of findings before triage and prioritization. What do you give them, and what do you withhold and why?
- Reflecting on the full course: what is the single most important thing that distinguishes excellent AI-assisted recon from mediocre AI-assisted recon?
Close the course by circling back to session one's central thesis: AI is a force multiplier for skilled practitioners, not a replacement for judgment. The reporting session makes this concrete — the AI-assisted recon that is hardest to defend in a report is the AI-assisted recon where human judgment was least present. Use the final discussion question as a capstone: go around the room. Students who answer "validation" or "human review" have internalized the course's core message. Students who answer "better prompting" or "more powerful tools" may need the message reframed. For experienced practitioners, the most valuable part of this session is sharing report structures they have actually used; invite this if time allows.