In 2013, Mandiant published APT1: Exposing One of China's Cyber Espionage Units — a 76-page report that named a specific PLA unit (61398) and documented over 140 victim organizations across 20 industries. What made APT1 land with force was not just the intelligence it contained, but how that intelligence was structured: executive summary, actor profile, TTPs, infrastructure indicators, and appendices. Security leadership could read the first four pages and act. Analysts could mine the appendices for months. The report architecture did the persuasion work.
Recon findings that lack structure become a liability. An analyst who dumps raw WHOIS data, paste-site screenshots, and unformatted Shodan results into a PDF has not written a report; they have archived their browser history. The intelligence value is real; the communication value is near zero.
A properly structured report answers three implicit questions every reader brings: How bad is it? (severity and scope), How do we know? (evidence and methodology), and What do we do? (recommendations). AI tools now make it possible to rapidly scaffold these answers — but the analyst must understand the scaffold before delegating its construction.
Professional recon reports share a common architecture regardless of whether they originate from a corporate pen-test team, a threat-intel firm, or a government CERT. The five layers are:
Layer 1, executive summary: One page maximum. Severity rating, scope, business risk, and a single prioritized recommendation. Written for a CISO or VP who will read it in an elevator.
Layer 2, scope and methodology: What was examined, what was excluded, which tools and data sources were used, time window of collection. Establishes credibility and reproducibility.
Layer 3, findings: Numbered findings in descending severity order. Each finding states: observation, evidence, risk impact, and remediation recommendation.
Layer 4, appendices: Raw indicators, full screenshots, tool output logs, WHOIS records, certificate data. Consumed by analysts and engineers, not decision-makers.
Layer 5, handling metadata: Classification label, TLP designation, author, date, version, distribution list. Critical for legal and operational chain of custody.
Within Layer 3, each finding follows a repeating template that readers learn to scan efficiently. Mandiant, CrowdStrike, and Rapid7 all use variants of the same structure:
AI tools like Claude or GPT-4 can draft finding narratives from structured data: paste a raw Shodan result and request a finding block that follows the template. The analyst's job shifts to verification and calibration: confirming the evidence chain, adjusting risk language for the client's sector, and ensuring the remediation is operationally feasible. Drafting gets faster; responsibility for accuracy remains entirely with the analyst.
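One way to make "structured data in, finding block out" predictable is to fix the shape of the block in code before any AI drafting happens. The sketch below is a minimal illustration built on the four elements each finding states (observation, evidence, risk impact, remediation). The `Finding` class, its field names, the `SEVERITY_ORDER` scale, and the plain-text layout are all hypothetical and are not taken from any vendor's template.

```python
from dataclasses import dataclass

# Hypothetical four-level scale; substitute the scale your engagement uses.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    title: str
    severity: str       # must be a key of SEVERITY_ORDER
    observation: str    # what was seen
    evidence: str       # pointer to the preserved artifact
    risk_impact: str    # why it matters
    remediation: str    # what to do about it


def render_findings(findings):
    """Return findings as numbered text blocks, most severe first."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    blocks = []
    for number, f in enumerate(ordered, start=1):
        blocks.append(
            f"Finding {number}: {f.title} [{f.severity}]\n"
            f"  Observation: {f.observation}\n"
            f"  Evidence: {f.evidence}\n"
            f"  Risk impact: {f.risk_impact}\n"
            f"  Remediation: {f.remediation}"
        )
    return "\n\n".join(blocks)
```

Keeping severity as a separate field, rather than a word inside the narrative, makes it easier to confirm later that an AI-drafted narrative did not quietly change it.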
Every recon report must carry a Traffic Light Protocol (TLP) designation, as defined in the standard maintained by FIRST (Forum of Incident Response and Security Teams). TLP controls sharing scope. Getting it wrong can cause legally significant data leakage or keep information from the defenders who need it.
The 2021 HAFNIUM Exchange Server disclosures by Microsoft MSTIC were published openly, the equivalent of TLP:CLEAR (the label was TLP:WHITE until TLP 2.0 renamed it in 2022), allowing rapid, widespread defender action. Contrast this with the 2014 Sony Pictures breach, where FBI and US-CERT circulated indicator data under TLP:RED to prevent tipping off the attacker before remediation. The TLP choice was a deliberate tactical decision, not an administrative checkbox.
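Because the TLP label decides who may receive a report, it can help to check a planned distribution against the label before sending. The following is a rough sketch, not an official encoding of the standard: the five labels are those of TLP 2.0, but the audience names and the single "widest audience" ranking are hypothetical simplifications, and the authoritative sharing rules are FIRST's own definitions.

```python
# TLP 2.0 labels, most to least restrictive, each paired with the widest
# audience it permits. The audience names are hypothetical shorthand.
TLP_WIDEST_AUDIENCE = {
    "TLP:RED": "named_recipients",
    "TLP:AMBER+STRICT": "organization",
    "TLP:AMBER": "organization_and_clients",
    "TLP:GREEN": "community",
    "TLP:CLEAR": "public",
}

# Rank audiences from narrowest (0) to widest (4), following dict order.
AUDIENCE_RANK = {
    name: rank for rank, name in enumerate(TLP_WIDEST_AUDIENCE.values())
}


def normalize_tlp(label):
    """Upper-case the label and map the legacy TLP:WHITE to TLP:CLEAR."""
    label = label.strip().upper()
    if label == "TLP:WHITE":  # renamed to TLP:CLEAR in TLP 2.0
        label = "TLP:CLEAR"
    if label not in TLP_WIDEST_AUDIENCE:
        raise ValueError(f"Unknown TLP label: {label}")
    return label


def may_share(label, audience):
    """Return True if the audience is no wider than the label allows."""
    widest = TLP_WIDEST_AUDIENCE[normalize_tlp(label)]
    return AUDIENCE_RANK[audience] <= AUDIENCE_RANK[widest]
```

For example, `may_share("TLP:RED", "organization")` is false under this ranking, while `may_share("TLP:AMBER", "organization")` is true.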
You have discovered that a target organization's legacy VPN appliance (Pulse Secure) is running a version vulnerable to CVE-2019-11510 (unauthenticated arbitrary file read). Shodan confirms the device is internet-facing on port 443. Your task is to work with the AI to produce a complete, properly structured finding block using the five-element template from Lesson 1.
When Verizon published the first Data Breach Investigations Report in 2008, it faced a structural problem: the underlying data was forensically granular, but the intended audience was primarily business leadership. The solution was a deliberate bifurcation — a narrative summary with infographics for executives, and a technical appendix with raw incident statistics for practitioners. By the 2015 edition, the DBIR had added a third register: sector-specific callouts for legal, HR, and compliance audiences. The DBIR's longevity owes as much to its audience architecture as to its underlying data quality.
Every significant recon report has at least four potential audience segments, each processing risk through a different lens. Effective reporting requires deliberately constructing content — sometimes the same content — in multiple registers.
Executive and board: Risk in business terms: revenue exposure, regulatory fines, reputational damage, competitive intelligence implications. Avoid technical jargon entirely. Use analogies. Quantify where possible.
SecOps: Indicators, TTPs, attack paths, detection opportunities. Needs enough technical depth to build detections and run incident response runbooks. Wants MITRE ATT&CK mappings.
Engineering and IT: Specific systems, versions, configurations. Needs remediation steps they can actually execute: patch versions, configuration changes, firewall rules. Wants ticket-ready specificity.
Legal and compliance: Regulatory exposure, notification obligations, evidence preservation, chain of custody. Needs to understand what data was accessible and whether breach-notification thresholds were crossed.
The same finding — an exposed S3 bucket containing customer PII — reads fundamentally differently depending on audience:
Large language models excel at register translation. A prompt like "Rewrite this technical finding for a non-technical board audience, expressing risk in business terms and avoiding acronyms" consistently produces usable first drafts. The analyst must verify that business impact claims remain grounded in actual evidence — AI models sometimes escalate or de-escalate risk in translation without explicit justification.
Executives and boards increasingly demand quantified risk — not just "high severity" but dollar figures. The FAIR (Factor Analysis of Information Risk) model, developed by Jack Jones and now maintained by the FAIR Institute, provides a structured probabilistic approach to translating technical findings into financial loss estimates.
AI tools can help apply FAIR-aligned reasoning: given a finding's likely frequency of exploitation and probable loss magnitude (incident response costs, regulatory fines, breach notification), they can produce ranges rather than point estimates. Ranges are more honest and defensible than single figures derived from incomplete data.
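To make "ranges rather than point estimates" concrete, here is a small Monte Carlo sketch. It is a simplification under stated assumptions, not the full FAIR taxonomy: each trial multiplies one draw of annual loss-event frequency by one draw of loss magnitude, both taken from triangular distributions defined by (low, most likely, high) estimates. The function name and the choice of triangular distributions are illustrative assumptions.

```python
import random
import statistics


def simulate_annual_loss(freq_range, loss_range, trials=10_000, seed=0):
    """Estimate an annual-loss range from analyst-supplied estimates.

    freq_range: (low, most_likely, high) loss events per year
    loss_range: (low, most_likely, high) cost per loss event
    Returns the 10th, 50th, and 90th percentiles of simulated annual loss.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    f_low, f_mode, f_high = freq_range
    l_low, l_mode, l_high = loss_range
    losses = [
        # random.triangular takes (low, high, mode) in that order
        rng.triangular(f_low, f_high, f_mode)
        * rng.triangular(l_low, l_high, l_mode)
        for _ in range(trials)
    ]
    deciles = statistics.quantiles(losses, n=10)  # nine cut points
    return {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8]}
```

A call such as `simulate_annual_loss((0.1, 0.3, 1.0), (50_000, 200_000, 1_000_000))` uses placeholder figures only. In a real report the inputs come from the evidence and from the client's own cost data, and the report should state them alongside the resulting range.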
The Marriott/Starwood breach, disclosed in 2018, ultimately cost over $600 million in regulatory fines, litigation, and remediation. The initial recon-phase indicators (legacy Starwood systems running unpatched, exposed management interfaces) were discoverable via passive OSINT. A quantified risk treatment of those indicators at report time would have told decision-makers far more than a qualitative "Critical" label.
Resist the temptation to catastrophize findings to secure executive attention. Reports that overstate risk lose credibility when predicted impacts don't materialize. Verizon's DBIR consistently shows that organizations with mature security programs make better decisions from data-grounded, probability-weighted risk statements than from worst-case narratives. Credibility compounds over time.
For SecOps audiences, mapping recon findings to MITRE ATT&CK provides immediate operational context. Reconnaissance findings map primarily to the Reconnaissance (TA0043) tactic — techniques like Active Scanning (T1595), Search Open Websites/Domains (T1593), and Gather Victim Network Information (T1590) are directly observable via OSINT.
Including ATT&CK technique IDs in the technical register allows SOC analysts to immediately cross-reference existing detections, identify gaps, and update threat models without requiring additional translation from the report author. AI tools can suggest relevant ATT&CK mappings when given a finding description — treat these as starting hypotheses requiring analyst validation against the specific evidence.
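Treating AI-suggested mappings as hypotheses can be partly mechanized. The sketch below sorts suggested technique IDs into three buckets: IDs already confirmed by the analyst, well-formed IDs that still need checking against the ATT&CK site, and strings that are not valid technique IDs at all. The lookup table holds only the three Reconnaissance techniques named above; the function name and bucket names are hypothetical.

```python
import re

# Techniques named in the text (Reconnaissance tactic, TA0043).
# Extend this table only with IDs an analyst has personally confirmed.
CONFIRMED_TECHNIQUES = {
    "T1595": "Active Scanning",
    "T1593": "Search Open Websites/Domains",
    "T1590": "Gather Victim Network Information",
}

# Technique IDs look like T1595; sub-techniques look like T1595.002.
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


def triage_suggested_mappings(suggested_ids):
    """Sort AI-suggested ATT&CK IDs by how much checking they still need."""
    result = {"confirmed": [], "needs_validation": [], "malformed": []}
    for technique_id in suggested_ids:
        if not TECHNIQUE_ID.fullmatch(technique_id):
            result["malformed"].append(technique_id)
        elif technique_id in CONFIRMED_TECHNIQUES:
            result["confirmed"].append(technique_id)
        else:
            result["needs_validation"].append(technique_id)
    return result
```

A well-formed ID is not necessarily a real one, so everything in `needs_validation` still has to be looked up, and even a confirmed ID has to fit the specific evidence.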
You have confirmed an exposed Elasticsearch database at a target organization containing approximately 4 million customer records with names, email addresses, and plaintext passwords. The database has no authentication. Shodan indexed it 23 days ago. Your technical finding is written — now you need versions for the executive, legal, and engineering audiences.
In the 2016 FTC action against LabMD — a medical testing company — evidence of a data exposure discovered by the security firm Tiversa became central to a years-long legal dispute. The question was not whether the data was exposed, but how it was discovered, by whom, using what methods, and whether the chain of custody from initial discovery to regulatory submission was intact. The case reached the 11th Circuit Court of Appeals and turned substantially on evidence standards that a recon report had not anticipated needing to meet. The lesson: a finding that cannot prove its own provenance is a finding that cannot be used.
Chain of custody in traditional digital forensics means documenting every person who handled evidence, every tool that processed it, and every transfer that occurred. In OSINT reporting, the equivalent principle applies to the path from raw discovery to documented finding.
A properly evidenced recon finding must be able to answer: Who collected this data? When exactly (timestamp with timezone)? Using which tool and version? From which source (URL, IP, database)? Was the collection passive (no interaction with target systems) or active? Was the raw artifact preserved and hash-verified?
All evidence must carry UTC timestamps. Screenshot file metadata, tool log timestamps, and Wayback Machine capture dates should be recorded and cross-referenced where possible.
Raw artifacts (downloaded files, captured pages) should be SHA-256 hashed immediately upon collection. The hash becomes the integrity anchor — any later modification is detectable.
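Record tool name, version, and configuration used for each finding. Different versions of tools such as Maltego or theHarvester can produce different results, and hosted services such as Shodan return different data over time, so record the query date as well. Version specificity is essential for reproducibility.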
Each finding must be attributable to a named analyst. In legal proceedings, anonymous findings carry significantly less weight and may be excluded as hearsay without a sponsoring witness.
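The four requirements above (UTC timestamp, SHA-256 hash, tool version, named analyst) can be captured in one step at collection time. This is a minimal sketch using only the Python standard library; the function names and the dictionary keys are hypothetical, and the timestamp records when the hash was taken, which is only meaningful if hashing happens immediately after collection.

```python
import hashlib
from datetime import datetime, timezone


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_collection(path, analyst, tool, tool_version):
    """Build a provenance record for a freshly collected artifact."""
    return {
        "artifact": str(path),
        "sha256": sha256_file(path),
        # Timezone-aware UTC timestamp, e.g. 2024-11-14T09:42:00+00:00
        "collected_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "analyst": analyst,
        "tool": tool,
        "tool_version": tool_version,
    }


def is_unmodified(path, recorded_sha256):
    """Re-hash the artifact and compare it with the recorded hash."""
    return sha256_file(path) == recorded_sha256
```

Running `is_unmodified` again before the report is issued gives a simple integrity check: any change to the file since collection produces a different hash.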
Screenshots are the most common — and most problematic — form of evidence in recon reports. An unstructured screenshot proves almost nothing in isolation. A properly captured and documented screenshot can anchor a legal proceeding.
AI tools cannot generate valid evidence. They can help draft the narrative describing evidence, but any AI-generated summary of a finding must be explicitly labeled as derived from analyst-collected artifacts — not as a primary source itself. Jurisdictions increasingly scrutinize AI-generated text in legal proceedings. Maintain a clear separation between AI-assisted narrative and analyst-verified artifacts.
The distinction between passive OSINT (observing publicly available information without interacting with target systems) and active recon (probing, scanning, or authenticating against target systems) carries significant legal weight in most jurisdictions.
Under the Computer Fraud and Abuse Act (CFAA) in the United States, accessing a computer system "without authorization" or "exceeding authorized access" can constitute a federal offense. The 2021 Supreme Court ruling in Van Buren v. United States narrowed the CFAA's scope, but the boundary between authorized OSINT and unauthorized access remains contested terrain.
Reports must explicitly state the collection methodology — passive or active — for each finding, and must document any written authorization scope for active collection. A finding derived from active scanning without documented authorization may expose the analyst and their organization to legal liability that eliminates the finding's value entirely.
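The tagging rule in this paragraph is simple enough to enforce before a report leaves draft. The sketch below flags any finding whose collection method is missing or unrecognized, and any actively collected finding with no authorization reference. The `CollectionTag` class and its field names are hypothetical; what counts as adequate authorization is a legal question the code cannot answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionTag:
    finding_id: str
    method: str                               # "passive" or "active"
    authorization_ref: Optional[str] = None   # e.g. an engagement-letter clause


def collection_problems(tags):
    """Return (finding_id, problem) pairs that must be fixed before release."""
    problems = []
    for tag in tags:
        if tag.method not in ("passive", "active"):
            problems.append(
                (tag.finding_id, "collection method missing or unrecognized")
            )
        elif tag.method == "active" and not tag.authorization_ref:
            problems.append(
                (tag.finding_id, "active collection without documented authorization")
            )
    return problems
```

An empty result means every finding is tagged and every active finding points at some authorization; it does not mean the authorization actually covers what was done.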
In 2022, a major consulting firm's penetration test report was partially excluded in subsequent litigation because the engagement letter did not explicitly authorize subdomain enumeration, and the report could not document which findings came from passive OSINT vs. active scanning. The differentiation mattered for liability. Build the habit of tagging every finding with its collection method — it costs nothing and can save everything.
Every recon engagement should maintain a contemporaneous evidence log — a running record of artifacts collected, separate from the final report. This log becomes the foundation from which the report is drafted and the archive from which individual findings can be re-substantiated if challenged.
Timestamp (UTC) · Analyst · Finding ID · Source (tool + target) · Artifact filename · SHA-256 hash · Collection method (passive/active) · Notes
Evidence logs and artifacts must be stored in a write-once or append-only medium where feasible. Cloud object storage with write-once-read-many (WORM) retention, such as AWS S3 Object Lock or immutable storage for Azure Blob Storage, provides defensible immutability.
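The evidence log fields listed above map naturally onto a CSV file that is only ever appended to. The following is a sketch under assumptions: the column names are hypothetical renderings of those fields, and the function refuses entries with any required field blank. Opening a file in append mode is a working habit, not immutability; the write-once storage described above is what makes the log defensible.

```python
import csv
import os

# Column names are hypothetical renderings of the log fields listed above.
LOG_FIELDS = [
    "timestamp_utc", "analyst", "finding_id", "source",
    "artifact_filename", "sha256", "collection_method", "notes",
]


def append_log_entry(log_path, entry):
    """Append one evidence-log row; every field except notes is required."""
    missing = [f for f in LOG_FIELDS if f != "notes" and not entry.get(f)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    if entry["collection_method"] not in ("passive", "active"):
        raise ValueError("collection_method must be 'passive' or 'active'")

    # Write the header only when the log is new or empty.
    is_new = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({field: entry.get(field, "") for field in LOG_FIELDS})
```

Rejecting incomplete rows at write time is deliberate: a gap in the log is far cheaper to fix while the artifact is still open on the analyst's screen than when a finding is challenged months later.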
You have discovered a target organization's exposed Jenkins CI/CD server. The evidence trail involves: a Shodan result (timestamp: 2024-11-14 09:42 UTC), a Wayback Machine capture of the login page (archived 2024-11-10), a screenshot you captured today, and a GitHub repository containing hard-coded credentials that reference the server. You believe each artifact was collected passively. You need to construct a complete evidence log entry and assess whether any of the collection crossed into active territory.
In 2023, Recorded Future published a methodology note alongside several of their threat intelligence reports disclosing that AI language models had been used to assist with initial drafting, pattern summarization, and indicator clustering. The disclosure was deliberate: the firm wanted clients to understand both the efficiency gains and the editorial layer their analysts applied before publication. This transparency model — AI as accelerant, analyst as gatekeeper — has since become a de facto standard among major threat intelligence vendors, including Mandiant (now part of Google) and Secureworks. The question is no longer whether to use AI in reporting; it is how to build quality controls that make the use defensible.
The tasks where AI tools produce the most reliable value in recon reporting are those involving structured transformation — taking data in one form and producing it in another with well-defined criteria. These include:
AI language models fail systematically in ways that matter profoundly for security reporting. Understanding these failure modes is not optional — an analyst who cannot identify when AI output has gone wrong will eventually submit a report that damages credibility, triggers legal liability, or enables a bad decision.
Hallucinated references: Models confidently cite CVE numbers, CVSS scores, and NVD references that do not exist or apply to different software. Every CVE in an AI-drafted finding must be verified against nvd.nist.gov before publication.
Severity drift: When translating between audiences, AI may escalate a Medium finding to Critical or reduce a High to Low without flagging the change. Severity ratings must be locked before register translation and verified afterward.
Temporal blindness: AI training cutoffs mean models may cite patch availability, vendor support status, or regulatory requirements that have changed. All time-sensitive claims require independent verification.
False specificity: AI generates plausible-sounding specific details (IP ranges, file sizes, record counts) that may not match actual evidence. Never use AI-generated specifics without verifying against the original artifact.
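Two of these checks lend themselves to a quick automated first pass before the analyst's manual review. The sketch below pulls every CVE identifier out of an AI draft and lists those the analyst has not yet verified, then flags any severity word in a translated draft that differs from the locked rating. The function names and the four-word severity scale are assumptions; the severity check is deliberately crude and will over-flag, for instance when a sentence happens to start with "High".

```python
import re

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")
SEVERITY_WORDS = ("Critical", "High", "Medium", "Low")  # assumed scale


def unverified_cves(draft_text, verified_cves):
    """List CVE IDs in the draft that the analyst has not yet verified."""
    found = set(CVE_PATTERN.findall(draft_text))
    return sorted(found - set(verified_cves))


def severity_drift(locked_severity, translated_text):
    """List capitalized severity words in the text that differ from the lock."""
    mentioned = {
        word for word in SEVERITY_WORDS
        if re.search(rf"\b{word}\b", translated_text)
    }
    return sorted(mentioned - {locked_severity})
```

Neither function replaces the lookup itself: a CVE that appears in the verified set still had to be checked against NVD by a person, and a clean severity check says nothing about whether the surrounding language overstates the risk.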
A defensible AI-integrated reporting workflow separates the AI's contribution from the analyst's verification at each stage. Recorded Future's and Mandiant's published approaches share a common pattern:
Step 1, collection: All collection is analyst-controlled. AI has no role. Evidence log is maintained contemporaneously. Hash and timestamp all artifacts.
Step 2, structuring: Raw artifacts are organized into structured inputs: clean JSON, formatted WHOIS, annotated screenshots. The analyst verifies this input before passing it to AI.
Step 3, AI drafting: AI drafts finding blocks, register translations, and executive summary from structured inputs. Prompts specify template, audience, and constraints explicitly.
Step 4, analyst verification: Every CVE verified. Every severity rating confirmed. Every specific detail cross-checked against original artifact. AI-generated text marked internally for audit trail.
Step 5, peer review: A second analyst reviews the full report specifically looking for AI failure modes: hallucinated references, severity drift, false specificity, temporal blindness.
Step 6, disclosure: Report notes AI assistance in methodology section, stating which sections were AI-drafted and analyst-reviewed. This is increasingly required by client contracts and professional standards.
The quality of AI-generated finding blocks correlates directly with prompt specificity. A prompt that includes the template structure, audience, severity scale, and explicit instructions to flag uncertainty (e.g., "If you are not certain of a CVE number, write [CVE REQUIRES VERIFICATION] instead of guessing") produces dramatically more usable output than a vague "write a finding about this vulnerability." Invest time in building a reusable prompt library for your reporting workflow.
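A reusable prompt library can start as a single function that assembles those elements in the same order every time. This is a minimal sketch: the function name, parameter names, and exact wording are hypothetical, apart from the uncertainty instruction, which is quoted from the paragraph above. It only builds the prompt string and does not call any model.

```python
UNCERTAINTY_RULE = (
    "If you are not certain of a CVE number, write "
    "[CVE REQUIRES VERIFICATION] instead of guessing."
)


def build_finding_prompt(structured_input, audience, severity, template_fields):
    """Assemble a drafting prompt with explicit template and constraints."""
    field_lines = "\n".join(f"- {name}" for name in template_fields)
    return (
        f"Draft one finding block for this audience: {audience}.\n"
        f"The severity is fixed at {severity}; do not change it.\n"
        f"Use exactly these sections, in this order:\n{field_lines}\n"
        f"{UNCERTAINTY_RULE}\n"
        "Use only facts present in the input below; do not add specifics.\n"
        "--- INPUT ---\n"
        f"{structured_input}"
    )
```

Stating the severity as a fixed input, and forbidding new specifics, targets the failure modes that matter most in this workflow; the output still goes through analyst verification like any other draft.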
The professional community is converging on disclosure norms for AI use in security reports. CREST (Council of Registered Ethical Security Testers), PTES (Penetration Testing Execution Standard), and several national CERTs have begun incorporating AI-use disclosure requirements into their reporting standards.
The current emerging norm is: if AI was used to draft, summarize, or transform content that appears in a final report, the methodology section should state which tools were used, in what capacity, and what analyst review was applied. This is not a legal requirement in most jurisdictions as of 2025, but it is increasingly a contractual requirement in enterprise engagements and a professional ethics expectation.
The underlying rationale is identical to citation standards in academic research: the reader needs to understand the provenance of the claims in order to calibrate their confidence appropriately.
AI tools do not sign reports. Analysts do. Whatever AI contributes to a recon report, the analyst's name on the document represents their personal attestation that the content is accurate, evidence-supported, and appropriately calibrated. The efficiency gain from AI is real; the professional accountability transfer is zero. Sign nothing you have not verified.
You are finalizing a recon report for a financial services client. Three findings have been collected: (1) an exposed RDP port (3389) on an AWS-hosted server with a self-signed certificate and no geo-restriction; (2) an employee LinkedIn profile disclosing internal project codenames and vendor relationships; (3) a misconfigured DNS zone allowing zone transfer from a public resolver. You have clean structured data for each. Your task is to run the AI-assisted workflow: draft finding blocks, generate an executive summary, request ATT&CK mappings, and then deliberately try to make the AI hallucinate — to practice catching those failures.