Module 2 · Lesson 1

What Passive OSINT Actually Means

No packets sent. No logs written. How LLMs synthesize intelligence from publicly available data without touching the target.
If you never interact with a target's systems, what can an LLM still tell you, and where does that intelligence actually come from?

In 2013, researchers at the University of Cambridge published a study demonstrating that Facebook "likes" alone could predict traits such as a user's intelligence, sexual orientation, political affiliation, and personality with notable accuracy. The models were trained on data from volunteers, but once trained they needed nothing beyond the likes themselves: no surveys, no interviews, no further interaction. The data had already been deposited publicly. The intelligence was simply waiting to be read.

This is the foundational logic of passive OSINT: the target has already left the evidence. The investigator's only job is synthesis.

Defining Passive vs. Active OSINT

The intelligence community distinguishes passive collection from active collection by a single criterion: does the collection method generate a signal detectable by the target? Active reconnaissance (port scans, login attempts, direct contact) creates log entries, raises intrusion alerts, and may constitute unauthorized access under statutes like the Computer Fraud and Abuse Act (CFAA) or the UK Computer Misuse Act 1990.

Passive OSINT operates exclusively on data that has already been made public, cached, indexed, or otherwise placed into open repositories. The collector generates no new network traffic to the target's infrastructure and creates no artifacts on the target's systems.

LLMs extend passive OSINT in two ways. First, they act as synthesis engines, taking fragmented public data points and constructing coherent profiles far faster than a human analyst. Second, they act as query engines, helping analysts identify what types of public data exist and how to locate them without wasting time on active enumeration.

Passive Methods
  • Search engine dorking (cached results)
  • WHOIS & DNS record lookups
  • Certificate Transparency log analysis
  • Job posting analysis
  • Social media profile scraping
  • Pastebin / code repository trawling
  • Wayback Machine historical snapshots
  • Public breach databases (HaveIBeenPwned)
Active Methods (Out of Scope Here)
  • Port scanning (nmap, masscan)
  • Banner grabbing
  • Directory brute-forcing
  • Credential stuffing attempts
  • Phishing / social engineering
  • Direct API calls to target systems
  • Subdomain brute-force enumeration

The LLM as an Intelligence Multiplier

Before LLMs, passive OSINT was labor-intensive. An analyst gathering intelligence on a corporation might spend hours collating LinkedIn profiles, parsing WHOIS records, reading annual reports, and cross-referencing job postings, all before producing a single structured assessment. The data existed; the bottleneck was human synthesis speed.

LLMs collapse that bottleneck. Given a set of raw passive data (a company's LinkedIn employee list, a set of job postings, a domain's DNS records, a GitHub repository's commit history), an LLM can synthesize a structured threat profile in seconds. It can infer technology stacks from job descriptions, map organizational hierarchies from LinkedIn data, and identify likely attack surfaces without a single packet being sent.

Researchers at IBM X-Force documented this pattern in 2023, noting that generative AI was being used by threat actors to accelerate the pre-exploitation "reconnaissance phase", specifically the synthesis of public data into actionable intelligence packages.

Legal Boundary

Accessing data that is technically public but behind authentication barriers, even weak ones, may not qualify as "passive" under law. In hiQ Labs v. LinkedIn (9th Circuit, 2022), the court considered whether scraping publicly visible LinkedIn profiles constituted unauthorized access under the CFAA and concluded that it likely did not, a holding that turned on the profiles being viewable without a login. Always verify the legal framework governing your jurisdiction before collection.

Key Passive Data Categories

Passive OSINT draws from six primary data categories. Understanding these categories is essential for structuring effective LLM-assisted collection workflows:

  • Domain Intelligence: WHOIS records, DNS A/MX/TXT records, certificate transparency logs, ASN data. Reveals hosting infrastructure, mail systems, and registered entity details.
  • Personnel Intelligence: LinkedIn profiles, company directories, conference speaker lists, GitHub contributor histories. Maps the human attack surface.
  • Technology Intelligence: Job postings, GitHub repos, BuiltWith/Wappalyzer data, error messages in cached pages. Reveals the technology stack without touching live systems.
  • Credential Intelligence: Public breach databases, paste sites, dark web leak indexes (via aggregators). Identifies previously exposed credentials.
  • Geospatial Intelligence: Google Maps, satellite imagery, EXIF metadata in public photos. Reveals physical locations and facility layouts.
  • Financial / Regulatory Intelligence: SEC filings, Companies House records, court documents, patent filings. Reveals vendors, M&A activity, and third-party relationships.
Module 2 Core Principle

An LLM does not collect passive OSINT; it synthesizes it. The analyst's job is to understand which data categories to collect and feed to the model. The model's job is to identify patterns, connections, and inferences the analyst might miss. Human judgment governs collection scope and legal compliance; the LLM governs synthesis speed and breadth.
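
As a rough illustration of how an analyst might track which of the six categories a collection plan covers before anything is handed to a model, here is a minimal sketch. The `CollectionPlan` class and its methods are hypothetical names invented for this example, not part of any OSINT tool.

```python
from dataclasses import dataclass, field

# The six passive data categories named in this lesson.
CATEGORIES = ("domain", "personnel", "technology",
              "credential", "geospatial", "financial")


@dataclass
class CollectionPlan:
    """Hypothetical container for a passive collection plan."""
    target: str
    # Maps a category name to the sources the analyst intends to consult.
    sources: dict = field(default_factory=dict)

    def add_source(self, category: str, source: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.sources.setdefault(category, []).append(source)

    def gaps(self) -> list:
        """Return the categories that have no planned source yet."""
        return [c for c in CATEGORIES if not self.sources.get(c)]


plan = CollectionPlan(target="example.com")
plan.add_source("domain", "Certificate Transparency logs")
plan.add_source("technology", "job postings")
print(plan.gaps())
```

Listing the gaps explicitly keeps the scoping decision with the analyst: an empty category may be a deliberate exclusion or an oversight, and only a human can say which.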

Module 2 · Lesson 1 Quiz

Passive OSINT Fundamentals

4 questions · Select the best answer for each
1. What is the defining characteristic that makes OSINT collection "passive" from a legal and operational standpoint?
Correct. Passive collection is defined operationally and legally by the absence of detectable interaction with the target's infrastructure: no packets sent, no logs written on target systems.
Not quite. The defining criterion is whether your collection method creates a signal the target could detect, not the intended use or your network anonymization.
2. According to IBM X-Force research cited in this lesson, how were generative AI models being used by threat actors in the reconnaissance phase by 2023?
Correct. IBM X-Force noted that LLMs were being leveraged specifically to collapse the time cost of synthesizing fragmented public data into structured intelligence, which is the core passive OSINT workflow.
That's not what was documented. The IBM X-Force finding was specifically about using AI to synthesize public data faster: the synthesis bottleneck, not active enumeration or phishing generation.
3. An analyst wants to infer a company's technology stack without visiting the company's live website. Which passive data source is most likely to reveal this?
Correct. Job postings are a classic passive technology intelligence source: they routinely reveal specific frameworks, cloud platforms, security tools, and programming languages in use without any interaction with live systems.
While WHOIS, SEC filings, and DNS lookups provide useful data, they are less likely than job postings to reveal the detailed internal technology stack. Job postings explicitly list required technical skills and tools.
4. The 2013 Cambridge University study on Facebook "likes" is cited in this lesson to illustrate which core concept?
Correct. The study illustrates the foundational logic of passive OSINT: the target has already deposited the evidence publicly. The analyst's job is synthesis, not interaction.
The lesson cites this study specifically to make the point that intelligence can be derived from already-public data deposits, with no direct subject contact required. The subject's privacy or legal access distinctions are not the primary point being made.
Module 2 · Lab 1

Passive Scope Mapping with an LLM

Practice structuring passive OSINT collection plans with AI assistance

Lab Objective

You are conducting a passive reconnaissance engagement for a red team assessment against a hypothetical mid-sized financial services firm. Your objective is to use the AI assistant to help you structure a passive OSINT collection plan: identifying which data categories to target, which sources to use, and what intelligence gaps might remain.

The AI will not collect data for you; it will help you think through collection methodology, legal boundaries, and synthesis priorities.

Suggested opening: "I'm planning a passive OSINT engagement for a mid-sized financial services company. Help me build a structured collection plan covering the six passive data categories from the lesson: domain, personnel, technology, credential, geospatial, and financial intelligence."
Module 2 · Lesson 2

Domain Intelligence Without Active Enumeration

Certificate Transparency logs, DNS records, WHOIS, and ASN data: the infrastructure map that targets build for you.
How much of an organization's network infrastructure can be mapped using only data the organization has already published, and how do LLMs help analysts interpret that map?

In the aftermath of the SolarWinds SUNBURST breach, post-incident analysts reconstructed much of the attacker's initial reconnaissance from public data alone. Certificate Transparency logs showed that avsvmcloud.com, the attacker-controlled C2 domain, had been registered and certificated weeks before the supply chain compromise was activated. The domain's registration patterns, ASN assignments, and DNS configurations were all visible in public logs throughout the operation. An analyst monitoring Certificate Transparency feeds for SolarWinds-adjacent infrastructure could have flagged the anomaly before the breach succeeded.

The intelligence was passive, public, and free. The bottleneck was not collection; it was synthesis at scale.

Certificate Transparency Logs

Certificate Transparency (CT) was standardized in RFC 6962 in 2013, and since 2018 major browsers such as Chrome and Safari have required publicly trusted TLS certificates to appear in publicly auditable CT logs in order to be trusted. Tools like crt.sh, Censys, and Facebook's CT Monitor expose the logged issuance history for any domain.

For an OSINT analyst, CT logs are extraordinarily valuable because they reveal subdomains that organizations have never publicly advertised. A company might expose internal staging environments, development servers, VPN gateways, and partner portals through certificate issuance alone. Every certificate is timestamped, so the log also reveals when new infrastructure was provisioned: a timeline of infrastructure growth that the organization's own security team may not be tracking but that is public to any analyst who knows where to look.

An LLM can help analysts interpret bulk CT log exports by identifying naming conventions, clustering subdomains by likely function, and flagging subdomains that suggest sensitive internal systems based on naming patterns.

Real Tool: crt.sh

Search crt.sh/?q=%.example.com to retrieve all certificates issued for any subdomain of a target domain. The results are public, free, and require no authentication. The wildcard operator reveals subdomains that have never been linked from any public-facing page.
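
The sketch below shows one way to prepare a crt.sh export before handing it to an LLM. It assumes the JSON export that crt.sh returns when `output=json` is added to the query, with hostnames in a `name_value` field that may hold several newline-separated names. The `HINTS` table and the function names are hypothetical, and the sample data is made up; the code parses saved JSON and makes no network request itself.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query URL for all names under `domain` (JSON output)."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def extract_names(crtsh_json: str) -> list:
    """Return sorted, de-duplicated hostnames from a crt.sh JSON export."""
    names = set()
    for entry in json.loads(crtsh_json):
        # One certificate can cover several names, separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    return sorted(n for n in names if n)


# Hypothetical keyword hints; tune them to the target's naming conventions.
HINTS = {"staging": "pre-production", "dev": "development", "vpn": "remote access"}


def tag_by_name(hostnames: list) -> dict:
    """Guess each hostname's function from its leftmost label."""
    tags = {}
    for host in hostnames:
        label = host.split(".")[0]
        tags[host] = next((v for k, v in HINTS.items() if k in label), "unknown")
    return tags


# Made-up sample standing in for a saved crt.sh export.
sample = json.dumps([
    {"name_value": "staging.example.com\nwww.example.com"},
    {"name_value": "vpn.example.com"},
    {"name_value": "dev-api.example.com"},
])
hosts = extract_names(sample)
print(crtsh_url("example.com"))
print(tag_by_name(hosts))
```

A keyword pass like this is only a first cut. Its value is in giving the LLM a clean, de-duplicated list and giving the analyst a deterministic baseline to compare the model's clustering against.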

DNS Records as Passive Intelligence

DNS records are public by design. They must be: without them, email couldn't be delivered and websites couldn't be reached. But the information they contain extends well beyond simple resolution. Analysts use passive DNS lookups (via services like SecurityTrails, PassiveTotal, or VirusTotal) to access historical DNS data, revealing how a domain's infrastructure has changed over time without ever querying the live nameserver.

Record Type | Intelligence Value | Example Finding
A / AAAA | Hosting provider, CDN usage, IP geolocation | Target moved from on-prem to AWS in Q3 2023
MX | Email provider (Google Workspace, O365, Proofpoint) | Proofpoint MX suggests email security gateway
TXT / SPF | Third-party SaaS services explicitly authorized to send mail | SPF includes Salesforce, HubSpot, Zendesk
DMARC | Email security posture (p=none = no enforcement) | p=none indicates susceptibility to spoofing
NS | DNS hosting provider, potential for zone transfer | Cloudflare NS indicates CDN + DDoS protection
CNAME | Third-party service integrations (Zendesk, HubSpot subdomains) | support.example.com → zendesk.com
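
To make the TXT / SPF and DMARC rows concrete, here is a small sketch that pulls the `include:` domains out of an SPF record and the `p=` policy out of a DMARC record. Both functions work on record strings the analyst has already obtained (for example from a passive DNS export). The function names are illustrative, and the parsing is deliberately simplified: it ignores SPF qualifiers, redirects, and macros.

```python
def spf_includes(txt: str) -> list:
    """Return the domains named in include: mechanisms of an SPF record."""
    if not txt.lower().startswith("v=spf1"):
        return []
    return [term.split(":", 1)[1] for term in txt.split()
            if term.lower().startswith("include:")]


def dmarc_policy(txt: str) -> str:
    """Return the p= tag of a DMARC record, or '' if absent or malformed."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip().lower()
    return tags.get("p", "") if tags.get("v") == "dmarc1" else ""


# Illustrative records modeled on the examples in this lesson.
spf = "v=spf1 include:salesforce.com include:hubspot.com include:zendesk.com ~all"
dmarc = "v=DMARC1; p=none; rua=mailto:dmarc@example.com"

print(spf_includes(spf))
print(dmarc_policy(dmarc))
```

Extracting these fields deterministically first means the LLM is asked to interpret the findings (which business function each service maps to, what `p=none` implies) rather than to parse raw records, where it is more likely to slip.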

WHOIS and ASN Data

WHOIS records, though increasingly privacy-redacted under GDPR, still reveal registrar, registration date, name servers, and sometimes registrant organization details. For corporate targets, ASN (Autonomous System Number) lookups via ARIN, RIPE, or bgp.he.net reveal the IP ranges an organization owns or operates, which is the foundation for understanding the full scope of internet-facing infrastructure without any active scanning.
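
Once the prefixes registered to an organization have been copied out of a registry or BGP lookup, Python's standard `ipaddress` module can merge and size them offline. The sketch below is an assumption-laden example: `summarize_prefixes` is a hypothetical helper, it handles IPv4 only, and the prefixes are RFC 5737 documentation ranges rather than real allocations.

```python
import ipaddress


def summarize_prefixes(cidrs: list) -> dict:
    """Merge adjacent or overlapping IPv4 prefixes and count the addresses."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    v4 = [n for n in nets if n.version == 4]
    merged = list(ipaddress.collapse_addresses(v4))
    return {
        "prefixes": [str(n) for n in merged],
        "addresses": sum(n.num_addresses for n in merged),
    }


# Documentation ranges (RFC 5737), standing in for a registry export.
announced = ["198.51.100.0/25", "198.51.100.128/25", "203.0.113.0/24"]
summary = summarize_prefixes(announced)
print(summary)
```

Nothing here touches the target: the code only reshapes data that the registries already publish, which keeps the step on the passive side of the line.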

In 2019, researchers at DomainTools published analysis showing that WHOIS registration patterns (registrar selection, registration timing, privacy protection choices) could be used to cluster domains operated by the same threat actor with high confidence. The same clustering logic applies when building a passive infrastructure map of a target organization: consistent ASN ownership, registrar choices, and certificate issuance patterns create a fingerprint that LLMs can help identify and describe.

LLM Application: Domain Intelligence Synthesis

Feed an LLM a bulk export of CT log results, passive DNS history, and WHOIS records for a target domain. Prompt it to: (1) identify likely internal vs. public-facing subdomains by naming convention, (2) map the third-party service dependencies visible in DNS TXT/CNAME records, (3) assess the email security posture from DMARC/SPF configuration, and (4) flag infrastructure changes that suggest recent migrations or new projects. This synthesis takes an LLM seconds; a human analyst hours.
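
The four-part prompt described above can be assembled programmatically so that every engagement uses the same wording. This is a minimal sketch: `build_domain_prompt` and its parameters are hypothetical, the record strings are made up, and the code only builds the prompt text. How it is sent to a model depends on whichever LLM client the analyst uses.

```python
# The four analytical tasks listed in this lesson.
TASKS = [
    "Identify likely internal vs. public-facing subdomains by naming convention.",
    "Map the third-party service dependencies visible in DNS TXT/CNAME records.",
    "Assess the email security posture from the DMARC/SPF configuration.",
    "Flag infrastructure changes that suggest recent migrations or new projects.",
]


def build_domain_prompt(target, ct_names, dns_records, whois_text):
    """Assemble one prompt from already-collected passive data."""
    lines = [f"Passive domain intelligence for {target}.", "", "Tasks:"]
    lines += [f"({i}) {task}" for i, task in enumerate(TASKS, start=1)]
    lines += ["", "CT log hostnames:"] + [f"- {n}" for n in ct_names]
    lines += ["", "DNS records:"] + [f"- {r}" for r in dns_records]
    lines += ["", "WHOIS:", whois_text.strip()]
    return "\n".join(lines)


prompt = build_domain_prompt(
    "example.com",
    ["vpn.example.com", "staging.example.com"],
    ["TXT v=DMARC1; p=none", "CNAME support.example.com -> zendesk.com"],
    "Registrar: Example Registrar\n",
)
print(prompt)
```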

Module 2 · Lesson 2 Quiz

Domain Intelligence

4 questions · Select the best answer for each
1. What made Certificate Transparency logs significant in post-breach analysis of the SolarWinds SUNBURST attack?
Correct. The avsvmcloud.com C2 domain was certificated and visible in CT logs before the SUNBURST implant was activated: a passive, public signal that preceded the attack's operational phase.
The significance was that the attacker's C2 domain registration and certificate issuance appeared in public CT logs before the compromise activated, demonstrating that passive monitoring could have provided early warning.
2. An analyst reviewing a target's DNS TXT records finds that the SPF record includes "include:salesforce.com include:hubspot.com include:zendesk.com". What does this reveal passively?
Correct. SPF includes list every third-party service authorized to send mail on the domain's behalf, directly revealing SaaS tool dependencies. Salesforce (CRM), HubSpot (marketing), and Zendesk (support) together paint a clear picture of the organization's operational software stack.
SPF include directives reveal which third-party services are authorized to send email on behalf of the domain, which identifies SaaS tool integrations, not web hosting. Each service listed maps to a specific business function.
3. A target organization's DMARC record reads "v=DMARC1; p=none; rua=mailto:dmarc@example.com". What security implication does the "p=none" value reveal?
Correct. A DMARC policy of "none" means the organization is collecting reports but not enforcing rejection or quarantine of unauthenticated messages, leaving the domain vulnerable to email spoofing attacks.
DMARC "p=none" means no enforcement action is taken on failing messages. It is a monitoring-only configuration. The domain is susceptible to spoofing because receiving mail servers will not reject or quarantine unauthenticated messages purporting to be from this domain.
4. Why are Autonomous System Number (ASN) lookups valuable for passive infrastructure mapping, according to this lesson?
Correct. ASN lookups reveal publicly registered IP ranges owned by an organization, giving analysts the full scope of internet-facing infrastructure as a passive starting point, without a single scan.
ASN data from registries like ARIN and RIPE reveals the IP ranges registered to an organization. This scopes the entire internet-facing infrastructure passively, without scanning, probing, or touching any live system.
Module 2 · Lab 2

DNS & Certificate Intelligence Analysis

Use the AI assistant to interpret domain intelligence findings

Lab Objective

You've gathered the following passive domain intelligence on a fictional target, Meridian Financial Group (meridianfg.example.com). Use the AI assistant to help you interpret these findings and identify intelligence value and potential attack surface implications.

Simulated findings: CT logs show 47 subdomains including staging.meridianfg.example.com, vpn.meridianfg.example.com, and dev-api.meridianfg.example.com. SPF record includes Salesforce, Proofpoint, and Workday. DMARC policy is p=none. MX records point to Proofpoint.

Suggested opening: "I have passive domain intelligence on a target called Meridian Financial Group. Their CT logs show subdomains including staging, vpn, and dev-api environments. Their SPF includes Salesforce, Proofpoint, and Workday. DMARC is p=none. Help me interpret what this means for attack surface mapping."
Module 2 · Lesson 3

Personnel & Technology Intelligence from Open Sources

LinkedIn, GitHub, job postings, and conference papers: how public human and technical data maps the real attack surface.
What does a company's hiring activity reveal about its security posture, and how can an LLM extract that intelligence from job descriptions faster than any human analyst?

The 2013 Target Corporation breach, which exposed 40 million credit card records, was enabled in part by intelligence that was publicly available before the attack. Target's job postings in 2012 and 2013 prominently listed experience with HVAC and building management system vendors as desirable qualifications for facilities contractors. Fazio Mechanical Services, the third-party HVAC contractor through which the attackers gained initial access, was publicly listed as a Target vendor on Fazio's own website and in Target's sustainability reports.

The attackers did not need to interact with Target's network to identify the entry vector. The vendor relationship was public. The network integration was implied by the job postings. The intelligence was passive, and lethal.

Job Postings as Technology Intelligence

Job postings are arguably the richest single passive OSINT source for technology intelligence. Organizations advertising for security engineers will specify the exact tools in their stack: SIEM platforms, EDR vendors, cloud environments, IAM solutions. A posting for a "Senior DevOps Engineer" listing "proficiency with HashiCorp Vault, AWS IAM, and Terraform" reveals secrets management architecture, cloud provider, and infrastructure-as-code tooling in a single sentence.

Brian Krebs documented this methodology in a 2014 analysis of the Target breach, noting that attackers could have mapped the vendor ecosystem entirely from public-facing documents before any network interaction occurred. Security researchers at RiskIQ formalized this into a methodology they called "Outside-In" attack surface mapping, using job postings as a primary data source.

LLMs are particularly effective at processing bulk job posting exports. An analyst can feed 50 job postings into an LLM and prompt it to extract: technology stack components, cloud providers, security tools, programming languages, compliance requirements (which reveal regulatory environment), and team structure hints from reporting relationships.
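
A deterministic keyword pass is a useful baseline to run alongside the LLM, both to pre-structure the input and to cross-check the model's extraction. The sketch below is illustrative only: `STACK_KEYWORDS` is a hypothetical, deliberately short list drawn from product names mentioned in this lesson, and matching is case-sensitive whole-token matching, which will miss abbreviations and misspellings.

```python
import re

# Hypothetical keyword lists; extend them for a real engagement.
STACK_KEYWORDS = {
    "siem": ["Splunk", "QRadar", "Sentinel"],
    "edr": ["CrowdStrike", "SentinelOne", "Carbon Black"],
    "cloud": ["AWS", "Azure", "GCP"],
    "iam": ["Okta", "Azure AD", "Ping"],
    "compliance": ["PCI-DSS", "HIPAA", "SOC2"],
}


def extract_stack(postings: list) -> dict:
    """Return {category: sorted product names} found across the postings."""
    found = {category: set() for category in STACK_KEYWORDS}
    for text in postings:
        for category, keywords in STACK_KEYWORDS.items():
            for kw in keywords:
                # Whole-token match, so "Sentinel" does not fire on "SentinelOne".
                if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", text):
                    found[category].add(kw)
    return {c: sorted(v) for c, v in found.items() if v}


postings = [
    "Experience with CrowdStrike Falcon, Splunk SIEM, Okta SSO, "
    "and AWS CloudTrail required.",
]
stack = extract_stack(postings)
print(stack)
```

What the keyword pass cannot do is the contextual part: noticing that a tool is listed as something being migrated away from, or that a whole defensive layer is never mentioned. That is where the LLM adds value.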

What Job Postings Reveal
  • SIEM platform (Splunk, QRadar, Sentinel)
  • EDR vendor (CrowdStrike, SentinelOne, Carbon Black)
  • Cloud environment (AWS, Azure, GCP)
  • IAM solutions (Okta, Azure AD, Ping)
  • Compliance frameworks (PCI-DSS, HIPAA, SOC2)
  • Programming languages and frameworks
  • Container/orchestration stack (Docker, K8s)
  • Third-party integrations explicitly named
What LinkedIn Profiles Reveal
  • Organizational hierarchy and reporting lines
  • Employee tenure: identifies recently hired staff (phishing targets)
  • Skills endorsements: confirms technology stack
  • Former employers: reveals talent sourcing and culture
  • Conference talks and publications
  • Open source contributions (links to GitHub)
  • Security certifications held by the team
  • Vendor relationship mentions in job histories

GitHub and Code Repository Intelligence

Public GitHub repositories are a frequently underestimated passive intelligence source. Organizations routinely expose infrastructure details, internal tooling, and occasionally credentials through public repositories, often unintentionally. Even repositories that contain no sensitive data reveal technology choices, coding conventions, and architectural decisions that inform attack surface analysis.

In 2019, researchers at GitGuardian reported that over 4 million secrets, including API keys, database passwords, and private keys, were exposed in public GitHub commits during that year alone. The vast majority of these exposures were inadvertent: developers committing configuration files, forgetting to add .gitignore entries for credential files, or pushing personal projects that contained work infrastructure details.

An LLM can assist with GitHub intelligence in several ways: analyzing repository README files and commit messages to infer infrastructure architecture, reviewing contributor lists to map personnel to technical roles, and identifying naming patterns in repository collections that suggest internal project structures.

Practical Technique: Commit History Analysis

Public repositories retain their full commit history even after sensitive files are deleted. The git log for a public repository may contain deleted credential files, internal IP addresses, and configuration details that were exposed for hours or days before removal. Tools like truffleHog and GitLeaks automate this analysis; an LLM can help interpret findings and prioritize high-value exposures.
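
To show the basic idea behind scanning commit history, the sketch below checks the added and removed lines of a patch (for example, saved output of `git log -p` from a cloned public repository) against two simple patterns. It is a toy illustration, not how truffleHog or GitLeaks are implemented: those tools use far larger rule sets and additional checks. `PATTERNS` and `scan_patch` are hypothetical names, and the key in the sample is the placeholder value AWS uses in its own documentation.

```python
import re

# Two illustrative patterns; real scanners ship hundreds of rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_patch(patch_text: str) -> list:
    """Return (line_number, pattern_name) for changed lines that match."""
    hits = []
    for lineno, line in enumerate(patch_text.splitlines(), start=1):
        # Only added/removed lines matter; skip the "+++ "/"--- " file headers.
        if not line.startswith(("+", "-")) or line.startswith(("+++ ", "--- ")):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# A made-up patch in which a credential is being deleted from a file.
patch = "\n".join([
    "--- a/config.py",
    "+++ b/config.py",
    "@@ -1,2 +1,1 @@",
    '-AWS_KEY = "AKIAIOSFODNN7EXAMPLE"',
    " DEBUG = False",
])
hits = scan_patch(patch)
print(hits)
```

Note that the hit comes from a removed line: the deletion commit itself preserves the secret in history, which is exactly the exposure this technique looks for.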

Conference Papers, Patents, and Academic Publications

Large organizations frequently publish academic papers, conference talks, and patent applications describing their internal systems in detail. Google's published research on Spanner, Bigtable, and Borg gave competitors, and attackers, a detailed understanding of their internal infrastructure architecture years before those systems were externally visible.

For security teams, conference talks at DEF CON, Black Hat, and RSA where company engineers describe their defensive architecture in detail are a double-edged sword: they demonstrate capability and recruit talent, but they also create detailed public documentation of defensive systems that attackers can use to identify gaps. An analyst with an LLM can process a speaker's published slides and extract architecture details in minutes.

LLM Synthesis Prompt Pattern

Feed the LLM: "Here are 30 job postings from [Company X] collected over the past 12 months. Identify: (1) all named security tools and vendors, (2) all cloud platforms mentioned, (3) any compliance frameworks referenced, (4) changes in hiring focus that suggest new projects or strategic shifts, and (5) any gaps in the security team that suggest unmonitored attack surfaces." This single prompt produces a structured technology and personnel intelligence report from publicly available data.
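
Item (4) in the prompt, changes in hiring focus, is the kind of claim worth cross-checking with a simple count. The sketch below tallies keyword mentions per calendar quarter from dated postings. The function names and the sample postings are invented for illustration, and substring matching is a crude stand-in for real text analysis.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2023-Q2."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def mentions_by_quarter(postings, keywords) -> dict:
    """postings: iterable of (date, text). Returns {keyword: {quarter: count}}."""
    counts = defaultdict(Counter)
    for posted, text in postings:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw][quarter(posted)] += 1
    return {kw: dict(sorted(c.items())) for kw, c in counts.items()}


# Made-up postings for a fictional company.
postings = [
    (date(2023, 2, 10), "Network engineer, on-prem data center"),
    (date(2023, 5, 3), "Cloud security engineer, AWS and HIPAA experience"),
    (date(2023, 6, 21), "Senior cloud security engineer, AWS"),
]
trend = mentions_by_quarter(postings, ["AWS", "HIPAA"])
print(trend)
```

If the LLM reports a shift toward cloud hiring in a given quarter, counts like these give the analyst something concrete to verify the inference against before it goes into a report.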

Module 2 · Lesson 3 Quiz

Personnel & Technology Intelligence

4 questions · Select the best answer for each
1. In the Target Corporation breach analysis, what made the Fazio Mechanical Services vendor relationship a passive intelligence finding rather than requiring active reconnaissance?
Correct. Fazio was listed as a Target vendor on Fazio's own public website, and the facilities contractor relationship was documented in Target's own sustainability reports, both entirely passive, publicly available sources.
The key point from the lesson is that Fazio was listed as a Target vendor on Fazio's public website and in Target's sustainability reports. The vendor relationship was documented in public corporate filings and a contractor's own marketing materials, with no active reconnaissance required.
2. A security analyst finds a job posting for "Senior Security Engineer" at a target company that lists: "experience with CrowdStrike Falcon, Splunk SIEM, Okta SSO, and AWS CloudTrail required." What intelligence does this single posting provide?
Correct. A single job posting can reveal the EDR platform, SIEM vendor, identity solution, cloud environment, and specific logging configurations, all passive intelligence with direct attack surface implications.
A job posting listing specific tool requirements reveals the actual security stack in significant detail. Each tool named maps to a specific defensive layer: endpoint detection, log aggregation, identity management, cloud provider, and audit logging.
3. GitGuardian's 2019 research found over 4 million secrets exposed in public GitHub commits. What common developer behavior causes most of these exposures?
Correct. The vast majority of GitHub credential exposures are inadvertent: developers forgetting to exclude configuration files, committing work infrastructure details in personal projects, or not realizing a repository was set to public.
GitGuardian's research found that most exposures were inadvertent: developers committing configuration files, missing .gitignore entries, or pushing personal projects containing work credentials. Intentional or malicious exposures represent a small fraction of the total.
4. An LLM is given 30 job postings from a target company spanning 12 months. Beyond listing the tools mentioned, what additional intelligence can a well-prompted LLM extract that a simple keyword search cannot?
Correct. An LLM excels at contextual pattern recognition across a corpus: identifying that a company suddenly started hiring heavily for cloud security engineers in Q2 (suggesting a cloud migration), or that zero WAF experience is mentioned (suggesting a gap), or that HIPAA is newly appearing (suggesting a new healthcare vertical).
The LLM's advantage over keyword search is contextual inference from patterns: identifying trends, gaps, and strategic shifts in the hiring data over time. These inferences require understanding context across multiple documents, which is exactly where LLMs outperform simple keyword matching.
Module 2 · Lab 3

Job Posting & LinkedIn Intelligence Extraction

Practice prompting an LLM to extract technology and personnel intelligence from open source data

Lab Objective

Use the AI assistant to practice extracting structured technology and personnel intelligence from job posting and LinkedIn data. The assistant will simulate responses based on realistic fictional data for a target organization.

Work through at least three analysis prompts: one for technology stack extraction, one for personnel mapping, and one for identifying security posture gaps based on what the hiring data implies about what the organization does and doesn't have covered.

Suggested opening: "I have job postings and LinkedIn data for a fictional target, Meridian Financial Group. I want to extract: (1) their complete security technology stack, (2) their organizational security hierarchy, and (3) any implied security gaps. Act as if you have access to 25 realistic job postings and LinkedIn profiles for their security team. Let's start with technology stack extraction."
Module 2 · Lesson 4

Synthesizing Passive OSINT into Actionable Intelligence Reports

Turning fragmented data into structured threat profiles: the full LLM-assisted passive OSINT workflow from collection to report.
How do professional threat intelligence analysts structure LLM-assisted synthesis, and what does a complete passive OSINT report actually contain?

Since 2022, the open-source investigation collective Bellingcat has published detailed passive intelligence reports on Russian military activity using exclusively publicly available data: satellite imagery, social media geotagging, equipment photographed by soldiers, unit insignia visible in videos. Their reports have identified unit movements, equipment losses, and command structures that national intelligence agencies confirmed after the fact.

Bellingcat's methodology is the gold standard of passive OSINT synthesis: collect broadly, cross-reference rigorously, document sources completely, and acknowledge uncertainty explicitly. Their 2022 coverage of the Mariupol siege used no human sources and touched no Russian military systems. Every finding derived from data the subjects themselves had made public.

The Passive OSINT Workflow

Professional passive OSINT workflows follow a structured sequence. LLMs accelerate specific phases dramatically while leaving others to human judgment and legal review.

  1. Define Collection Scope: Identify the target entity (domain, organization, individual), the intelligence questions to be answered, and the legal authorization under which collection is occurring. An LLM can help draft scope statements and identify ambiguities before collection begins.
  2. Identify Source Categories: Map which of the six passive data categories (domain, personnel, technology, credential, geospatial, financial) are relevant to the intelligence questions. Not all categories are needed for every engagement.
  3. Execute Collection: Human analysts collect raw data from identified sources: CT logs, DNS records, job postings, LinkedIn, GitHub, public filings. This phase requires human judgment about source reliability and legal compliance. LLMs do not collect; they receive collected data.
  4. LLM-Assisted Synthesis: Feed collected data to an LLM with structured prompts requesting specific analytical outputs: technology mapping, personnel hierarchy reconstruction, timeline analysis, infrastructure gap identification. This is where LLMs provide maximum value.
  5. Human Review and Validation: Analyst reviews LLM synthesis for accuracy, flags unsubstantiated inferences, cross-references conflicting data points, and identifies collection gaps requiring additional passive research.
  6. Report Structuring: Produce a structured intelligence report with explicit source citations, confidence levels for each finding, and clear separation between confirmed facts and analytical inferences.

Structuring the Passive OSINT Report

The output of a passive OSINT engagement is a structured report. Professional threat intelligence reports, whether from commercial vendors like Recorded Future, Mandiant, or CrowdStrike, or from open-source investigators like Bellingcat, share a common structure that ensures findings are actionable and defensible.

Report Section | Content | LLM Role
Executive Summary | Key findings, risk level, recommended actions | Draft from synthesized findings
Scope & Methodology | Target definition, collection sources, dates, legal authorization | Structure and format
Infrastructure Profile | Domain map, IP ranges, hosting providers, CDN/WAF presence | Synthesize from CT/DNS/ASN data
Technology Stack | Security tools, cloud providers, SaaS integrations, frameworks | Extract from job postings/LinkedIn/GitHub
Personnel Map | Key individuals, roles, contact surfaces, tenure analysis | Reconstruct from LinkedIn/conference data
Credential Exposure | Known breach appearances, exposed credentials, paste site findings | Summarize aggregated breach data
Attack Surface Assessment | Prioritized list of exposure areas with supporting evidence | Infer from synthesized findings
Confidence Levels | High/Medium/Low for each key finding, with source basis | Annotate each finding
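
One way to keep source citations and confidence levels attached to every finding is to model the report sections as data rather than free text. The sketch below is an assumption about how that might look, not a vendor or Bellingcat format: `Finding`, `ReportSection`, and `render_report` are hypothetical names, and the sample finding is invented.

```python
from dataclasses import dataclass, field
from typing import List

CONFIDENCE_LEVELS = ("High", "Medium", "Low")


@dataclass
class Finding:
    """One report finding with its confidence level and source basis."""
    statement: str
    confidence: str
    sources: List[str]
    is_inference: bool  # True for analytical inference, False for confirmed fact

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError("confidence must be one of " + ", ".join(CONFIDENCE_LEVELS))
        if not self.is_inference and not self.sources:
            raise ValueError("a confirmed fact needs at least one cited source")


@dataclass
class ReportSection:
    """A named section such as 'Technology Stack' holding its findings."""
    title: str
    findings: List[Finding] = field(default_factory=list)


def render_report(sections: List[ReportSection]) -> str:
    """Render sections as plain text, keeping facts and inferences visibly separate."""
    lines = []
    for section in sections:
        lines.append(section.title)
        for f in section.findings:
            kind = "inference" if f.is_inference else "confirmed"
            cited = "; ".join(f.sources) if f.sources else "none"
            lines.append(f"  [{f.confidence}] ({kind}) {f.statement} | sources: {cited}")
    return "\n".join(lines)


# Invented example finding for illustration.
stack = ReportSection("Technology Stack", [
    Finding("Workday is authorized to send mail for the domain",
            "High", ["SPF TXT record", "job posting"], False),
])
print(render_report([stack]))
```

Rejecting a confirmed fact that has no source at construction time is a design choice in this sketch; a team could equally flag such findings during human review instead.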

Confidence Levels and Analytical Uncertainty

Professional intelligence analysis requires explicit confidence labeling. The Intelligence Community uses a structured confidence framework (High, Moderate, Low) based on source quality, corroboration, and recency. Passive OSINT reports must apply the same discipline. An LLM synthesizing public data will sometimes draw inferences that are plausible but unconfirmed; these must be explicitly labeled as analytical assessments rather than established facts.

The Bellingcat methodology is instructive here: they annotate every evidentiary claim with the source, the date, and an explicit statement of what can and cannot be concluded from that source alone. When LLMs assist in synthesis, the analyst must review outputs for overconfident assertions: LLMs can present inferences as conclusions if not carefully prompted to distinguish between confirmed evidence and analytical judgment.

Prompt Pattern: Confidence Annotation

When requesting synthesis from an LLM, append: "For each finding, label your confidence as HIGH (multiple independent sources confirm), MEDIUM (single source, consistent with other evidence), or LOW (inferred from indirect signals only). Clearly distinguish confirmed facts from analytical inferences." This single prompt addition significantly improves the analytical quality of LLM-generated intelligence outputs.
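
The pattern above can be applied mechanically so that no synthesis prompt goes out without it. This is a minimal sketch under stated assumptions: the function names `build_synthesis_prompt` and `unlabeled_findings` are hypothetical, no particular LLM API is assumed, and the check on returned findings is a simple keyword match rather than a guarantee that the labels are justified.

```python
import re
from typing import List

# Suffix text taken verbatim from the prompt pattern in this lesson.
CONFIDENCE_SUFFIX = (
    "For each finding, label your confidence as HIGH (multiple independent "
    "sources confirm), MEDIUM (single source, consistent with other evidence), "
    "or LOW (inferred from indirect signals only). Clearly distinguish "
    "confirmed facts from analytical inferences."
)

LABEL_PATTERN = re.compile(r"\b(HIGH|MEDIUM|LOW)\b")


def build_synthesis_prompt(task: str, collected_data: str) -> str:
    """Combine the analyst's task, the collected data, and the confidence suffix."""
    return (
        f"{task.strip()}\n\n"
        f"Collected data:\n{collected_data.strip()}\n\n"
        f"{CONFIDENCE_SUFFIX}"
    )


def unlabeled_findings(findings: List[str]) -> List[str]:
    """Return findings that carry no HIGH/MEDIUM/LOW label, for human follow-up."""
    return [f for f in findings if not LABEL_PATTERN.search(f)]


# Invented task and data for illustration.
prompt = build_synthesis_prompt(
    "Map the technology stack implied by the data below.",
    "SPF record: v=spf1 include:salesforce.com include:workday.com ~all",
)
print(prompt)
```

The prompt string can be sent through whatever LLM interface the team uses. `unlabeled_findings` only catches missing labels; judging whether a HIGH label is actually supported by multiple independent sources stays with the analyst in step 5.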

Countering Passive OSINT: the Defender's Perspective

Understanding passive OSINT methodology also informs defensive practice. Organizations that conduct regular external attack surface assessments, which are essentially passive OSINT engagements against their own infrastructure, identify exposure before attackers do. This practice, formalized in Attack Surface Management (ASM) platforms such as Tenable ASM, CyCognito, and Cortex Xpanse, is directly analogous to the offensive passive OSINT workflow.

Defensive countermeasures informed by this module include: enabling DMARC enforcement (p=reject), auditing job postings for technology over-disclosure, monitoring CT logs for unexpected certificate issuance (suggesting subdomain takeover or unauthorized certificate requests), and running periodic GitHub searches for organizational credentials or infrastructure details in public repositories.
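
Two of these countermeasures lend themselves to simple automated checks once the data has been collected. The sketch below assumes the DMARC record text and the list of hostnames seen in CT logs have already been retrieved by other means; it performs no network lookups. The function names and the example hostnames are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_dmarc(record: str) -> Dict[str, str]:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_gaps(record: str) -> List[str]:
    """List ways the record falls short of full p=reject enforcement."""
    tags = parse_dmarc(record)
    gaps = []
    policy = tags.get("p", "").lower()
    if policy != "reject":
        gaps.append(f"policy is p={policy or 'missing'}, not p=reject")
    # pct defaults to 100 when the tag is absent (RFC 7489).
    pct = tags.get("pct", "100")
    if pct.isdigit() and int(pct) < 100:
        gaps.append(f"pct={pct}: policy applies to only part of failing mail")
    return gaps


def unexpected_cert_names(observed: Iterable[str], inventory: Iterable[str]) -> List[str]:
    """Return hostnames seen in CT logs that are absent from the known inventory."""
    known = {name.lower() for name in inventory}
    return sorted({name.lower() for name in observed} - known)


# Invented example data for illustration.
print(dmarc_gaps("v=DMARC1; p=quarantine; pct=10"))
print(unexpected_cert_names(
    ["www.example.com", "dev-old.example.com"],
    ["www.example.com"],
))
```

An unexpected hostname is only a lead: it may be a forgotten but legitimate system rather than a takeover, so each hit still needs human review before it goes into a report.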

Module 2 Synthesis

Passive OSINT with LLM assistance is not about collecting more data; it is about synthesizing existing public data faster and with greater depth than any human analyst can manage manually. The legal and ethical boundaries are clear: collect only from genuinely public sources, document your methodology, label your confidence levels, and operate within your authorized scope. The LLM is a synthesis engine. The analyst is the judgment layer. Neither is optional.

Module 2 · Lesson 4 Quiz

OSINT Synthesis & Reporting

4 questions · Select the best answer for each
1. Bellingcat's methodology for passive OSINT synthesis is described in this lesson as: "collect broadly, cross-reference rigorously, document sources completely, and…" What is the fourth principle?
Correct. The four Bellingcat principles are: collect broadly, cross-reference rigorously, document sources completely, and acknowledge uncertainty explicitly. The last principle, explicit uncertainty acknowledgment, is what separates rigorous intelligence analysis from speculation.
The fourth Bellingcat principle stated in the lesson is "acknowledge uncertainty explicitly." This is a critical discipline in intelligence reporting: distinguishing confirmed facts from analytical inferences based on available evidence.
2. In the six-step passive OSINT workflow described in this lesson, at which step do LLMs provide maximum value?
Correct. The lesson explicitly states that Step 4 (LLM-Assisted Synthesis) is "where LLMs provide maximum value." Collection requires human judgment about source reliability and legal compliance; LLMs are synthesis engines, not collection tools.
The lesson explicitly identifies Step 4 (LLM-Assisted Synthesis) as where LLMs provide maximum value. Collection, scope definition, and validation require human judgment; synthesis of already-collected data is where LLM speed and pattern recognition excel.
3. When prompting an LLM to synthesize intelligence, what addition to a standard prompt significantly improves the analytical quality of the output according to this lesson?
Correct. The confidence annotation prompt pattern (labeling findings HIGH/MEDIUM/LOW based on source corroboration and explicitly separating confirmed facts from inferences) is the single most impactful addition to a synthesis prompt for analytical quality.
The lesson specifically highlights the confidence annotation prompt pattern as the key quality improvement: instructing the LLM to label each finding HIGH/MEDIUM/LOW based on source corroboration and to explicitly distinguish confirmed facts from analytical inferences.
4. Which defensive practice directly mirrors the offensive passive OSINT methodology taught in this module?
Correct. External Attack Surface Management (ASM), as offered by platforms like Tenable ASM, CyCognito, and Cortex Xpanse, is the defensive application of the same passive OSINT workflow taught in this module. Organizations run it against themselves to find exposure before attackers do.
The lesson explicitly identifies Attack Surface Management (ASM) as the defensive mirror of offensive passive OSINT: organizations use ASM platforms to conduct the same passive reconnaissance against their own infrastructure that an attacker would use against a target.
Module 2 · Lab 4

Full Passive OSINT Report Generation

Practice building a complete passive intelligence report with LLM-assisted synthesis

Lab Objective

Bring together all passive data categories from this module (domain intelligence, personnel, technology, credentials, geospatial, and financial) to build a complete structured passive OSINT report on a fictional target using LLM-assisted synthesis.

Walk through the six-step workflow from this lesson with the AI assistant. Practice applying confidence labels to findings and distinguishing confirmed evidence from analytical inference. Your report should be suitable for delivery to a red team client.

Suggested opening: "I want to build a complete passive OSINT report on Meridian Financial Group using the six-step workflow from the lesson. I have collected data across all six passive categories. Help me synthesize this into a structured report with executive summary, infrastructure profile, technology stack, personnel map, and attack surface assessment, with confidence labels on each key finding."
OSINT Report Generator
Full Synthesis Mode
Ready to help you build a complete passive OSINT report. We'll follow the six-step workflow: scope definition, source category mapping, collection, LLM-assisted synthesis, human review and validation, and report structuring. Start by telling me what passive data you've collected, or describe the target and I'll simulate a realistic data set for practice purposes.
Module 2 · Assessment

Module Test: Passive OSINT with LLMs

15 questions · 80% required to pass · Select the best answer for each
1. What is the single defining criterion that classifies an OSINT collection method as "passive"?
Correct. Passive collection is defined by the absence of detectable interaction with the target's infrastructure.
The defining criterion is whether collection generates a detectable signal on target systems, not authentication requirements, anonymization tools, or intended use.
2. Which of the following is classified as an ACTIVE reconnaissance method (out of scope for passive OSINT)?
Correct. nmap port scanning sends packets directly to target systems, creating log entries; it is by definition active reconnaissance.
Of these options, only nmap port scanning is active: it sends packets to target systems. CT log queries, LinkedIn review, and WHOIS lookups generate no signals on target infrastructure.
3. Certificate Transparency logs have been mandatory for publicly trusted CAs since which year?
Correct. The CA/Browser Forum required CT logging beginning in 2013, making all issued certificates publicly auditable from that point forward.
CT logging became mandatory under CA/Browser Forum requirements in 2013, making every publicly trusted certificate issued since then auditable via logs like crt.sh.
4. In the SolarWinds SUNBURST case, the attacker-controlled C2 domain (avsvmcloud.com) was visible in which type of passive data source before the breach activated?
Correct. The C2 domain was issued a certificate that appeared in publicly auditable CT logs before SUNBURST was activated, demonstrating that passive monitoring could have provided early warning.
The C2 domain's certificate issuance was visible in Certificate Transparency logs before the breach activated, a passive public signal that preceded the attack's operational phase.
5. A DNS TXT record SPF entry reads: "v=spf1 include:salesforce.com include:workday.com ~all". What does the Workday inclusion reveal about the target?
Correct. SPF includes reveal SaaS service integrations. Workday's presence confirms it is used for HR/finance operations and has been authorized to send mail, revealing a specific operational software dependency.
SPF include directives identify third-party services authorized to send email on the domain's behalf; this directly reveals SaaS operational dependencies. Workday (HR/finance platform) being listed confirms its deployment in the target organization.
6. A target organization's DMARC policy is "p=quarantine; pct=10". What does this configuration indicate?
Correct. "pct=10" means enforcement only applies to 10% of failing messages; the remaining 90% are delivered normally. This indicates an organization in the process of rolling out DMARC that has not yet reached full enforcement.
The "pct" (percentage) tag in DMARC specifies what percentage of failing messages the policy applies to. "pct=10" with "p=quarantine" means only 10% of unauthenticated messages are quarantined; 90% pass through normally, significantly reducing the protection offered.
7. ASN (Autonomous System Number) lookups are valuable in passive OSINT primarily because they:
Correct. ASN lookups via ARIN, RIPE, or BGP.he.net reveal publicly registered IP ranges, the complete scope of owned internet-facing infrastructure, available entirely passively.
ASN data from public registries (ARIN, RIPE, BGP.he.net) reveals the IP ranges registered to an organization, the full internet-facing infrastructure scope, available as entirely passive information requiring no interaction with target systems.
8. The Target Corporation breach (2013) is cited in this module as an example of how:
Correct. The Fazio vendor relationship was documented on Fazio's own public website and in Target's sustainability reports, entirely passive, requiring no interaction with Target's systems to discover.
The lesson cites the Target breach to illustrate that the entry vector (the Fazio HVAC contractor relationship) was documented on public websites and in Target's own published sustainability reports. No interaction with Target's infrastructure was needed.
9. GitGuardian's 2019 research found that most GitHub credential exposures resulted from:
Correct. The overwhelming majority of GitHub credential exposures are inadvertent developer errors, not malicious acts or deliberate publication.
GitGuardian found that inadvertent developer errors (forgotten .gitignore entries, personal projects with work credentials, accidental configuration file commits) accounted for the vast majority of the 4 million+ secrets exposed in 2019.
10. Which tool is specifically mentioned in this module for analyzing commit history to find deleted credentials in public Git repositories?
Correct. truffleHog (alongside GitLeaks) is cited in the lesson as a tool that automates the analysis of commit histories to identify deleted credentials and secrets in public repositories.
The lesson specifically cites truffleHog and GitLeaks as tools for analyzing commit histories, including deleted files, to find exposed secrets in public repositories.
11. In the six-step passive OSINT workflow described in Lesson 4, which phase explicitly requires human judgment about source reliability and legal compliance, NOT LLM execution?
Correct. Step 3 (Execute Collection) explicitly "requires human judgment about source reliability and legal compliance." The lesson states: "LLMs do not collect; they receive collected data."
Step 3 (Execute Collection) is explicitly identified in the lesson as requiring human judgment about source reliability and legal compliance; "LLMs do not collect; they receive collected data" is a key principle of the workflow.
12. The confidence annotation prompt pattern asks an LLM to label findings as HIGH, MEDIUM, or LOW. What defines a HIGH confidence finding according to the pattern described?
Correct. The confidence annotation framework in the lesson defines HIGH confidence as supported by multiple independent sources; corroboration across sources is the key criterion.
The lesson's confidence annotation framework defines HIGH confidence as "multiple independent sources confirm": corroboration across independent sources, not source type, LLM scoring, or human validation status.
13. Which commercial platform category represents the direct defensive analog of offensive passive OSINT methodology, as described in Lesson 4?
Correct. Attack Surface Management (ASM) platforms, like Tenable ASM, CyCognito, and Cortex Xpanse, conduct passive OSINT-style reconnaissance against an organization's own infrastructure to identify exposure before attackers.
The lesson explicitly identifies External Attack Surface Management (ASM) as the defensive mirror of the offensive passive OSINT workflow: organizations use ASM platforms to apply the same methodology against themselves.
14. The 2013 Cambridge University Facebook "likes" study is cited in this module primarily to illustrate which concept?
Correct. The study illustrates the foundational passive OSINT principle: intelligence derives from synthesizing data the subject has already deposited publicly, requiring no interaction with the subject.
The Cambridge study is cited to illustrate that passive OSINT's foundational logic (the subject has already deposited the evidence publicly, and the analyst's job is synthesis, not interaction) predates LLMs and applies across domains.
15. An LLM generates a passive OSINT report section stating: "The target organization almost certainly uses CrowdStrike Falcon as its primary EDR platform." A rigorous analyst should flag this phrasing because:
Correct. "Almost certainly" is an unqualified confidence assertion. A rigorous intelligence report requires the analyst to specify: what sources support this, whether those sources are independent, and whether this is confirmed fact or analytical inference, which is the confidence annotation discipline the module emphasizes.
The issue is that "almost certainly" is an unqualified confidence assertion lacking the structured confidence labeling and source citation the module requires. The analyst must specify the source basis, document whether multiple independent sources confirm it, and distinguish confirmed fact from inference.