Module 4 · Lesson 1

Email as Attack Surface

Corporate email formats, harvesting pipelines, and why inbox addresses are the keys to every kingdom.
Why does a single verified email address unlock far more than just a mailbox?

In June 2012, LinkedIn suffered a breach exposing 6.5 million SHA-1 hashed passwords. What was underreported: the dump also leaked email addresses tied to those accounts. Four years later, a full dataset of 117 million email–password pairs surfaced on a darknet marketplace for five bitcoin. Researchers found that a majority of those addresses followed the same corporate pattern (firstname.lastname@linkedin.com), which meant the format was now confirmed and enumerable for any employee, past or present.

The attackers who purchased the dataset did not target LinkedIn again. They used the confirmed email format to harvest addresses at every employer listed in LinkedIn profiles, then tested those addresses against banking portals, Slack workspaces, and SSO endpoints, a technique now called credential stuffing with format inference.

Why Email Is the Master Key

An email address is not just a communication channel; it is an identity anchor. Every SaaS platform, every VPN, every HR portal, every cloud console requires one for authentication. When an attacker harvests a verified corporate email, they simultaneously obtain: a username candidate for every system that employee touches, a vector for spear-phishing and pretexting, a pivot point to social-engineer IT helpdesks, and, if the password is reused, immediate account access.

The 2020 SolarWinds supply-chain operation illustrates this. APT29 (Cozy Bear) conducted months of pre-intrusion OSINT. Email addresses harvested from public sources and previous breaches were used to identify high-value targets within SolarWinds customers before a single line of malicious code was deployed. Identity reconnaissance preceded technical exploitation by months.

91%: breaches that start with phishing
83%: organisations that use email as the SSO seed
65%: users who reuse passwords across platforms

Corporate Email Format Enumeration

Most organisations use one of five formats for employee email, plus role-based addresses for shared functions. Once a single verified address is found, the entire company's address space becomes predictable.

firstname.lastname
john.smith@corp.com: most common in enterprise, used by ~55% of Fortune 500 companies.
f.lastname
j.smith@corp.com: common in European firms; reduces length, increases collision risk.
firstnamelastname
johnsmith@corp.com: frequent in mid-market SaaS companies and startups.
lastname.firstname
smith.john@corp.com: used heavily in Japanese and Korean multinationals.
firstname+initial
johns@corp.com: first name plus last initial; short-form, common in law firms and financial services.
Role-based
security@, hr@, billing@: functionally targeted; directly useful for pretexting.
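
The formats above can be turned into candidate addresses mechanically. The following is a minimal sketch, assuming names arrive as "First Last" strings and that accents and punctuation are simply dropped; the `FORMATS` table, `normalise`, and `candidates` are illustrative names, not part of any tool named in this lesson.

```python
import unicodedata

# Templates mirror the five personal-address formats listed above.
FORMATS = {
    "firstname.lastname": "{first}.{last}",
    "f.lastname": "{f}.{last}",
    "firstnamelastname": "{first}{last}",
    "lastname.firstname": "{last}.{first}",
    "firstname+initial": "{first}{l}",
}


def normalise(part: str) -> str:
    """Lowercase, strip accents, and drop anything that is not a letter."""
    decomposed = unicodedata.normalize("NFKD", part)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalpha())


def candidates(full_name: str, domain: str) -> dict:
    """Return one candidate address per format for a 'First Last' name.

    Assumes the name has at least one usable letter in its first and last
    token; middle names are ignored.
    """
    parts = full_name.split()
    first, last = normalise(parts[0]), normalise(parts[-1])
    fields = {"first": first, "last": last, "f": first[0], "l": last[0]}
    return {
        name: f"{template.format(**fields)}@{domain}"
        for name, template in FORMATS.items()
    }
```

For the synthetic lab roster, `candidates("Ana Reyes", "syntherex-bio.com")` yields ana.reyes@, a.reyes@, anareyes@, reyes.ana@, and anar@ variants on that domain. Candidates are only guesses until verified, and generating them is appropriate only inside an authorised engagement.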

The Hunter.io Methodology

Hunter.io (formerly Email Hunter) indexes email addresses found in public web content: press releases, academic papers, GitHub commits, forum posts, and WHOIS data. As of 2024, it holds over 200 million indexed addresses. Its Domain Search API returns all known addresses for a given domain along with a confidence score and source URL.

The tool's "Email Finder" function takes a name and domain and predicts the most likely address format based on the organisation's confirmed pattern. In penetration testing engagements, Hunter.io is typically the first enumeration step after identifying a target domain, because it simultaneously reveals email format, active employees, and executive names, all drawn from public sources.
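
The idea of predicting a format from a confirmed pattern can be illustrated with a small matcher. This is a sketch under stated assumptions, not Hunter.io's actual method: it takes one known name and address pair and reports which of the five formats from this lesson the local part matches. The function name `infer_format` is hypothetical.

```python
from typing import Optional


def infer_format(first: str, last: str, address: str) -> Optional[str]:
    """Return the name of the format that a confirmed address follows.

    Compares the local part (before '@') against each of the five formats
    described in this lesson; returns None when nothing matches.
    """
    local = address.split("@", 1)[0].lower()
    first, last = first.lower(), last.lower()
    patterns = {
        "firstname.lastname": f"{first}.{last}",
        "f.lastname": f"{first[0]}.{last}",
        "firstnamelastname": f"{first}{last}",
        "lastname.firstname": f"{last}.{first}",
        "firstname+initial": f"{first}{last[0]}",
    }
    for name, expected in patterns.items():
        if local == expected:
            return name
    return None
```

A single confirmed address is weak evidence: organisations sometimes mix formats or fall back to a second pattern on name collisions, so an inferred format is better treated as a hypothesis to confirm against further known addresses.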

During the 2021 HubSpot breach post-mortem, security researchers demonstrated that all 30 million email addresses exposed were individually verifiable through Hunter.io's API within hours, illustrating how quickly harvested data integrates into existing OSINT toolchains.

OSINT Ethics Boundary

Email harvesting is legal in most jurisdictions when targeting your own organisation or with explicit written authorisation. Using harvesting tools against third parties without authorisation violates the Computer Fraud and Abuse Act (CFAA) in the US, the Computer Misuse Act in the UK, and analogous legislation globally. All labs in this module use synthetic domains and fictional personnel only.

Verification vs. Enumeration

Enumeration generates candidate addresses based on inferred format patterns. Verification confirms whether an address actually exists and is active. Verification methods include: SMTP RCPT TO probing (increasingly blocked), catch-all detection, Have I Been Pwned API lookups, and observing email open tracking pixels in red-team phishing campaigns.

AI models substantially accelerate the enumeration phase. Given a list of names from a LinkedIn scrape, an AI can generate all plausible addresses for five format patterns simultaneously, output them as a CSV, and flag which conform to the inferred dominant format, work that previously required custom scripting per engagement.

Key Takeaway

Email addresses are identity anchors, not just communication endpoints. A single confirmed corporate address reveals format, enables enumeration of the entire organisation, and serves as an authentication credential across dozens of systems. Understanding the harvesting pipeline (find one, infer format, enumerate all, verify) is foundational to both offensive OSINT and defensive posture assessment.

Lesson 1 Quiz

Email as Attack Surface: check your understanding
In the 2016 LinkedIn data exposure, what made the leaked dataset especially dangerous beyond the passwords themselves?
Correct. The confirmed email formats enabled attackers to enumerate addresses at every employer in the dataset, extending the breach well beyond LinkedIn itself.
Not quite. The critical additional risk was that confirmed email format patterns could be applied to every employer listed in the leaked profiles, multiplying the attack surface enormously.
Which of the following best describes "credential stuffing with format inference"?
Correct. The technique combines format inference (predicting valid email addresses) with credential stuffing (testing known passwords from other breaches against those addresses).
Incorrect. Format inference involves predicting valid email addresses based on observed patterns, then testing them with leaked passwords from unrelated breaches.
What does Hunter.io primarily index to build its email database?
Correct. Hunter.io crawls public-facing web content, not private or illicit sources, making it a legitimate OSINT tool used by sales teams, recruiters, and penetration testers alike.
Incorrect. Hunter.io indexes only publicly accessible content. Its legal standing depends precisely on this limitation.

Lab 1: Email Format Inference Engine

Practice with the AI assistant · Minimum 3 exchanges to complete

Scenario

You are conducting an authorised red-team engagement against Syntherex Biomedical (a synthetic organisation). Your client has provided one confirmed email address: david.okonkwo@syntherex-bio.com

The AI assistant will help you infer the email format pattern, generate a candidate list from a supplied employee roster, and discuss verification strategies.

Suggested openers: "What format does Syntherex use?" / "Generate addresses for these employees: Ana Reyes, Tom Bright, Priya Naidu" / "How would I verify these without alerting the target?"
OSINT Lab Assistant
Email Harvesting
Ready. You have one confirmed address: david.okonkwo@syntherex-bio.com. What would you like to do with it: infer the format, generate a candidate list, or discuss verification approaches?
Module 4 · Lesson 2

Breach Data, HaveIBeenPwned, and the OSINT Stack

How historical breach databases become live reconnaissance assets, and how defenders monitor exposure in real time.
When a breach happened five years ago, why does it still matter to a red team today?

The May 2021 ransomware attack that shut down the Colonial Pipeline, disrupting fuel supply across the US East Coast for six days, began not with a zero-day exploit but with a single compromised VPN password. The password belonged to a legacy account; the account's email address had appeared in a previous breach dump from an unrelated service. The credentials were never rotated.

DarkSide, the ransomware group responsible, almost certainly used a breach aggregation service to identify the email–password pair, then tested it against Colonial's Citrix VPN endpoint, a routine step in what practitioners call credential stuffing from breach data. The entire initial access phase likely took minutes.

The Breach Intelligence Ecosystem

Breach data circulates through a layered ecosystem. Fresh dumps appear first on closed Telegram channels or darknet forums, often sold as exclusive data. Within weeks they propagate to aggregate services such as Have I Been Pwned (HIBP), Dehashed, and IntelX, which index the data for lookup by email, phone, username, or password hash. Within months they appear in combo lists (email:password pairings compiled from multiple breaches), which are freely circulated on public paste sites and hacking forums.

HIBP, founded by security researcher Troy Hunt in 2013, now holds over 13 billion indexed records across 700+ breaches. Its API is used by Microsoft, Firefox, and 1Password to alert users of exposed credentials. For defenders, HIBP is a monitoring tool. For red teamers, understanding what HIBP does and does not index helps identify which breach sources to prioritise.

Real Breach Chronology: RockYou2021

In June 2021, a file named RockYou2021.txt was posted to a hacking forum. It contained 8.4 billion unique plaintext password entries compiled from previous breaches and combo lists. This was not a single new breach; it was an aggregation of decades of leaks. Its significance: passwords in that list represent actual human behaviour, making it the most comprehensive dictionary for offline hash-cracking ever publicly available.

How Defenders Use Breach Data

Enterprise security teams use breach intelligence in several ways. Domain monitoring (querying HIBP or similar services for all @company.com addresses in breach datasets) surfaces exposed employees. Credential notification programs alert employees whose email–password combinations appear in new leaks. Forced rotation policies trigger when an account's password appears in HIBP's Pwned Passwords database, which indexes password hashes rather than addresses.
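
Domain monitoring reduces to filtering breach records by domain and grouping them per address. The sketch below assumes a hypothetical record shape (dictionaries with `email` and `breach` keys); it is not the response format of HIBP or any other service, and `exposure_by_address` is an illustrative name.

```python
from collections import defaultdict


def exposure_by_address(records, domain):
    """Group breach names by address for one domain.

    `records` is an iterable of dicts with 'email' and 'breach' keys
    (an assumed shape for illustration). Returns a list of
    (address, sorted breach names), most-exposed addresses first.
    """
    suffix = "@" + domain.lower()
    exposure = defaultdict(set)
    for record in records:
        email = record["email"].strip().lower()
        if email.endswith(suffix):
            exposure[email].add(record["breach"])
    # Sort by number of breaches (descending), then alphabetically.
    return sorted(
        ((email, sorted(breaches)) for email, breaches in exposure.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Sorting by breach count gives a first-pass remediation order; a real triage would also weigh what data types each breach exposed and the account's level of access.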

Microsoft's Entra ID (formerly Azure AD) natively integrates HIBP's Pwned Passwords hash list, blocking users from setting passwords that appear in breach data. This is now a baseline control in enterprise environments, though it only prevents password reuse, not account enumeration via the email address itself.
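
A password can be checked against Pwned Passwords without disclosing it, using the service's k-anonymity range lookup: only the first five hex characters of the SHA-1 hash are sent, and the response lists matching hash suffixes with counts. The sketch below shows the client-side hashing and response parsing only. The network call is left out, the sample response is made up for illustration, and the function names are hypothetical.

```python
import hashlib


def sha1_prefix_suffix(password: str):
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest.

    Only the prefix would be sent to the range endpoint
    (https://api.pwnedpasswords.com/range/<prefix>).
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, response_text: str) -> int:
    """Find `suffix` in a range response of 'SUFFIX:COUNT' lines.

    Returns the breach count, or 0 when the suffix is absent.
    """
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


# Usage with a fabricated response body (not real service data).
prefix, suffix = sha1_prefix_suffix("password")
fake_response = "0" * 35 + ":3\n" + suffix + ":42\n"
seen = count_in_range_response(suffix, fake_response)
```

Because the comparison happens locally, the service never learns which suffix the caller was interested in, which is what makes the check safe to run at password-set time.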

The OSINT Stack for Identity Reconnaissance

A mature identity-focused OSINT stack combines multiple data sources into a unified picture of a target identity. Practitioners layer tools sequentially:

Hunter.io
Confirms email format, indexes known addresses for a domain. Starting point for corporate enumeration.
HIBP API
Checks a specific address against 700+ known breaches. Returns breach names, dates, and data types exposed.
Dehashed
Paid service. Returns actual passwords (where plaintext is known), hashes, IPs, usernames, phone numbers associated with an address.
IntelX
Indexes paste sites, darknet leaks, and archived pages. Searches by email, domain, IP, or Bitcoin address.
Maltego
Graphical link-analysis platform that transforms an email into connected entities: social accounts, phone numbers, domains, co-workers.
AI Orchestration
LLMs are used to synthesise outputs from the above tools, generate phishing pretext drafts, identify patterns across large datasets, and prioritise high-value targets.
AI Acceleration in Breach Analysis

Security teams now feed raw HIBP CSV exports and Dehashed JSON dumps into AI models with prompts such as "identify which of these exposed accounts have the broadest system access based on job title and email domain." The AI cross-references names against LinkedIn data to surface C-suite and IT administrator accounts, a triage step that previously required hours of manual review, now completed in seconds.

Enumeration via Timing Attacks on Login Pages

Beyond breach data, email validity can be inferred from application behaviour. Some login systems return subtly different responses (in timing, error wording, or HTTP status code) when an email address exists versus when it does not. The 2019 Zoom enumeration vulnerability allowed unauthenticated users to confirm whether any email address had a Zoom account simply by observing the login error message. Similar flaws have been documented in Slack, GitHub Enterprise, and dozens of SaaS platforms.

AI-assisted OSINT pipelines now include automated enumeration of these timing differentials as a standard pre-phishing step, logging confirmed-valid addresses for targeting.
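
From the defender's side, the fix is to make the login path behave identically whether or not the account exists. The following is a minimal sketch, assuming an in-memory `users` mapping of email to `(salt, password_hash)`; the storage shape, iteration count, and function names are illustrative assumptions rather than a production design.

```python
import hashlib
import hmac
import os

GENERIC_ERROR = "Invalid email or password."


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Dummy credential so unknown accounts still pay the full hashing cost.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("not-a-real-password", _DUMMY_SALT)


def login(users: dict, email: str, password: str):
    """Return (status, message); failures are indistinguishable.

    Unknown emails are checked against a dummy hash so response time,
    status code, and wording do not reveal whether the account exists.
    """
    key = email.lower()
    salt, stored = users.get(key, (_DUMMY_SALT, _DUMMY_HASH))
    supplied = hash_password(password, salt)
    # compare_digest runs in time independent of where the bytes differ.
    matches = hmac.compare_digest(supplied, stored)
    if matches and key in users:
        return 200, "OK"
    return 401, GENERIC_ERROR
```

The same principle applies to password-reset and sign-up forms, which are common places for enumeration to leak even when the login form itself is uniform.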

Lesson 2 Quiz

Breach Data and the OSINT Stack: check your understanding
What was the initial access vector in the 2021 Colonial Pipeline ransomware attack?
Correct. DarkSide used a legacy VPN account whose credentials had appeared in a prior breach dump, demonstrating how historical breaches become live attack vectors when passwords aren't rotated.
Not correct. The attack began with a single VPN password obtained from a prior, unrelated breach dump, one of the most common initial access patterns in ransomware operations.
How does RockYou2021 differ from a traditional single-source data breach?
Correct. RockYou2021 is a compiled password list: 8.4 billion password entries aggregated from many prior leaks, making it primarily valuable for offline hash-cracking dictionary attacks.
Incorrect. RockYou2021 was not itself a breach of a new target; it was a massive compilation of passwords from many previous breaches, assembled into a single dictionary file.
What specific vulnerability class allows attackers to confirm email address validity without accessing breach data?
Correct. User enumeration vulnerabilities, where login forms reveal whether an account exists through different error messages or response times, have been documented in Zoom, Slack, and many other platforms.
Incorrect. The technique is called user enumeration: exploiting the fact that some login systems give different responses (error message wording, timing, HTTP status) depending on whether an account exists.

Lab 2: Breach Intelligence Triage

Practice with the AI assistant · Minimum 3 exchanges to complete

Scenario

You are triaging breach data as part of a defensive engagement for Meridian Financial Group (synthetic). A HIBP domain search has returned 847 exposed addresses. You have a CSV excerpt of 12 high-priority accounts with their breach histories.

The AI assistant will help you prioritise remediation, understand breach severity, and draft a notification strategy.

Suggested openers: "How should I prioritise these 12 accounts?" / "What makes a breach high-severity for a financial firm?" / "Draft an internal notification for employees whose credentials appeared in the 2021 RockYou compilation."
OSINT Lab Assistant
Breach Intelligence
Ready to assist with breach triage for Meridian Financial Group. You have 847 exposed addresses and 12 flagged high-priority accounts. Would you like to start with prioritisation criteria, severity assessment, or drafting employee notifications?
Module 4 · Lesson 3

Social Media Identity Correlation

Cross-platform identity stitching, username permutation, and how a single online handle unravels a complete digital biography.
How does a username posted in 2009 on a gaming forum become a liability for a CISO in 2024?

The FBI's investigation into the Silk Road marketplace began not with a technical breach but with a username. A Bitcoin Talk forum post from 2011 used the handle altoid to advertise a "bitcoin startup" and included an email address: rossulbricht@gmail.com. The same handle had earlier been used to promote Silk Road on public forums. When investigators linked altoid to the Dread Pirate Roberts handle used on Silk Road, the pseudonymous drug marketplace operator was identified, not through network forensics, but through OSINT username correlation across two public forums separated by months.

Ulbricht was arrested in October 2013. The core investigative technique (tracing a consistent username across platforms and correlating it to a real identity) is now a standard OSINT methodology used by both law enforcement and corporate intelligence teams.

Username Permutation and Reuse

People are remarkably consistent with usernames. Research from Carnegie Mellon's CyLab (2017) found that 68% of users reuse the same username across five or more platforms. When a username is reused, it becomes a cross-platform identity thread: every account tied to it accumulates profile information, post history, location clues, and relationship data that can be stitched into a coherent biography.

Username permutation is the practice of generating variants of a known handle to find accounts the target may not have listed publicly. Common permutations include: appending birth years (jsmith1987), numbers (jsmith42), underscores, periods, platform-specific suffixes (_yt, _twitch), and misspellings. Tools like Sherlock, Maigret, and WhatsMyName automate checking hundreds of platforms simultaneously.
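
The permutation rules described above can be sketched in a few lines. This example assumes a handle made of a stem plus optional trailing digits, and covers only separators, birth years, and the two platform suffixes named in the text; misspellings are left out. It does not reproduce the logic of Sherlock, Maigret, or WhatsMyName, and `handle_variants` is a hypothetical name.

```python
import re
from typing import Optional

PLATFORM_SUFFIXES = ("_yt", "_twitch")  # the suffixes named in the text


def handle_variants(handle: str, birth_year: Optional[int] = None) -> list:
    """Generate plausible variants of a known handle, without duplicates."""
    handle = handle.lower()
    # Split into a stem and any trailing digits, e.g. 'ghostwren' + '84'.
    match = re.fullmatch(r"(.*?)(\d*)", handle)
    stem, digits = match.group(1), match.group(2)

    variants = [handle, stem]
    if digits:
        variants += [f"{stem}_{digits}", f"{stem}.{digits}"]
    if birth_year is not None:
        variants += [f"{stem}{birth_year}", f"{stem}{birth_year % 100:02d}"]
    variants += [f"{handle}{suffix}" for suffix in PLATFORM_SUFFIXES]

    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [v for v in dict.fromkeys(variants) if v]
```

With the synthetic lab handle, `handle_variants("ghostwren84", 1984)` includes ghostwren, ghostwren_84, ghostwren.84, ghostwren1984, ghostwren84_yt, and ghostwren84_twitch. Every hit still needs corroboration, since a matching username on another platform may belong to a different person.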

Sherlock
Open-source Python tool. Searches 300+ platforms for a given username. Returns live profile URLs.
Maigret
Sherlock successor. Checks 2,500+ sites including niche forums. Extracts profile metadata.
WhatsMyName
JSON-backed database of platform check patterns. Powers Maltego username transforms.
Namechk
Originally a brand availability tool, widely repurposed for OSINT username enumeration.

Cross-Platform Identity Stitching

Identity stitching combines data from multiple platforms to build a profile that no single platform would reveal. A typical sequence: a username found on GitHub reveals a real name and email in commit metadata. That email, queried on HIBP, reveals the target's former employer. The employer context, combined with a LinkedIn search, surfaces the target's career history. A Reddit account using the same username contains posts mentioning a specific city, gym, and commute route. The assembled profile (real name, current employer, email, city, daily routine) was constructed entirely from public data across four sources.
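
The stitching sequence above is, structurally, a graph problem: each observation links two identifiers, and everything reachable from the starting identifier belongs to the same profile. The sketch below models that with an undirected graph and a breadth-first search. The identifier labels and observation pairs are synthetic, and the function names are illustrative.

```python
from collections import defaultdict, deque


def build_graph(observations):
    """Build an undirected graph from (identifier, identifier) pairs."""
    graph = defaultdict(set)
    for left, right in observations:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def linked_identifiers(graph, start):
    """Return every identifier reachable from `start` (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Synthetic observations, one per source in the sequence above.
observations = [
    ("username:ghostwren84", "email:m.holt@email.com"),   # commit metadata
    ("email:m.holt@email.com", "employer:Former Corp"),   # breach lookup
    ("employer:Former Corp", "name:Marcus Holt"),         # career history
    ("username:ghostwren84", "city:Exampleville"),        # forum posts
    ("username:unrelated", "city:Elsewhere"),             # different person
]
profile = linked_identifiers(build_graph(observations), "username:ghostwren84")
```

The weakness of this model is also its lesson: a single wrong edge, such as a coincidental username match, merges two people into one profile, so each link should carry a source and a confidence before it is trusted.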

Bellingcat's investigation into Alexei Navalny's poisoning, published in December 2020, used exactly this methodology against FSB officers: usernames, phone numbers, and email fragments found in leaked data were cross-referenced across social networks, flight databases, and hotel records to identify and name the agents responsible, a public demonstration of how identity stitching achieves outcomes previously requiring signals intelligence resources.

AI Role in Identity Stitching

AI models dramatically reduce the human analysis burden. A practitioner feeds raw profile data from Sherlock, Maltego exports, and HIBP results into an LLM with a prompt like: "Identify all identity overlaps across these profiles and summarise what a threat actor could learn about this person." The model synthesises connections, flags inconsistencies (different names on different platforms), and suggests additional search vectors β€” work that previously took an analyst several hours.

Metadata in Profile Images

Profile photographs are a frequently overlooked identity correlation vector. Before most platforms stripped EXIF data, profile photos uploaded from smartphones contained GPS coordinates, device make/model, and timestamps. Even after EXIF stripping, photographs can be subjected to: reverse image search (Google Lens, Yandex Images, TinEye) to find the same image on other platforms; facial recognition services (PimEyes, FaceCheck.ID); and AI-generated analysis of background elements, clothing brands, and environmental clues.

In the 2014 deanonymization of a Tor hidden service operator, a single forum avatar (a photograph with identifiable background elements) was reverse-searched and matched to a publicly posted photograph taken at a named conference. The operator was identified before any technical compromise of the hidden service occurred.

Key Takeaway

Digital identities are not isolated accounts; they are interconnected nodes in a graph that spans platforms, time periods, and personas. Username reuse, consistent writing style, recycled profile images, and cross-referenced metadata all contribute to identity graphs that AI tools can now synthesise in minutes. Both red teams and threat intelligence analysts must understand this pipeline to either execute or defend against identity stitching operations.

Lesson 3 Quiz

Social Media Identity Correlation: check your understanding
How did FBI investigators initially connect Ross Ulbricht to the Silk Road "Dread Pirate Roberts" persona?
Correct. The "altoid" handle appeared on Bitcoin Talk with Ulbricht's real Gmail address, and the same handle was linked to early Silk Road promotion: a textbook username correlation case.
Incorrect. The initial connection was made through OSINT (correlating the "altoid" username across public forums, one of which contained his real email address), not through technical exploitation of Tor.
What does the term "identity stitching" mean in an OSINT context?
Correct. Identity stitching is the process of combining data fragments across platforms (usernames, emails, photos, post history, metadata) to produce a unified profile richer than any single source would reveal.
Incorrect. Identity stitching refers to the analytical process of combining data from multiple sources, not creating fake accounts or tracking pixels. The goal is to construct a unified profile from fragmented public data.
Which tool checks usernames across approximately 2,500 platforms including niche forums and extracts profile metadata?
Correct. Maigret is the successor to Sherlock, expanded to cover approximately 2,500 sites. It goes beyond simple presence checking to extract available metadata from discovered profiles.
Incorrect. Maigret is the tool that covers approximately 2,500 platforms and extracts metadata. Sherlock covers around 300 platforms; Hunter.io focuses on emails; Dehashed is a breach data search service.

Lab 3: Username Permutation & Identity Graph

Practice with the AI assistant · Minimum 3 exchanges to complete

Scenario

During an authorised threat intelligence assessment for Arcturus Ventures (synthetic), you have identified a username, ghostwren84, used by a person whose real name is believed to be Marcus Holt. The account was found on a developer forum.

The AI assistant will help you generate username permutations, develop an identity stitching plan, and interpret hypothetical cross-platform findings.

Suggested openers: "Generate permutations of ghostwren84 for common platforms" / "If I find ghostwren84 on GitHub with commits showing m.holt@email.com, what's my next step?" / "How would I build an identity graph from these fragments?"
OSINT Lab Assistant
Identity Correlation
Ready. Starting point: username ghostwren84, suspected real name Marcus Holt, found on a developer forum. Shall I generate permutations, outline a cross-platform search plan, or help you interpret findings as we build the identity graph?
Module 4 · Lesson 4

AI-Generated Pretext and Spear-Phishing Construction

How harvested identity data transforms into targeted social engineering, and what defenders can detect.
What separates a spear-phishing email that gets clicked from one that gets reported?

In August 2015, networking hardware company Ubiquiti Networks disclosed a $46.7 million loss to a business email compromise (BEC) scheme. Attackers had spoofed the email addresses of senior executives and the company's Hong Kong law firm, then instructed finance department employees to transfer funds to accounts controlled by the attackers. The emails were not technically sophisticated: no malware, no exploits. They succeeded because they were contextually accurate: correct executive names and titles, reference to a real ongoing acquisition, and appropriately formal language that matched internal communication style.

The attacker's prior reconnaissance (almost certainly OSINT-derived) included the names of executives, the existence of a pending acquisition (mentioned in a press release), and the communication style drawn from public statements. The social engineering worked because the identity data was real.

The Anatomy of AI-Assisted Spear-Phishing

Modern spear-phishing construction follows a four-stage pipeline. Each stage is now substantially acceleratable by AI.

Stage 1: Targeting
OSINT harvest identifies high-value individuals: CFOs, IT administrators, M&A team members. Email addresses, LinkedIn summaries, recent activity, and reporting relationships are collected.
Stage 2: Contextualisation
The target's recent public activity (conference talks, tweets, press quotes) is gathered to provide credible context that the email appears to reference organically.
Stage 3: Drafting
An AI model generates the email body. The prompt includes the target's name, role, the apparent sender identity, a plausible pretext, and a desired action (click link, open attachment, wire funds).
Stage 4: Personalisation
The draft is refined with specific details (a real project name, a mutual contact's name, the target's correct title) that distinguish it from generic phishing. AI assists in matching writing style to impersonated sender.

Documented AI Use in Phishing Campaigns

In January 2024, the UK National Cyber Security Centre (NCSC) published a threat assessment warning that AI tools, including commercial LLMs, were already being used to improve the volume and credibility of phishing and spear-phishing campaigns. The assessment noted a specific increase in syntactically correct, contextually relevant messages that previously required native speakers or skilled social engineers to produce.

IBM's X-Force threat intelligence team reported in 2023 that AI-generated phishing emails achieved an 11% higher click rate than human-written equivalents in red-team testing, while taking a fraction of the time to produce. The combination of volume scalability and quality improvement represents a qualitative shift in the threat landscape.

Microsoft Threat Intelligence has observed state-affiliated groups, including Forest Blizzard (APT28) and Charcoal Typhoon, using LLMs to translate phishing lures, research targets, and draft pretexting content, as documented in Microsoft's February 2024 threat intelligence report published jointly with OpenAI.

Writing Style Mimicry

One of the more subtle AI-assisted phishing techniques involves feeding an LLM examples of a sender's genuine writing (emails, LinkedIn posts, public statements) and asking it to draft the phishing message in that style. If the apparent sender is a CEO whose LinkedIn posts use specific vocabulary and sentence patterns, the AI-generated email matches that style sufficiently to pass casual scrutiny. This technique was theoretically described in 2022 and operationally observed in threat intelligence reports by 2023.

Detection and Defence

Defenders counter AI-assisted spear-phishing through a combination of technical controls and user education. DMARC, DKIM, and SPF reduce email spoofing success; email gateway sandboxing catches malicious attachments and links. But the most effective defence against contextually accurate pretexting is understanding what identity data is publicly accessible and proactively reducing the target's OSINT footprint.
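
Whether a domain's DMARC policy actually blocks spoofed mail can be read from its published TXT record. The sketch below parses a record string and summarises the `p` tag. It performs no DNS lookup, treats a missing `p` tag as `none` for simplicity, and uses hypothetical function names and an example record.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into a lowercase-tag -> value mapping."""
    tags = {}
    for part in record.split(";"):
        key, separator, value = part.partition("=")
        if separator:
            tags[key.strip().lower()] = value.strip()
    return tags


def spoofing_posture(record: str) -> str:
    """Summarise how strictly a DMARC record asks receivers to act."""
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return "no valid DMARC record"
    policy = tags.get("p", "none").lower()
    if policy == "reject":
        return "enforced: failing mail is rejected"
    if policy == "quarantine":
        return "partially enforced: failing mail is quarantined"
    return "monitoring only: no action requested on failing mail"


example = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
posture = spoofing_posture(example)
```

A `p=none` policy only generates reports, so a domain in that state remains spoofable in exactly the way BEC pretexts rely on. DMARC also does nothing against lookalike domains, which is why the footprint reduction discussed next still matters.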

Organisations now conduct OSINT audits of key personnel (C-suite, finance team, IT administrators) to identify and remove unnecessary public information before it can be harvested. LinkedIn profiles are reviewed for operational security: removing specific project names, direct reports, and internal system references that provide pretext material. This discipline is called personnel OPSEC and is increasingly standard in high-risk organisations.
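
Part of a personnel OPSEC review can be automated by scanning profile text for the categories named above. This is a minimal sketch with a hypothetical watch-list; a real audit would build its patterns from the organisation's own project and system names, and `audit_profile` is an illustrative name.

```python
import re

# Hypothetical watch-list covering the three categories in the text.
SENSITIVE_PATTERNS = {
    "project name": re.compile(r"\bERP migration\b", re.IGNORECASE),
    "reporting line": re.compile(r"\b(?:direct reports?|reports to)\b", re.IGNORECASE),
    "internal system": re.compile(r"\b(?:SAP|Citrix|Okta)\b"),
}


def audit_profile(text: str) -> list:
    """Return (category, matched text) pairs found in a public profile."""
    findings = []
    for category, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((category, match.group(0)))
    return findings


sample = "Overseeing the Q1 ERP migration with 12 direct reports; rolling out Okta."
findings = audit_profile(sample)
```

Pattern matching only flags candidates for a human reviewer. Whether a phrase is genuinely useful as pretext material depends on context the scan cannot see.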

Key Takeaway

The quality ceiling on spear-phishing has been effectively removed by AI. What required a skilled social engineer hours of research and drafting now takes minutes. The best defence is not solely technical; it is reducing the quality of available pretext material through proactive OSINT footprint management, combined with robust authentication controls (MFA, hardware keys) that make credential theft from successful phishing less consequential.

Lesson 4 Quiz

AI-Generated Pretext and Spear-Phishing: check your understanding
What made the 2015 Ubiquiti Networks BEC attack successful despite using no malware?
Correct. The attack succeeded through contextual accuracy (real names, a real acquisition reference, and appropriate tone), demonstrating that social engineering grounded in accurate OSINT can be more effective than technical exploits.
Incorrect. The attack succeeded through contextual accuracy derived from OSINT: correct executive names and titles, a real ongoing acquisition, and appropriate formal tone. No technical exploitation was required.
According to IBM X-Force 2023 testing, how did AI-generated phishing emails compare to human-written equivalents?
Correct. IBM X-Force found AI-generated phishing achieved an 11% higher click rate in red-team testing, while being produced faster. This quality-plus-scale advantage is why AI-assisted phishing represents a qualitative threat escalation.
Incorrect. IBM X-Force's 2023 red-team testing found AI-generated phishing emails achieved 11% higher click rates than human-written ones, while being produced in a fraction of the time. The threat is both higher quality and higher volume.
What is "personnel OPSEC" in the context of defending against spear-phishing?
Correct. Personnel OPSEC involves auditing and reducing what public information is available about key individuals: removing project names, internal references, and specific role details from LinkedIn and other public profiles before attackers can harvest them as pretext material.
Incorrect. Personnel OPSEC specifically refers to proactively managing the public information footprint of high-risk employees: reviewing LinkedIn profiles, press mentions, and conference appearances to remove details that provide spear-phishing pretext material before attackers can harvest it.

Lab 4: Spear-Phishing Pretext Analysis

Practice with the AI assistant · Minimum 3 exchanges to complete

Scenario

You are a red team operator for Halcyon Defense Consulting (synthetic) conducting an authorised social engineering assessment. Your target is Sandra Osei, CFO of Orion Logistics Group (synthetic). From OSINT you have: her email (s.osei@orion-log.com), LinkedIn showing she's overseeing a "Q1 ERP migration," a recent quote in a trade publication, and her assistant's name (James Park).

The AI assistant will help you analyse pretext quality, identify OSINT gaps, and discuss what makes this scenario detectable from a defensive standpoint.

Suggested openers: "What pretext would work best given this OSINT profile?" / "How would defenders detect this approach?" / "What additional OSINT would improve the pretext quality?"
OSINT Lab Assistant
Spear-Phishing Analysis
Ready. Target profile loaded: Sandra Osei, CFO, Orion Logistics Group. You have her email format, ERP migration context, a trade publication quote, and her assistant's name. What aspect of the pretext analysis would you like to explore: attack construction, detection vectors, or OSINT gaps?

Module 4: Final Test

Email and Identity Harvesting · 15 questions · 80% to pass
1. What is an "identity anchor" in the context of email reconnaissance?
Correct. An identity anchor is an email address that simultaneously serves as a username, authentication credential, and pivot point across multiple systems.
Incorrect. An identity anchor refers to an email address functioning as a cross-system authentication credential, used as a username and pivot across many platforms simultaneously.
2. The most common corporate email format in Fortune 500 companies is:
Correct. The firstname.lastname format is used by approximately 55% of Fortune 500 companies, making it the dominant pattern for corporate email enumeration.
Incorrect. The firstname.lastname format (e.g., john.smith@company.com) is the most common in Fortune 500 enterprises, used by approximately 55% of them.
3. Hunter.io's Email Finder function operates by:
Correct. Hunter.io's Email Finder uses the organisation's confirmed email format (derived from indexed addresses) to predict the most likely address for a given name and domain.
Incorrect. Hunter.io uses the confirmed format pattern from its indexed database of public web data to predict addresses; it does not probe SMTP servers or access Active Directory.
4. In the Colonial Pipeline attack, what was the root cause that allowed initial access?
Correct. DarkSide used a compromised VPN credential from a prior unrelated breach. The legacy account had never had its password rotated, allowing direct access to Colonial's network.
Incorrect. A single unrotated VPN password from a prior breach dump provided initial access, a demonstration of how credential hygiene failures and breach data persistence combine to create live attack vectors.
5. Which service holds over 13 billion indexed records and is integrated into Microsoft Entra ID's password policies?
Correct. HIBP, founded by Troy Hunt in 2013, holds over 13 billion records and its Pwned Passwords hash list is integrated directly into Microsoft Entra ID to block use of breached passwords.
Incorrect. Have I Been Pwned (HIBP) is the service with 13+ billion indexed records that Microsoft integrates into Entra ID password policies to block compromised passwords.
6. RockYou2021's primary value to attackers is as:
Correct. RockYou2021's 8.4 billion entries represent aggregated human password behaviour, making it the most comprehensive dictionary for offline cracking of password hashes from newly breached systems.
Incorrect. RockYou2021 is a password dictionary: an aggregation of 8.4 billion real passwords from decades of prior breaches, primarily useful for offline hash-cracking attacks against newly obtained password databases.
7. User enumeration vulnerabilities in login systems allow attackers to:
Correct. User enumeration exploits systems that give different responses (error message text, response timing, HTTP status) depending on whether an account exists, allowing confirmation of valid email addresses without authentication.
Incorrect. User enumeration exploits differences in system responses (different error messages, response times, or status codes) to confirm whether a specific email address has a registered account, without requiring a valid password.
8. The Carnegie Mellon CyLab research (2017) found that what percentage of users reuse the same username across five or more platforms?
Correct. CMU CyLab found 68% username reuse across five or more platforms, which is why a single found username becomes a reliable cross-platform identity thread for OSINT investigators.
Incorrect. CMU CyLab's 2017 research found 68% of users reuse the same username across five or more platforms, making username reuse one of the most powerful identity correlation vectors available to investigators.
9. The Silk Road investigation's initial identification of Ross Ulbricht is a case study in:
Correct. The "altoid" username appeared on Bitcoin Talk alongside Ulbricht's real Gmail address, a textbook cross-platform username correlation that identified him before any technical compromise of Silk Road.
Incorrect. The initial identification was through username correlation OSINT: "altoid" on Bitcoin Talk linked to his real email address, connecting the public persona to the Dread Pirate Roberts handle used on Silk Road.
10. Which tool covers approximately 2,500 platforms and extracts profile metadata, making it the most comprehensive username search tool currently available?
Correct. Maigret, the successor to Sherlock, covers approximately 2,500 sites including niche forums and extracts available profile metadata beyond simple presence detection.
Incorrect. Maigret is the tool covering ~2,500 platforms with metadata extraction. Sherlock covers ~300 platforms; WhatsMyName uses a JSON database of patterns; Namechk was originally a brand tool.
11. What was the 2021 Bellingcat investigation into Navalny's poisoning primarily based on?
Correct. Bellingcat used identity stitching across public databases to identify the FSB agents by name, a landmark demonstration that open-source methods can achieve intelligence outcomes previously requiring signals intelligence resources.
Incorrect. Bellingcat used OSINT identity stitching (combining usernames, phone numbers, email fragments, flight records, and hotel databases) to identify the FSB agents responsible, without classified sources.
12. According to IBM X-Force 2023 testing, AI-generated phishing emails achieved what result compared to human-written ones?
Correct. IBM X-Force found AI-generated phishing emails outperformed human-written ones by 11% in click rate, while being created significantly faster. This quality-plus-speed advantage represents a fundamental shift in the phishing threat landscape.
Incorrect. IBM X-Force's 2023 testing found AI-generated phishing emails achieved 11% higher click rates while being produced much faster than human-written equivalents, a significant quality and efficiency advantage for attackers.
13. Which APT groups did Microsoft's February 2024 report (co-authored with OpenAI) document as using LLMs for phishing and reconnaissance?
Correct. Microsoft and OpenAI's February 2024 report documented Midnight Blizzard (Russian, APT29) and Charcoal Typhoon (Chinese, APT40) using LLMs to translate lures, research targets, and draft pretexting content.
Incorrect. Microsoft's February 2024 report named Midnight Blizzard (APT29) and Charcoal Typhoon (APT40) as documented users of LLMs for translation, target research, and phishing pretext drafting.
14. "Personnel OPSEC" as a defence against AI-assisted spear-phishing primarily involves:
Correct. Personnel OPSEC focuses on proactively reducing the public information footprint of key individuals: removing project names, internal references, and specific role details that provide high-quality spear-phishing pretext material.
Incorrect. Personnel OPSEC is specifically about managing the public information available about key individuals: reviewing and removing details from LinkedIn profiles, press mentions, and conference bios that would provide attackers with quality pretext material.
15. The four stages of AI-assisted spear-phishing construction in order are:
Correct. The pipeline proceeds: Targeting (identify high-value individuals) → Contextualisation (gather recent activity for plausible context) → Drafting (AI generates the email) → Personalisation (add specific details to distinguish from generic phishing).
Incorrect. The correct order is Targeting → Contextualisation → Drafting → Personalisation. Targeting identifies the victim; contextualisation gathers pretext material; drafting generates the email; personalisation adds specific details for credibility.
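
Defender's note on question 7: the user enumeration weakness described there (different error text, timing, or status depending on whether an account exists) can be closed from the server side. The following is a minimal sketch, not a production login system. The in-memory `users` store, the function names, and the PBKDF2 parameters are all assumptions made for illustration. It returns one generic failure for both unknown accounts and wrong passwords, and performs the same hashing work in either case so that response timing reveals less.

```python
import hashlib
import hmac
import os

def _hash(password, salt):
    # PBKDF2-HMAC-SHA256; the iteration count is an illustrative assumption.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def make_user(password):
    # Returns the (salt, hash) record kept in the hypothetical user store.
    salt = os.urandom(16)
    return salt, _hash(password, salt)

# Dummy record: unknown accounts are hashed against this, so they cost
# roughly the same work as real accounts.
_DUMMY_SALT, _DUMMY_HASH = make_user("placeholder-not-a-real-password")

# One status code and one message for every failure mode.
GENERIC_FAILURE = (401, "Invalid email or password.")

def login(users, email, password):
    record = users.get(email.lower())
    salt, stored = record if record else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = _hash(password, salt)
    # Constant-time comparison; an unknown account can never succeed.
    matches = hmac.compare_digest(candidate, stored)
    if matches and record is not None:
        return (200, "Welcome.")
    return GENERIC_FAILURE

users = {"s.osei@orion-log.com": make_user("correct horse")}
known = login(users, "s.osei@orion-log.com", "wrong")
unknown = login(users, "nobody@orion-log.com", "wrong")
print(known == unknown)
```

Because both failure paths return the same tuple, an attacker probing this endpoint cannot tell a valid address from an invalid one by message or status code alone. Password-reset and sign-up forms would need the same treatment, and rate limiting is still required.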