Module 2 · Lesson 1

Banner Grabbing & AI-Powered Version Detection

From raw socket responses to structured vulnerability intelligence

How does AI transform raw banner text into actionable service fingerprints?

On February 5, 2021, an attacker accessed the control systems of the Oldsmar, Florida water treatment facility through TeamViewer and briefly raised the sodium hydroxide setpoint from 100 ppm to 11,100 ppm, 111 times the normal level, before an operator reversed the change. Post-incident analysis by Dragos and Mandiant revealed the facility was running an unpatched Windows 7 system with a publicly exposed remote desktop service, a fact discoverable in seconds via Shodan's banner-indexed database. The version string "Windows 6.1" in RDP banners had been crawled and catalogued for months before the incident.

What Is Banner Grabbing?

When a network service accepts a connection, it typically responds with an identifying string — a banner — before any credentials are exchanged. This string often contains the software name, version number, operating system, and sometimes the hostname. Banner grabbing is the deliberate act of collecting these strings to fingerprint services.
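As a minimal sketch of the idea above, the following Python connects to a TCP port and returns whatever the service sends first. The function names `read_banner` and `grab_banner` are illustrative, not part of any tool named in this lesson, and the sketch assumes a service that speaks first (SSH, FTP, SMTP); services such as HTTP wait for the client and will simply time out here.

```python
import socket


def read_banner(sock: socket.socket, max_bytes: int = 1024) -> str:
    """Read the first bytes a service volunteers and decode them as text."""
    try:
        data = sock.recv(max_bytes)
    except socket.timeout:
        # The service is waiting for the client to speak first (e.g. HTTP).
        return ""
    return data.decode("utf-8", errors="replace").strip()


def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Open a TCP connection and return the service's opening banner."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_banner(sock)
```

Only run `grab_banner` against hosts you are authorized to test, for example `grab_banner("192.168.1.10", 22)` inside a lab network.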

Classic tools like Netcat, Telnet, and later Nmap have always supported banner capture. What changed with AI is what happens after capture: instead of manually cross-referencing CVE databases, modern AI pipelines parse banners, identify the exact software version, retrieve known vulnerabilities, and rank exploitation likelihood — in seconds.

Traditional Banner Grab

Connect via Netcat or Nmap
Receive raw text response
Manually search NVD/CVE
Cross-reference exploit-db
Estimate patch status manually

AI-Augmented Pipeline

Connect & capture banner (automated)
LLM parses version + product
Automated CVE lookup & scoring
Exploit path probability ranking
Report generation in natural language

Reading a Banner: Anatomy

A raw banner from an SSH service might look like this:

# Raw SSH banner captured via: nc 192.168.1.10 22
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7
Protocol mismatch.

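From this single line, a trained analyst (or AI) can extract the protocol version (SSH 2.0), software (OpenSSH), exact version (7.4p1), OS distribution (Debian 9 Stretch), and package revision (deb9u7). Upstream OpenSSH 7.4 is affected by CVE-2018-15473 (user enumeration), among others; CVE-2016-10009, by contrast, was fixed in 7.4. Because distributions backport security fixes without changing the upstream version number, the package revision should be checked against the distribution's security tracker before any CVE is treated as exploitable.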
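The extraction described above can be sketched with a regular expression. This is a simplified parser that assumes the common `SSH-<proto>-<Software>_<version> <comment>` layout; servers that format their identification string differently will not match and return `None`. The names `SSH_BANNER_RE` and `parse_ssh_banner` are illustrative.

```python
import re

# Assumes the common "Name_Version" software field followed by an optional comment.
SSH_BANNER_RE = re.compile(
    r"^SSH-(?P<protocol>\d+\.\d+)-(?P<software>[^\s_]+)_(?P<version>\S+)"
    r"(?:\s+(?P<comment>.*))?$"
)


def parse_ssh_banner(banner: str):
    """Split an SSH identification string into its labelled parts."""
    match = SSH_BANNER_RE.match(banner.strip())
    if match is None:
        return None
    return match.groupdict()


fields = parse_ssh_banner("SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7")
print(fields)
```

The `comment` field is where distributions usually place their package revision, which is the part that decides whether a backported fix is present.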

Key Technique — Nmap Service Version Detection

Nmap's -sV flag probes open ports using the nmap-service-probes database, which holds the probe definitions together with thousands of match signatures. The --version-intensity flag (0–9) controls how many probes are tried. An LLM tool-calling layer, such as OpenAI's function-calling API, can wrap Nmap: the surrounding code parses every <service> element in the XML output and batch-queries vulnerability databases automatically.

# Nmap version scan with XML output for AI parsing
nmap -sV --version-intensity 7 -oX services.xml 10.10.10.0/24

# Parse with Python + send to AI
import xml.etree.ElementTree as ET
tree = ET.parse('services.xml')
for port in tree.iter('port'):
    svc = port.find('service')
    if svc is not None:
        print(svc.get('product'), svc.get('version'))
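Printing product and version is enough for a quick look, but a model or a CVE lookup usually needs host and port as well. The sketch below turns Nmap XML into a list of dictionaries that can be serialized to JSON. The function name `services_from_nmap_xml` and the record layout are assumptions made for illustration; the element and attribute names (`host`, `address`, `port`, `state`, `service`) follow Nmap's XML output.

```python
import json
import xml.etree.ElementTree as ET


def services_from_nmap_xml(xml_text: str) -> list:
    """Collect one record per open port from Nmap XML output."""
    records = []
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        address = host.find("address")
        addr = address.get("addr") if address is not None else None
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # skip closed and filtered ports
            svc = port.find("service")
            records.append({
                "host": addr,
                "port": int(port.get("portid")),
                "protocol": port.get("protocol"),
                "name": svc.get("name") if svc is not None else None,
                "product": svc.get("product") if svc is not None else None,
                "version": svc.get("version") if svc is not None else None,
            })
    return records


SAMPLE = """<nmaprun><host><address addr="10.10.10.50" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="22"><state state="open"/>
<service name="ssh" product="OpenSSH" version="7.4p1"/></port>
<port protocol="tcp" portid="23"><state state="closed"/></port>
</ports></host></nmaprun>"""

print(json.dumps(services_from_nmap_xml(SAMPLE), indent=2))
```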

AI Interpretation Layer

Modern AI-assisted pentesting platforms — including Synack's AI triage layer and Bishop Fox's CAST platform — pass service banners through language model pipelines that perform contextual risk assessment. The model is prompted with the raw service data plus a CVE corpus, then asked to reason about exploitation feasibility given the network context.

This is not just search — it's reasoning. An AI can recognize that a service running an old OpenSSL version behind a load balancer has a different attack surface than the same version directly exposed, and adjust its risk assessment accordingly.

Real Deployment — Shodan AI Annotations (2023)

Shodan introduced ML-based service classification in 2022–2023, automatically categorizing banner responses into service types even when operators deliberately obscure version strings. Their model was trained on millions of historical banners and achieves over 94% accuracy on novel service identification — turning passive scanning into an AI-enhanced intelligence layer.

CPE String: Common Platform Enumeration, a standardized naming scheme for software. AI parsers convert banner text to CPE format (e.g., cpe:/a:openbsd:openssh:7.4) to query the NVD programmatically.

Version Probe: A crafted packet sent by tools like Nmap to elicit a specific service response, distinct from banner grabbing in that it actively solicits version disclosure rather than passively receiving it.

Null Banner: A service that returns no identifying information. AI can still fingerprint these via timing analysis, response structure, and behavioral signatures, techniques pioneered in network intrusion detection research.
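To make the CPE conversion concrete, here is a small sketch that maps a product name and version to a CPE 2.3 formatted string. The lookup table `CPE_NAMES` is hypothetical and deliberately tiny; in practice vendor and product names should be confirmed against the NVD CPE dictionary. The sketch also assumes OpenSSH-style versions where a trailing `pN` goes into the CPE update field.

```python
import re

# Hypothetical lookup: banner product name -> (CPE vendor, CPE product).
CPE_NAMES = {
    "openssh": ("openbsd", "openssh"),
    "apache httpd": ("apache", "http_server"),
}


def to_cpe23(product: str, version: str):
    """Build a CPE 2.3 string, or return None if the input is not recognised."""
    names = CPE_NAMES.get(product.strip().lower())
    match = re.match(r"^(\d+(?:\.\d+)*)(p\d+)?$", version.strip())
    if names is None or match is None:
        return None
    vendor, cpe_product = names
    base, update = match.group(1), match.group(2) or "*"
    # part, vendor, product, version, update, then six unspecified fields
    return f"cpe:2.3:a:{vendor}:{cpe_product}:{base}:{update}:*:*:*:*:*:*"


print(to_cpe23("OpenSSH", "7.4p1"))
print(to_cpe23("Apache httpd", "2.4.29"))
```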

Lesson 1 Quiz

Banner Grabbing & AI-Powered Version Detection · 4 questions

What does an SSH banner like "SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7" directly reveal to a pentester?

Correct. The banner encodes SSH protocol (2.0), software (OpenSSH 7.4p1), and the Debian 9 package revision — enough to query specific CVEs like CVE-2018-15473.

Review the banner anatomy section. A single SSH banner reveals protocol, version, OS family, and patch revision — far more than just the service name.

The Nmap flag -sV does what, specifically?

Correct. -sV triggers Nmap's version detection engine, sending probes from nmap-service-probes to elicit version strings from each open port.

-sV is service version detection. -sS is SYN stealth, -sC runs default scripts, -oX saves XML. Review the Nmap section.

What does "CPE String" stand for and why is it useful in AI-assisted banner analysis?

Correct. CPE strings convert free-text banner data into a machine-queryable format (e.g., cpe:/a:openbsd:openssh:7.4) that programmatically links to CVE records.

CPE = Common Platform Enumeration. It standardizes software naming so AI and tools can reliably query vulnerability databases. Review the key terms section.

In the 2021 Oldsmar water treatment attack, what made the Windows 7 RDP exposure discoverable before the incident?

Correct. The RDP banner's "Windows 6.1" version string (Windows 7) was crawled and indexed by Shodan, making the vulnerable system publicly discoverable via a simple search query.

Review the opening story. The Windows 7 RDP banner had been indexed by Shodan long before the attack — a core example of why banner information is so operationally significant.

Lab 1 — AI Banner Analysis Assistant

Practice parsing raw service banners and querying vulnerability implications

Lab Objective

You have just run an Nmap -sV scan and captured the following raw banners from a target subnet. Use the AI assistant to practice interpreting them: identify the software, version, known CVEs, and recommended follow-up probe techniques.

Sample banners to analyze:
1. Apache httpd 2.4.29 (Ubuntu) — port 80
2. vsftpd 2.3.4 — port 21
3. OpenSSH 7.4p1 Debian-10+deb9u7 — port 22

Try asking: "What CVEs affect vsftpd 2.3.4?" or "How do I extract more version data from Apache 2.4.29?"

AI Lab Assistant

Welcome to Lab 1. I'm your banner analysis assistant for this session. Paste any raw service banner or Nmap -sV output and I'll help you identify the software version, map it to known CVEs, and recommend follow-up enumeration steps. What banner would you like to start with?

Module 2 · Lesson 2

Nmap Script Engine & AI-Driven NSE Orchestration

Automating targeted service interrogation with AI-selected script chains

How can AI intelligently select and chain NSE scripts to maximize service intelligence with minimal noise?

The 2017 Equifax breach, which exposed records of roughly 147 million people, exploited Apache Struts CVE-2017-5638. Security researchers performing post-breach reconstruction demonstrated that a targeted Nmap NSE scan using the http-vuln-cve2017-5638 script would have identified the vulnerable endpoint in under 30 seconds per host. The vulnerability had been public for about two months before exploitation began.

Nmap Scripting Engine: Architecture

The Nmap Scripting Engine (NSE) extends Nmap's capabilities through Lua scripts that interact with discovered services. Scripts are organized into categories: auth, broadcast, brute, default, discovery, dos, exploit, external, fuzzer, intrusive, malware, safe, version, vuln. The -sC flag runs all scripts in the default category; --script=vuln runs vulnerability-specific scripts.

As of 2024, there are over 600 officially maintained NSE scripts. Manual selection among these for a given target requires deep expertise. This is precisely where AI adds leverage: given a list of open ports and detected services, an LLM can recommend the optimal script subset, ordered by yield and noise level.

Safe Category · Discovery Category · Intrusive Category · Exploit Category · Version Category

# Run default scripts + version detection
nmap -sC -sV -p 22,80,443,3306 10.10.10.50

# Target specific vulnerability scripts
nmap --script=http-vuln-cve2017-5638 -p 80 10.10.10.50

# Run all vuln scripts against discovered services
nmap --script=vuln --script-timeout 30s 10.10.10.50

# SMB-specific enumeration chain
nmap --script="smb-vuln*,smb-enum*" -p 445 10.10.10.50

AI-Driven Script Selection

The practical problem with NSE is that running all available scripts against a target is slow, noisy, and often triggers IDS. AI-assisted platforms solve this by reasoning about which scripts are appropriate. Given the output of an initial port scan, a language model can be prompted with the discovered services and asked: "Which NSE scripts should I run, and in what order, to maximize service intelligence while minimizing IDS triggering?"
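Before involving a model at all, the same trade-off can be approximated with a static table. The sketch below is an illustrative baseline only: the `SCRIPT_TIERS` mapping and its two noise tiers are assumptions for this example, and each script's real categories should be checked with `nmap --script-help <name>`. It orders low-noise scripts first and adds intrusive ones only when the engagement allows it.

```python
# Illustrative mapping: service name -> [(NSE script, assumed noise tier)].
SCRIPT_TIERS = {
    "ftp": [("ftp-anon", "safe"), ("ftp-vsftpd-backdoor", "intrusive")],
    "ssh": [("ssh2-enum-algos", "safe")],
    "http": [("http-title", "safe"), ("http-headers", "safe"),
             ("http-enum", "intrusive")],
    "mysql": [("mysql-info", "safe"), ("mysql-empty-password", "intrusive")],
}


def select_scripts(services, allow_intrusive=False):
    """Return script names with all low-noise scripts ahead of intrusive ones."""
    quiet, noisy = [], []
    for service in services:
        for script, tier in SCRIPT_TIERS.get(service, []):
            bucket = quiet if tier == "safe" else noisy
            if script not in bucket:
                bucket.append(script)
    return quiet + (noisy if allow_intrusive else [])


scripts = select_scripts(["ftp", "http"], allow_intrusive=True)
print("--script=" + ",".join(scripts))
```

A language model adds value over a table like this when versions, scope notes, and earlier script output have to be weighed together.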

Tools like PentestGPT (released as open source in 2023 by Deng et al., including researchers at Nanyang Technological University) demonstrated this approach: in benchmark evaluations, the LLM-guided workflow completed markedly more penetration-testing subtasks, service enumeration among them, than the underlying model used on its own.

PentestGPT Research — NTU, 2023

The PentestGPT paper (Deng et al., 2023, arXiv:2308.06782) showed that GPT-4 could guide penetration testing tasks including service enumeration. On a benchmark built from HackTheBox and VulnHub targets, the authors report a 228.6% increase in task completion over using GPT-3.5 directly. The comparison is against a baseline model, not against human novice testers. Tool and script selection during enumeration is one place where this kind of guidance applies.

Practical AI Prompting for NSE

The key to effective AI-assisted NSE orchestration is providing the model with structured scan output. A well-formatted prompt includes: open ports, detected service names and versions, OS guess, and any HTTP headers or SSL certificate data already collected. The model reasons over this context and returns a prioritized script list with justifications.

# Example AI prompt template for NSE selection
"""
Target discovery scan results:
- Port 21/tcp: vsftpd 2.3.4
- Port 22/tcp: OpenSSH 7.4p1 Debian-10
- Port 80/tcp: Apache httpd 2.4.29
- Port 3306/tcp: MySQL 5.7.35

Engagement scope: internal network pentest, detection avoidance LOW priority.
Recommend NSE scripts for each service with priority ordering and rationale.
"""

AI output from this prompt will typically recommend ftp-vsftpd-backdoor for port 21 (vsftpd 2.3.4 has a famous backdoor, CVE-2011-2523), general web enumeration scripts such as http-enum and http-headers for port 80, and mysql-empty-password plus mysql-enum for port 3306. Scripts like http-shellshock or http-vuln-cve2017-5638 only make sense if CGI endpoints or a Struts application are actually found. The result is a focused, high-yield scan chain assembled in seconds.
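The prompt template above can be generated from structured scan results rather than typed by hand. This is a small sketch; `build_nse_prompt` and the `(port, protocol, banner)` tuple layout are illustrative choices, and the resulting string can be sent to whichever model API the engagement uses.

```python
def build_nse_prompt(services, scope: str) -> str:
    """Render scan results into the NSE-selection prompt shown in this lesson."""
    lines = ["Target discovery scan results:"]
    for port, protocol, banner in services:
        lines.append(f"- Port {port}/{protocol}: {banner}")
    lines.append("")
    lines.append(f"Engagement scope: {scope}")
    lines.append(
        "Recommend NSE scripts for each service with priority ordering and rationale."
    )
    return "\n".join(lines)


prompt = build_nse_prompt(
    [(21, "tcp", "vsftpd 2.3.4"), (22, "tcp", "OpenSSH 7.4p1 Debian-10")],
    "internal network pentest, detection avoidance LOW priority.",
)
print(prompt)
```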

vsftpd 2.3.4 — The Backdoor Service

vsftpd 2.3.4, released in 2011, is notorious because a compromised copy of its source tarball carried a malicious backdoor (CVE-2011-2523): a smiley face ":)" in the username triggers a bind shell on port 6200. This is featured in Metasploitable 2 and remains one of the most commonly tested vulnerabilities in OSCP-style practice environments. An AI parsing the vsftpd 2.3.4 banner should immediately flag this specific CVE as a critical finding.

NSE: Nmap Scripting Engine, a Lua-based extension framework enabling complex service interrogation, vulnerability checks, and exploitation assistance directly within Nmap.

Script Categories: NSE organizes scripts by behavior risk: safe (no server-side impact), intrusive (may crash services), exploit (active exploitation), and vuln (vulnerability detection without exploitation).

Script Chaining: Running multiple NSE scripts sequentially where output from one informs targeting of the next, a workflow AI can orchestrate by reasoning about service dependencies and vulnerability correlations.

Lesson 2 Quiz

NSE & AI-Driven Script Orchestration · 4 questions

What scripting language is used to write Nmap NSE scripts?

Correct. NSE scripts are written in Lua, a lightweight scripting language designed for embedding — making it suitable for Nmap's internal execution model.

NSE uses Lua, not Python or Ruby. Lua was chosen for its small footprint and ease of embedding within C applications like Nmap.

CVE-2011-2523 (vsftpd 2.3.4) is notable because it is triggered by what unusual mechanism?

Correct. The backdoor was deliberately inserted — a smiley ":)" in the username opens a command shell on TCP 6200. It's one of the most famous intentional backdoors in open-source history.

The vsftpd 2.3.4 backdoor is triggered by a smiley face in the username. Review the vsftpd callout in Lesson 2.

The 2017 Equifax breach exploited Apache Struts CVE-2017-5638. What did post-breach reconstruction demonstrate about NSE detection?

Correct. Reconstruction showed that a simple NSE scan would have flagged the vulnerable endpoint quickly — the CVE had been public for two months before exploitation.

Review the Lesson 2 story section. NSE had a script for CVE-2017-5638, and reconstruction showed it could identify the vulnerability rapidly.

What key advantage did PentestGPT (NTU, 2023) demonstrate for AI-assisted NSE script selection?

Correct. The PentestGPT paper (arXiv:2308.06782) found that its LLM-guided workflow substantially improved task completion on HackTheBox and VulnHub benchmark targets, with enumeration steps such as NSE selection among the tasks it supports.

Review the PentestGPT callout. The key finding was a 228.6% increase in task completion over the GPT-3.5 baseline: AI augmentation, not replacement.

Lab 2 — NSE Script Selector

Practice prompting AI to build targeted NSE script chains from scan output

Lab Objective

You have completed an initial Nmap port scan. Use the AI assistant to build an optimized NSE script chain for each discovered service. Focus on script category selection, ordering by yield vs. noise tradeoff, and understanding why each script is recommended.

Scan results to work with:
Port 21 — vsftpd 2.3.4 | Port 23 — telnetd | Port 80 — Apache 2.2.8 | Port 445 — Samba 3.0.20

Try: "What NSE scripts should I run against Samba 3.0.20 on port 445?" or "Give me a safe-only script chain for Apache 2.2.8"

AI Lab Assistant

Welcome to Lab 2. I'll help you build targeted NSE script chains for any discovered service. Provide service names and versions, tell me your noise tolerance (stealth vs. aggressive), and I'll recommend scripts with priority ordering and rationale. What services should we start analyzing?

Module 2 · Lesson 3

Protocol Fingerprinting & OS Detection with AI

Beyond port numbers — identifying services by behavior, timing, and protocol deviation

When services run on non-standard ports or hide their identity, how does AI reconstruct what's actually running?

The SUNBURST backdoor discovered in December 2020 communicated via obfuscated DNS, encoding C2 traffic inside legitimate-looking DNS queries to avoid detection. The traffic appeared to be ordinary DNS lookups to avsvmcloud.com. Traffic like this is the case for AI-assisted protocol analysis: ML models trained on DNS query timing, subdomain entropy, and response TTL patterns can flag it as anomalous despite its legitimate protocol wrapper.

Why Port Numbers Lie

Services do not have to run on their registered ports. SSH can run on port 443. HTTP can run on port 8888. Malware regularly tunnels over ports 80 and 443 precisely because these are allowed through most firewalls. Traditional service identification by port number alone ("port 80 means HTTP") fails in adversarial environments.

Protocol fingerprinting identifies services by their actual communication behavior rather than port assignment. Nmap's OS detection engine (enabled with -O) sends a series of crafted probes and analyzes response deviations against a database of over 5,500 OS fingerprints. AI extends this by reasoning about ambiguous matches and protocol behavioral patterns that don't fit neat database entries.

TCP/IP Stack Fingerprinting Signals

Initial TTL values (Linux: 64, Windows: 128)
TCP window size in SYN packets
IP DF (Don't Fragment) bit behavior
TCP options ordering (MSS, SACK, timestamps)
RST packet handling differences
ICMP error message throttling
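The first signal in the list above is simple enough to sketch directly. A packet's TTL drops by one per hop, so rounding the observed value up to the nearest common starting value suggests both an OS family and a hop count. The 64 and 128 defaults come from this lesson; the 255 entry and the name `guess_from_ttl` are assumptions added for illustration, and a single TTL reading is weak evidence on its own.

```python
# (initial TTL, family). The 255 row is an added assumption, not from the lesson.
COMMON_INITIAL_TTLS = (
    (64, "Linux/Unix-like"),
    (128, "Windows"),
    (255, "other (often network gear)"),
)


def guess_from_ttl(observed_ttl: int):
    """Return (likely OS family, estimated hop count) for an observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial, family in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return family, initial - observed_ttl


print(guess_from_ttl(57))
print(guess_from_ttl(128))
```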

Application-Layer Fingerprinting Signals

HTTP header ordering & capitalization
TLS/SSL cipher suite ordering
SSH key exchange algorithm preference
SMB dialect negotiation sequence
Response timing under connection load
Error message format and wording

# OS detection + version detection combined
nmap -O -sV --osscan-guess 10.10.10.50

# Aggressive detection (OS + version + scripts + traceroute)
nmap -A 10.10.10.50

# Check what's actually running on an unusual port
nmap -sV -p 4444 --version-intensity 9 10.10.10.50

# Passive OS fingerprinting with p0f (no packets sent)
p0f -i eth0 -o fingerprint_log.txt

AI-Assisted Protocol Analysis

Tools like p0f perform passive OS fingerprinting — identifying operating systems from intercepted traffic without sending any probes. AI extends passive fingerprinting by combining multiple weak signals into confident identifications. A 2022 paper by researchers at Georgia Tech ("NetStar: Neural Network-Based Encrypted Traffic Classification") demonstrated that transformer models trained on packet timing, size distributions, and TLS metadata could classify encrypted service traffic with 96.3% accuracy — identifying services that deliberately hide their banners.

For pentesters, this means: even if an operator strips all identifying banners and runs services on random ports, AI-assisted passive fingerprinting can still identify what's running by observing protocol behavior over time.

TLS Fingerprinting — JA3 & JA4

JA3 (Salesforce, 2017) creates an MD5 hash of TLS ClientHello parameters: SSL version, ciphers, extensions, elliptic curves, and elliptic curve point formats. Different applications produce different JA3 hashes even when traffic is encrypted. JA4 (2023) extends this with more granular fields. AI classifiers trained on JA3/JA4 databases can identify client applications (browsers, malware families, scanners) purely from TLS handshake behavior, with no decryption required.
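The JA3 computation itself is short once the ClientHello fields have been extracted. The sketch below assumes the fields are already available as integers (extraction from a packet capture is out of scope here) and follows the published JA3 layout: five comma-separated fields, values within a field joined by hyphens, GREASE placeholder values dropped, then MD5. The function names are illustrative.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders have the form 0x0A0A, 0x1A1A, ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the pre-hash JA3 string from decimal ClientHello values."""
    fields = [
        [version],
        [c for c in ciphers if not is_grease(c)],
        [e for e in extensions if not is_grease(e)],
        [g for g in curves if not is_grease(g)],
        list(point_formats),
    ]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 10, 11], [0x1A1A, 29, 23], [0]))
```

Two clients that offer the same ciphers in a different order produce different strings and therefore different hashes, which is what makes the fingerprint useful.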

AI Disambiguation of OS Guesses

Nmap's OS detection sometimes returns ambiguous results — a host might match multiple OS fingerprints with similar confidence scores. AI can break these ties by reasoning about contextual clues: open service combinations (IIS + RDP strongly suggests Windows), TTL values, SMB dialect versions, and HTTP Server headers collectively narrow the OS identification far more reliably than any single signal.
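Tie-breaking of this kind can be approximated with a weighted vote before any model is consulted. Everything numeric below is a hypothetical weight chosen for illustration, as are the `SERVICE_HINTS` table and the function name `score_os`. The point is only that several weak signals pointing the same way can outweigh one strong but conflicting guess.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each service hints at an OS family.
SERVICE_HINTS = {
    "iis": ("Windows", 0.4),
    "rdp": ("Windows", 0.4),
    "smb": ("Windows", 0.2),  # weaker hint: Samba also runs on Linux
}


def score_os(nmap_guess: str, nmap_confidence: float, ttl, services):
    """Rank OS families by summed evidence, highest score first."""
    scores = defaultdict(float)
    scores[nmap_guess] += nmap_confidence  # expressed as 0.0 to 1.0
    if ttl is not None:
        if ttl <= 64:
            scores["Linux"] += 0.5
        elif ttl <= 128:
            scores["Windows"] += 0.5
    for service in services:
        hint = SERVICE_HINTS.get(service.lower())
        if hint is not None:
            family, weight = hint
            scores[family] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = score_os("Linux", 0.96, 128, ["IIS", "SMB"])
print(ranking)
```

With a Linux guess at 96% but a TTL of 128 plus IIS and SMB, the combined Windows evidence comes out ahead under these assumed weights.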

Step 1 — Port sweep: Identify which ports are open across the target range.

Step 2 — Service version probe: -sV to capture banner strings and version data.

Step 3 — OS fingerprinting: -O combined with passive p0f for cross-validation.

Step 4 — AI disambiguation: Feed all signals to LLM for contextual OS/service confirmation.

Step 5 — CVE correlation: AI maps confirmed services to vulnerability database entries with CVSS scoring.

Masscan + AI — Speed at Scale

Masscan can scan the entire IPv4 address space in under 6 minutes at 10M packets/second. However, its default output is raw: port open/closed with no service detail. AI pipelines that accept Masscan's initial sweep and automatically trigger targeted Nmap -sV scans on discovered hosts represent a two-stage architecture that combines speed with depth, a pattern now standard in large-scope engagements.
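The hand-off between the two stages can be sketched as a parser plus a command builder. This assumes Masscan's list output (`-oL`), where each result line looks like `open tcp 80 10.10.10.50 <timestamp>`; confirm the format against your Masscan version. The function names are illustrative, and the commands are returned as argument lists rather than executed.

```python
from collections import defaultdict


def parse_masscan_list(text: str) -> dict:
    """Group open TCP ports by host from Masscan -oL style output."""
    ports_by_host = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        # Comment lines and anything that is not an "open" record are skipped.
        if len(parts) < 4 or parts[0] != "open" or parts[1] != "tcp":
            continue
        ports_by_host[parts[3]].add(int(parts[2]))
    return {host: sorted(ports) for host, ports in ports_by_host.items()}


def nmap_commands(ports_by_host: dict) -> list:
    """Build one targeted `nmap -sV` argument list per discovered host."""
    return [
        ["nmap", "-sV", "-p", ",".join(str(p) for p in ports), host]
        for host, ports in sorted(ports_by_host.items())
    ]


SAMPLE = """#masscan
open tcp 80 10.10.10.50 1700000000
open tcp 22 10.10.10.50 1700000000
open tcp 445 10.10.10.51 1700000001
# end
"""

for command in nmap_commands(parse_masscan_list(SAMPLE)):
    print(" ".join(command))
```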

Passive Fingerprinting: Identifying OS and services from observed network traffic without sending any probes, leaving zero footprint on the target system. Implemented by tools like p0f and AI traffic classifiers.

JA3/JA4: TLS fingerprinting hashes derived from ClientHello parameters. Each application has a characteristic signature that AI can use to identify services even when traffic is encrypted.

--osscan-guess: Nmap flag that forces OS detection output even when confidence is below Nmap's normal threshold, useful when AI will perform further disambiguation of ambiguous results.

Lesson 3 Quiz

Protocol Fingerprinting & OS Detection · 4 questions

What is the primary limitation of identifying services by port number alone?

Correct. Port-service mapping is a convention, not a technical constraint. Attackers and defenders both exploit this — SSH on 443, malware on 80, etc.

Services can run on any port they choose. Port numbers are conventions, not enforcement mechanisms. Review the beginning of Lesson 3.

What does a JA3 hash fingerprint, and what makes it useful for identifying services in encrypted traffic?

Correct. JA3 hashes SSL version, cipher suite list, extensions, and elliptic curves from the ClientHello — each application produces a characteristic hash even through encrypted channels.

JA3 fingerprints the TLS ClientHello handshake parameters, not the certificate or TCP layer. Review the JA3/JA4 callout in Lesson 3.

How can AI-assisted analysis help detect DNS-based C2 communication like that of the SUNBURST (SolarWinds) backdoor?

Correct. SUNBURST traffic looked like normal DNS but carried statistical anomalies in query timing, subdomain structure entropy, and TTL patterns that ML models can detect as behavioral outliers.

Review the Lesson 3 story scene. ML models flag behavioral anomalies in DNS traffic patterns (timing, entropy, TTL) rather than relying on decryption or blacklisting.

What two-stage scanning architecture does AI enable to combine Masscan's speed with Nmap's depth?

Correct. Masscan's speed (full IPv4 in ~6 minutes) feeds into targeted Nmap -sV probes that AI orchestrates against confirmed-open ports — combining coverage with depth.

Review the Masscan + AI callout. The pattern is: Masscan for breadth (open port discovery) then Nmap for depth (version/service detail) on those discovered ports.

Lab 3 — Protocol & OS Fingerprinting Advisor

Use AI to disambiguate OS detection and interpret protocol behavioral signals

Lab Objective

You have ambiguous OS detection results and a mix of encrypted/unusual traffic. Use the AI assistant to reason through OS identification from combined signals, interpret TLS fingerprinting data, and build a passive fingerprinting strategy.

Scenario: Nmap -O returns "OS: Linux 3.X | 4.X (96%)" but the host also has IIS 10.0 running on port 80. TTL is 128. SMB is open on 445.

Try: "Help me disambiguate this OS result" or "What JA3 analysis should I run on this host's HTTPS traffic?"

AI Lab Assistant

Welcome to Lab 3. I specialize in OS fingerprinting disambiguation and protocol behavior analysis. Share your Nmap OS detection output, TTL values, running services, or TLS/JA3 data, and I'll help you build a confident OS and service identification using all available signals. What data do you have?

Module 2 · Lesson 4

Automated CVE Correlation & Attack Surface Mapping

From identified services to prioritized exploitation pathways — the AI intelligence layer

How do AI systems transform a list of services into a ranked, actionable vulnerability map?

When Log4Shell was disclosed on December 9, 2021, threat actors began mass-exploiting the vulnerability within 12 hours. Security teams at Cisco Talos and Microsoft MSTIC documented automated scanning campaigns that identified Log4j-dependent services by probing JNDI lookup responses across the internet. The key capability that separated rapid responders from victims was the ability to quickly map which internal services depended on Log4j — a problem solved by combining service identification with dependency graph analysis, exactly the kind of reasoning AI excels at.

The CVE Correlation Problem

After service identification produces a list of software names and versions, the next step is mapping each to known vulnerabilities. The National Vulnerability Database (NVD) contains over 250,000 CVE entries as of 2024. Manually cross-referencing even a modest service inventory against this database — then assessing which CVEs are exploitable given network context — is impractical at scale.

AI automates this in three stages: parsing (extracting CPE strings from banner data), querying (programmatic NVD API lookups or offline CVE corpus search), and reasoning (assessing exploitability given authentication requirements, network exposure, and available exploit code).

# NVD API v2 — query CVEs for a specific CPE
curl -H "apiKey: YOUR_KEY" \
  "https://services.nvd.nist.gov/rest/json/cves/2.0?\
cpeName=cpe:2.3:a:apache:http_server:2.4.29:*:*:*:*:*:*:*"

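# Python wrapper for batch CVE lookup
import requests

def get_cves(cpe_string, api_key, timeout=30):
    """Return the CVE records NVD associates with one CPE name."""
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    params = {"cpeName": cpe_string}
    headers = {"apiKey": api_key}
    r = requests.get(url, params=params, headers=headers, timeout=timeout)
    r.raise_for_status()  # surface HTTP errors (bad key, rate limit) early
    # Only the first page is returned; large result sets need startIndex paging.
    return r.json().get("vulnerabilities", [])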
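The list returned by get_cves is verbose, so a pipeline usually reduces each record to an ID and a score before reasoning over it. The sketch below assumes the NVD API 2.0 record layout (`cve.id`, and `cve.metrics` keyed by `cvssMetricV31`, `cvssMetricV30`, or `cvssMetricV2`); the function name `summarize_cves` and the sample record are illustrative, with a placeholder ID rather than a real CVE.

```python
def summarize_cves(vulnerabilities: list) -> list:
    """Reduce NVD records to (cve_id, base_score), highest score first."""
    rows = []
    for item in vulnerabilities:
        cve = item.get("cve", {})
        metrics = cve.get("metrics", {})
        score = None
        # Prefer the newest CVSS version that the record carries.
        for key in ("cvssMetricV31", "cvssMetricV30", "cvssMetricV2"):
            if metrics.get(key):
                score = metrics[key][0]["cvssData"]["baseScore"]
                break
        rows.append((cve.get("id"), score))
    # Unscored entries sort last; scored entries sort by descending score.
    return sorted(rows, key=lambda row: (row[1] is None, -(row[1] or 0)))


sample = [
    {"cve": {"id": "CVE-EXAMPLE-0001",
             "metrics": {"cvssMetricV31": [{"cvssData": {"baseScore": 5.3}}]}}},
    {"cve": {"id": "CVE-EXAMPLE-0002",
             "metrics": {"cvssMetricV2": [{"cvssData": {"baseScore": 7.5}}]}}},
    {"cve": {"id": "CVE-EXAMPLE-0003", "metrics": {}}},
]
print(summarize_cves(sample))
```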

CVSS Scoring and AI Prioritization

Not all CVEs are equal. CVSS (Common Vulnerability Scoring System) scores vulnerabilities from 0–10 across three metric groups: Base (inherent properties), Temporal (current exploit availability), and Environmental (your specific network context). AI adds a critical capability: reasoning about the combination of CVSS scores, network exposure, and available exploit code to produce a prioritized attack path.

For example, CVE-2017-0144 (EternalBlue/MS17-010, SMB) carries a CVSS v3 Base score of 8.1 in the NVD. But if the SMB service sits on an isolated VLAN with no internet access, the actual risk is lower than that of a CVSS 7.5 vulnerability on a public-facing web server. AI contextualizes scores against the network architecture discovered during enumeration.

CVSS Base Metric Groups

Attack Vector — Network/Adjacent/Local/Physical
Attack Complexity — Low/High
Privileges Required — None/Low/High
User Interaction — None/Required
Scope — Unchanged/Changed
CIA Impact — None/Low/High each

AI Prioritization Factors

CVSS score vs. network exposure layer
Public exploit availability (Metasploit/ExploitDB)
Patch status probability from version string
Service dependency chains (Log4j pattern)
Authentication bypass vs. post-auth vuln
Active exploitation in the wild (CISA KEV)
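A few of the factors above can be combined into a toy scoring function to show how context reorders findings. The exposure multipliers and bonus values are hypothetical, as are the `Finding` class and the placeholder identifiers; a production system would learn or tune such weights rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical multipliers for how reachable the affected service is.
EXPOSURE_WEIGHT = {"internet": 1.0, "internal": 0.6, "isolated": 0.3}


@dataclass
class Finding:
    cve: str
    cvss: float
    exposure: str  # one of the EXPOSURE_WEIGHT keys
    public_exploit: bool = False
    kev: bool = False


def priority(finding: Finding) -> float:
    """Contextual priority: CVSS scaled by exposure, plus exploit/KEV bonuses."""
    score = finding.cvss * EXPOSURE_WEIGHT[finding.exposure]
    if finding.public_exploit:
        score += 1.5
    if finding.kev:
        score += 3.0
    return round(score, 2)


findings = [
    Finding("EXAMPLE-ISOLATED", 9.8, "isolated"),
    Finding("EXAMPLE-PUBLIC", 7.5, "internet", public_exploit=True),
]
for finding in sorted(findings, key=priority, reverse=True):
    print(finding.cve, priority(finding))
```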

CISA Known Exploited Vulnerabilities (KEV) Catalog

CISA maintains a KEV catalog of CVEs with confirmed active exploitation. As of 2024 it contains over 1,100 entries. AI-assisted service identification pipelines that cross-reference discovered CVEs against KEV produce high-confidence priority findings — a KEV-listed vulnerability in a discovered service is an immediate critical finding regardless of CVSS score.
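Cross-referencing against KEV is a set intersection once the catalog is loaded. This sketch assumes the layout of CISA's JSON feed, a top-level `vulnerabilities` list whose entries carry a `cveID` field, and takes the parsed feed as an argument so that downloading it stays outside the example. The function name `kev_matches` is illustrative.

```python
def kev_matches(found_cves, kev_feed: dict) -> list:
    """Return the discovered CVE IDs that also appear in the KEV catalog."""
    kev_ids = {entry["cveID"] for entry in kev_feed.get("vulnerabilities", [])}
    return sorted(set(found_cves) & kev_ids)


# Tiny stand-in for the parsed feed, using CVE IDs that appear in this lesson.
feed = {"vulnerabilities": [{"cveID": "CVE-2021-41773"}, {"cveID": "CVE-2020-0796"}]}
found = ["CVE-2021-41773", "CVE-2018-15473", "CVE-2020-0796"]
print(kev_matches(found, feed))
```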

Attack Surface Mapping with AI

The output of AI-assisted CVE correlation is an attack surface map — a structured representation of all discovered services, their associated vulnerabilities, and the attack paths that connect them to valuable targets. AI can generate these maps in natural language reports, structured JSON for tool ingestion, or visual graph formats.

Platforms like Tenable.io and Rapid7 InsightVM have integrated AI recommendation engines that do exactly this — their "Prioritized Risk" features use machine learning to rank findings by actual exploitation likelihood rather than raw CVSS, factoring in threat intelligence feeds and asset criticality. Tenable's research claims their AI-assisted prioritization reduces the remediation backlog by identifying the 3% of vulnerabilities responsible for 60% of actual breach risk.

# Example AI-generated attack surface summary structure
{
  "host": "10.10.10.50",
  "os": "Windows Server, version 1909",
  "services": [
    {
      "port": 445,
      "service": "SMB",
      "version": "3.1.1",
      "cves": [
        {"id": "CVE-2020-0796", "cvss": 10.0, "kev": true,
         "exploit": "public", "priority": "CRITICAL"}
      ]
    },
    {
      "port": 80,
      "service": "Apache",
      "version": "2.4.49",
      "cves": [
        {"id": "CVE-2021-41773", "cvss": 9.8, "kev": true,
         "exploit": "public", "priority": "CRITICAL"}
      ]
    }
  ],
  "attack_paths": ["CVE-2021-41773 -> RCE -> lateral movement via SMB"]
}

Exploit Chain Reasoning — AI's Unique Contribution

Perhaps the most significant AI contribution to attack surface mapping is exploit chain reasoning: identifying sequences of vulnerabilities that, when combined, produce a higher-impact attack than any single vulnerability alone. A CVSS 6.0 information disclosure + a CVSS 7.0 authentication bypass + a CVSS 5.5 privilege escalation can chain to full system compromise. Humans reason about chains intuitively; AI can systematically enumerate them from a service inventory.
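Systematic chain enumeration can be sketched as a depth-first search in which each vulnerability needs one level of access and grants another. The access-level labels, the sample vulnerabilities, and the function name `find_chains` are all hypothetical; real attack-graph tools model far richer preconditions.

```python
def find_chains(vulns: dict, start: str, goal: str, max_len: int = 4) -> list:
    """List every ordered chain of vulnerabilities leading from start to goal.

    vulns maps a name to (access level required, access level gained).
    """
    chains = []

    def dfs(state, path):
        if state == goal:
            chains.append(list(path))
            return
        if len(path) >= max_len:
            return
        for name, (needs, gives) in vulns.items():
            if needs == state and name not in path:
                path.append(name)
                dfs(gives, path)
                path.pop()

    dfs(start, [])
    return chains


inventory = {
    "info-disclosure (6.0)": ("network", "internal-paths"),
    "auth-bypass (7.0)": ("internal-paths", "user"),
    "priv-esc (5.5)": ("user", "root"),
    "unrelated-dos (7.5)": ("network", "network"),
}
print(find_chains(inventory, "network", "root"))
```

None of the three chained findings is critical on its own score, yet together they reach full compromise, which is the pattern the paragraph above describes.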

CVSS: Common Vulnerability Scoring System, a standardized 0–10 score reflecting vulnerability severity. AI re-weights raw CVSS with contextual factors (exposure, exploit availability, asset value) for realistic prioritization.

KEV Catalog: CISA's Known Exploited Vulnerabilities list, covering CVEs with confirmed active exploitation in the wild. A KEV match in discovered services is an immediate critical finding in any AI-assisted pipeline.

Attack Surface Map: A structured representation of all discovered assets, their services, associated vulnerabilities, and the attack paths connecting them to target objectives. It is the final output of AI-assisted service identification.

Lesson 4 Quiz

CVE Correlation & Attack Surface Mapping · 4 questions

What three stages does AI use to automate CVE correlation from banner-identified services?

Correct. The three stages are: CPE string extraction from banners, programmatic NVD query with those CPEs, then contextual reasoning about which CVEs are actually exploitable given network architecture.

Review the CVE Correlation section in Lesson 4. The three stages are parsing (CPE), querying (NVD), and reasoning (contextual exploitability).

Why might a CVSS 7.5 vulnerability be prioritized over a CVSS 9.8 vulnerability in an AI-assisted attack surface map?

Correct. AI contextualizes CVSS scores against actual network exposure. A high-score vulnerability on an isolated segment may be lower priority than a medium-score one on a publicly exposed, actively exploited service.

Review the CVSS Scoring and AI Prioritization section. Context — network exposure, exploit availability, asset criticality — can reverse raw CVSS priority rankings.

What is the CISA KEV catalog and why does it matter for AI-assisted service identification?

Correct. CISA's Known Exploited Vulnerabilities catalog (1,100+ entries) identifies CVEs being actively exploited. Any KEV match in an AI-assisted scan is automatically elevated to critical priority.

KEV = Known Exploited Vulnerabilities. It's CISA's list of CVEs with confirmed active exploitation. Review the KEV callout in Lesson 4.

What unique contribution does AI make to exploit chain reasoning that differentiates it from simple CVE listing?

Correct. Exploit chain reasoning identifies vulnerability sequences — e.g., CVSS 6.0 + 7.0 + 5.5 chaining to full system compromise — that no single vulnerability score reveals. This is a core AI differentiator over simple CVE lists.

Review the callout on exploit chain reasoning. AI's value is in identifying multi-step vulnerability chains that produce impacts greater than any single CVE suggests.

Lab 4 — CVE Correlation & Attack Surface Mapping

Build a prioritized attack surface map from identified services using AI reasoning

Lab Objective

You have completed service identification on a target environment. Use the AI assistant to correlate discovered services with CVEs, apply CVSS contextual prioritization, check against KEV criteria, and generate an attack surface map with exploit chain recommendations.

Target inventory: Apache 2.4.49 (port 80, public-facing) · SMBv1 enabled (port 445, internal LAN) · OpenSSH 7.2p2 (port 22) · MySQL 5.5.60 (port 3306, bound to 0.0.0.0)
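Before correlating CVEs, the parsing stage turns each inventory entry into a CPE string. The sketch below is one possible way to do that for this inventory, under stated assumptions: the `CPE_NAMES` lookup table is hand-written for illustration, the output uses the CPE 2.3 formatted-string layout rather than the older `cpe:/a:` URI form, and placing the OpenSSH portable suffix (`p2`) in the update field is an assumed convention that should be checked against the NVD's CPE dictionary. SMBv1 is a protocol setting rather than a versioned product, so the helper returns `None` for it.

```python
import re

# Hypothetical lookup from inventory product names to (vendor, product) pairs.
CPE_NAMES = {
    "Apache": ("apache", "http_server"),
    "OpenSSH": ("openbsd", "openssh"),
    "MySQL": ("oracle", "mysql"),
}

# Dotted numeric version with an optional OpenSSH-style "pN" suffix.
VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(p\d+)?$")


def to_cpe23(product, version):
    """Return a CPE 2.3 formatted string, or None if the entry cannot be mapped."""
    if product not in CPE_NAMES or version is None:
        return None
    match = VERSION_RE.match(version)
    if match is None:
        return None
    base = match.group(1)
    update = match.group(2) or "*"  # assumption: "pN" suffix goes in the update field
    vendor, name = CPE_NAMES[product]
    return f"cpe:2.3:a:{vendor}:{name}:{base}:{update}:*:*:*:*:*:*"


inventory = [
    ("Apache", "2.4.49"),
    ("SMBv1", None),
    ("OpenSSH", "7.2p2"),
    ("MySQL", "5.5.60"),
]

for product, version in inventory:
    print(product, version, to_cpe23(product, version))
```

Entries that come back as `None` still belong in the attack surface map; they just need a different lookup path than a CPE query.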

Try: "Rank these services by exploitation priority" or "What exploit chains exist between Apache 2.4.49 and SMB?"
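The ranking the first prompt asks for can be approximated with a simple sort, which is useful for sanity-checking the assistant's answer. This is a sketch under assumptions: the `Finding` class, the two example entries, and the signal order (KEV match first, then public exposure, then raw CVSS) are illustrative choices, not a standard formula, and the scores are placeholders rather than values looked up for the lab inventory.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    cvss: float          # base score (placeholder values below)
    public_facing: bool  # reachable from the internet?
    in_kev: bool         # listed in the CISA KEV catalog?


def rank(findings):
    """Sort by KEV match, then exposure, then CVSS; highest priority first."""
    return sorted(
        findings,
        key=lambda f: (f.in_kev, f.public_facing, f.cvss),
        reverse=True,
    )


# Hypothetical entries mirroring the 7.5-versus-9.8 quiz scenario.
findings = [
    Finding("isolated-critical", cvss=9.8, public_facing=False, in_kev=False),
    Finding("exposed-high", cvss=7.5, public_facing=True, in_kev=True),
]

ranked = [f.name for f in rank(findings)]
print(ranked)
```

With these inputs the exposed, KEV-listed 7.5 finding ranks above the isolated 9.8 one. Comparing tuples keeps the rule transparent; a weighted score would allow finer trade-offs but requires weights that would have to be justified.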

AI Lab Assistant

CVE Correlation

Module 2 · L4

Welcome to Lab 4. I'm your attack surface mapping assistant. Provide your service inventory — software names, versions, network exposure details — and I'll correlate CVEs, apply CVSS contextual scoring, cross-reference KEV, and reason about exploit chains. What services are we mapping today?

Module 2 — Module Test

AI-Assisted Service Identification · 15 questions · 80% to pass

1. Which Nmap flag enables service version detection by probing open ports with a library of version probes?

Correct. -sV enables service version detection. -O is OS detection, -sC runs the default scripts, and -A combines all three plus traceroute.

-sV enables version detection. Review Lesson 1's Nmap section.

2. What information can NOT typically be extracted from a raw SSH banner like "SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7"?

Correct. Banner strings reveal software and OS identity but not runtime state like active session counts. Session data requires authenticated access.

Active session counts are runtime state data, not included in static banner strings. Review Lesson 1's banner anatomy section.

3. The CVE-2011-2523 vsftpd 2.3.4 backdoor opens a shell on which port when triggered?

Correct. The vsftpd 2.3.4 backdoor opens a bind shell on TCP port 6200 when a username containing ":)" is submitted.

Port 6200 is the vsftpd 2.3.4 backdoor port. Review Lesson 2's vsftpd callout.

4. NSE scripts are written in which language?

Correct. NSE uses Lua, a lightweight embeddable scripting language chosen for its minimal footprint and C integration.

NSE scripts use Lua. Review Lesson 2.

5. What does Shodan's ML-based service classification (introduced 2022–2023) achieve that traditional banner indexing cannot?

Correct. Shodan's ML classification handles banner obfuscation by learning behavioral patterns, achieving 94%+ accuracy even on services that suppress version strings.

Shodan's ML classifies services despite banner obfuscation. Review the Lesson 1 gold callout.

6. In the PentestGPT study (NTU, 2023), how large a task-completion increase did PentestGPT show over the GPT-3.5 baseline on the benchmark targets?

Correct. The PentestGPT paper (arXiv:2308.06782) reports a task-completion increase of 228.6% over the GPT-3.5 model on its benchmark targets.

About 228%. Review the PentestGPT callout in Lesson 2.

7. What is the default initial TTL for Windows systems, which can be used as an OS fingerprinting signal?

Correct. Windows typically uses TTL 128 while Linux uses 64 — a basic but useful OS fingerprinting signal when combined with other indicators.

Windows TTL = 128, Linux TTL = 64. Review Lesson 3's fingerprinting signals table.

8. What does p0f do that distinguishes it from Nmap OS detection?

Correct. p0f is purely passive — it fingerprints OSes by analyzing traffic it observes without generating any packets toward the target, leaving zero footprint.

p0f is passive — no packets sent. Review the passive fingerprinting section in Lesson 3.

9. JA3 hashing creates a fingerprint based on which network event?

Correct. JA3 hashes selected fields of the TLS ClientHello, the first message a client sends in the TLS handshake, to produce a fingerprint of the client application.

JA3 uses TLS ClientHello parameters. Review Lesson 3's JA3/JA4 callout.

10. What anomalous signals in DNS traffic helped FireEye/Mandiant detect the SUNBURST backdoor's C2 communication?

Correct. SUNBURST used legitimate DNS infrastructure but had statistical anomalies — timing patterns, subdomain entropy, TTL behavior — that ML behavioral analysis could detect.

Review the SUNBURST story in Lesson 3. ML detected timing, entropy, and TTL anomalies in otherwise legitimate-looking DNS traffic.

11. What is a CPE string and what is its primary function in AI-assisted vulnerability correlation?

Correct. CPE strings (e.g., cpe:/a:openbsd:openssh:7.4) standardize software identity so AI can reliably query the NVD and other vulnerability databases programmatically.

CPE = Common Platform Enumeration. It bridges banner text to database queries. Review Lesson 1's key terms.

12. The 2021 Oldsmar water treatment attack leveraged which remote access tool, and what made the target discoverable via Shodan?

Correct. The attacker used TeamViewer. The Windows 7 RDP banner's "Windows 6.1" version string had been indexed by Shodan, making the vulnerable system publicly discoverable.

TeamViewer was the access tool; Shodan had indexed the Windows 7 RDP banner. Review Lesson 1's story section.

13. Within what timeframe after public disclosure did mass exploitation of Log4Shell (CVE-2021-44228) begin?

Correct. Mass exploitation of Log4Shell began within 12 hours of the December 9, 2021 disclosure — one of the fastest weaponization timelines for a critical vulnerability.

12 hours. Review Lesson 4's story section on Log4Shell exploitation speed.

14. Tenable's AI-assisted prioritization research claims that what percentage of vulnerabilities account for 60% of actual breach risk?

Correct. Tenable's research found that roughly 3% of vulnerabilities represent 60% of real breach risk — the core argument for AI-assisted prioritization over raw CVSS-based ranking.

3% of vulnerabilities drive 60% of breach risk per Tenable's research. Review Lesson 4's attack surface mapping section.

15. What is the CISA KEV catalog's significance when a discovered service matches an entry?

Correct. A KEV catalog match means confirmed active exploitation — real attackers are using it now. This elevates priority independent of CVSS score in any AI-assisted triage pipeline.

KEV = actively exploited in the wild = immediate critical priority. Review Lesson 4's CISA KEV callout.