Module 7 · Lesson 1

Broken Algorithms & Deprecated Primitives

Why AI code generators reach for MD5, DES, and RC4 — and what it costs when they do.

When an AI suggests a cryptographic function, how do you know whether it was already broken before you were born?

In 2012, the LinkedIn password database leak exposed 6.5 million unsalted SHA-1 hashes. Within 24 hours the majority were cracked using rainbow tables. In 2017, researchers at CWI Amsterdam and Google demonstrated a practical SHA-1 collision, the SHAttered attack, producing two different PDF files with identical SHA-1 digests at an estimated cost of roughly $110,000 in cloud compute. By 2020, AI code assistants trained on pre-2010 Stack Overflow answers were still suggesting SHA-1 for password hashing and MD5 for file integrity in generated code snippets, because those snippets dominated the training corpus.

Why AI Systems Produce Deprecated Crypto

Large language models learn from code repositories and Q&A forums accumulated over decades. A question asked in 2006 — "How do I hash a password in Python?" — with an accepted answer demonstrating hashlib.md5() still sits in the training data with high engagement signals. The model has no internal calendar; it cannot natively distinguish "this was valid advice in 2006" from "this is valid advice today."

The practical result is that AI code generators frequently produce cryptographic code that was already considered weak or broken at the time of generation. Auditors need pattern recognition for these choices, not just abstract awareness that old algorithms exist.

Real Incident — Telegram 2013

Telegram's original MTProto protocol used a custom combination of AES in IGE mode and SHA-1 for message authentication. Security researchers pointed out that IGE mode does not provide ciphertext integrity and that SHA-1 was already weakened. The design was criticized in published cryptanalysis (for example Jakobsen and Orlandi, 2015). Telegram's MTProto 2.0 replaced SHA-1 with SHA-256, but the original flaw arose from combining primitives that were individually familiar rather than collectively secure, which is precisely the pattern AI-generated crypto tends to reproduce.

The Broken Algorithm Taxonomy

Auditors should recognize three categories of deprecated cryptographic primitives that appear in AI-generated code with high frequency:

Primitive	Category	Known Break	Severity
MD5	Hash	Collision attacks since 2004 (Wang et al.); practical chosen-prefix collisions 2009 (Sotirov, Stevens)	Critical
SHA-1	Hash	Theoretical breaks 2005; SHAttered practical collision 2017 (Stevens et al., CWI and Google)	Critical
DES	Symmetric cipher	56-bit key exhausted by EFF Deep Crack 1998 in 56 hours; 3DES deprecated by NIST 2017	Critical
RC4	Stream cipher	Keystream biases known since 1995; prohibited in TLS by RFC 7465 (2015)	Critical
RSA-512 / RSA-768	Asymmetric	RSA-512 factored 1999 (Cavallar et al.); RSA-768 factored 2009 (Kleinjung et al.)	Critical
ECB mode (any cipher)	Block cipher mode	Deterministic, reveals plaintext patterns; the "ECB penguin" demonstration is canonical	High
SHA-256 for passwords	Hash (misuse)	Not broken, but wrong tool: GPU cracking at billions of hashes/sec without work factor	High

Code Patterns to Detect in Review

The following patterns appear verbatim in AI-generated code. Each represents an auditable red flag:

# Pattern 1: MD5 for any security purpose (Python)
import hashlib
digest = hashlib.md5(data).hexdigest()          # BROKEN: collision-vulnerable

// Pattern 2: DES in Java
Cipher c = Cipher.getInstance("DES/CBC/PKCS5Padding");  // BROKEN: 56-bit key

# Pattern 3: ECB mode in Python (PyCryptodome)
from Crypto.Cipher import AES
cipher = AES.new(key, AES.MODE_ECB)             # BROKEN: no semantic security

// Pattern 4: RC4 in Node.js
const cipher = crypto.createCipheriv('rc4', key, '');  // BROKEN: prohibited in TLS by RFC 7465

# Correct replacement: AES-GCM authenticated encryption (Python, PyCryptodome)
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)   # nonce must be unique per message
ciphertext, tag = cipher.encrypt_and_digest(plaintext)

Auditor Note

When reviewing AI-generated code, search for the string literals "MD5", "SHA1", "SHA-1", "DES", "RC4", and "ECB", and for "RSA" combined with key-size parameters under 2048. These strings in imports, Cipher.getInstance() calls, or hashlib invocations are strong indicators of deprecated cryptography; confirm whether each use is security-relevant and remediate before deployment.
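The search described in this note can be partly automated. The following is a minimal sketch, not a complete scanner: the pattern table and the names `DEPRECATED_PATTERNS` and `scan_source` are hypothetical, chosen for illustration. It treats an algorithm name as a hit only when it is not embedded in a longer alphanumeric word, so `MODE_ECB` matches but `describe` does not.

```python
import re


def _token(*names):
    """Build a case-insensitive regex matching any name as a standalone token."""
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(
        r"(?<![A-Za-z0-9])(?:%s)(?![A-Za-z0-9])" % alternatives, re.IGNORECASE
    )


# Hypothetical pattern table for illustration; extend it for your codebase.
DEPRECATED_PATTERNS = {
    "MD5": _token("md5"),
    "SHA-1": _token("sha1", "sha-1"),
    "DES": _token("des", "3des", "desede"),
    "RC4": _token("rc4", "arc4", "arcfour"),
    "ECB": _token("ecb"),
}


def scan_source(text):
    """Return (line_number, label, stripped_line) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in DEPRECATED_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```

A hit is a prompt for manual review rather than a verdict: names fused into longer identifiers (for example Java's `HmacSHA1`) are missed by this token rule, and non-security uses will be flagged.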

Key Terms

Collision attack: Finding two distinct inputs that produce the same hash output. Practical collisions in MD5 enable forged digital signatures and certificate fraud.

Chosen-prefix collision: A stronger attack in which the attacker starts from two different, arbitrarily chosen prefixes and computes suffixes that make the complete inputs collide, enabling real-world forgeries like the 2008 rogue CA certificate attack.

Semantic security: A ciphertext should reveal no information about the plaintext. ECB mode fails this because identical plaintext blocks encrypt to identical ciphertext blocks.

Work factor: A parameter (iterations, memory cost) that makes a hash function deliberately slow, protecting passwords against GPU brute-force attacks. Absent in MD5/SHA-256; present in bcrypt, scrypt, Argon2.
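To make the work factor concrete, here is a small sketch of password storage using scrypt from Python's standard library. The cost parameters in `SCRYPT_PARAMS` and the helper names are illustrative assumptions, not settings taken from this lesson; choose real values from current guidance and from measurements on your own hardware.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions): n is the CPU/memory cost,
# r the block size, p the parallelism. These use roughly 16 MiB per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password):
    """Return (salt, digest) for storage; the salt is random per password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Because each guess costs the attacker the same memory and CPU time, raising `n` slows offline guessing in a way a bare `hashlib.sha256()` call cannot.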

Lesson 1 Quiz

Broken Algorithms & Deprecated Primitives · 4 questions

The 2017 SHAttered attack demonstrated a practical collision against SHA-1. What was the approximate computational cost reported by the Google/CWI research team?

Correct. The SHAttered paper (Stevens et al., 2017) estimated roughly $110,000 in rented cloud GPU time: expensive but within reach of motivated nation-state actors or well-funded criminal groups, making SHA-1 practically broken for security purposes.

Not quite. The SHAttered paper estimated approximately $110,000 in rented cloud GPU time: expensive but not prohibitive for motivated adversaries, which is why SHA-1 is considered practically broken.

Why does AES in ECB (Electronic Codebook) mode fail to provide semantic security?

Correct. ECB encrypts each block independently and deterministically. Two identical plaintext blocks always produce identical ciphertext blocks, leaking structural information. The canonical demonstration is the "ECB penguin" — a bitmap encrypted with ECB that still shows the recognizable penguin outline in the ciphertext.

Not correct. ECB's fundamental flaw is determinism: identical plaintext blocks always produce identical ciphertext blocks. This is the "ECB penguin" problem — structural patterns in plaintext survive into ciphertext, defeating the purpose of encryption.

An AI code assistant generates Python code using hashlib.sha256(password).hexdigest() for password storage. What is the primary security concern?

Correct. SHA-256 is cryptographically sound as a hash function but wrong for password storage because it has no work factor. Modern GPUs can compute billions of SHA-256 hashes per second, making dictionary and brute-force attacks trivially fast. Password hashing requires bcrypt, scrypt, or Argon2 — algorithms with tunable computational cost.

Not correct. SHA-256 itself is not broken — the problem is context. For password storage, a fast hash is the wrong tool. Without a work factor, GPUs can attempt billions of passwords per second. Bcrypt, scrypt, or Argon2 are appropriate because they are deliberately slow.

Which IETF RFC explicitly prohibited RC4 in TLS connections?

Correct. RFC 7465, published February 2015 and titled "Prohibiting RC4 Cipher Suites," explicitly mandated that TLS clients and servers MUST NOT use RC4 cipher suites during negotiation. Biases in RC4's keystream output had been documented since the 1990s and were practically exploited in attacks against WEP and TLS.

Not correct. RFC 7465 (2015), specifically titled "Prohibiting RC4 Cipher Suites," is the document that banned RC4 from TLS. Biases in RC4's keystream were documented as early as 1995 and exploited in WEP attacks and in TLS attacks such as RC4 NOMORE.

Lab 1: Detecting Deprecated Algorithms

AI Security Auditor · Cryptographic Review Simulation

Lab Scenario

You are reviewing AI-generated Python code for a healthcare data platform. The assistant has produced several modules that handle encryption and hashing. Your task is to identify every deprecated or misused cryptographic primitive, explain why each is dangerous, and recommend a secure replacement.

The AI lab assistant will present code snippets and answer your auditing questions. Complete at least 3 exchanges to finish the lab.

Start by asking the assistant to show you the first code snippet for review, or ask: "What deprecated crypto primitives should I look for in AI-generated healthcare data code?"


Module 7 · Lesson 2

Hardcoded Keys, IVs, and Secrets

The AI pattern of embedding cryptographic material directly in source code — and its catastrophic consequences.

If a secret key is committed to a Git repository for even one minute, should you consider it permanently compromised?

In January 2019, researcher Tanya Janca (SheHacksPurple) documented a pattern she had observed across hundreds of code reviews: AI-assisted code completion tools were inserting placeholder cryptographic keys that developers forgot to replace before deployment. The pattern appeared in AWS credential leaks tracked by GitGuardian's State of Secrets Sprawl reporting, which counted roughly 10 million secrets exposed in public GitHub commits during 2022, a 67% year-over-year increase that coincided with rising adoption of AI code completion. The same report found that about 1 in 10 commit authors exposed a secret that year.

How AI Generates Hardcoded Secrets

AI code assistants generate hardcoded cryptographic material through three distinct mechanisms:

1. Placeholder completion: When asked to write encryption code, the model generates a complete working example including a literal key — because training examples almost always include literal keys for demonstrability. The developer copies the snippet and the placeholder key survives into production.

2. Test-to-production bleeding: AI-generated test fixtures include hardcoded test credentials. When production code is scaffolded from the same pattern, the test values persist.

3. IV/nonce reuse: AI generators frequently produce a hardcoded initialization vector (IV) of all-zeros or a literal byte string — an error that in many cipher modes (especially CTR) reduces the encryption to trivially recoverable plaintext given two messages encrypted with the same key.

# DANGEROUS: AI-generated pattern with a hardcoded key and static IV
from Crypto.Cipher import AES                        # PyCryptodome
SECRET_KEY = b'mysecretkey12345'                    # hardcoded 16-byte key
IV = b'\x00' * 16                                   # static all-zero IV

# In CTR mode, reusing a key+nonce pair is catastrophic:
# C1 = P1 XOR keystream, C2 = P2 XOR keystream
# so C1 XOR C2 = P1 XOR P2 (the keystream cancels out)
cipher = AES.new(SECRET_KEY, AES.MODE_CTR, nonce=IV[:8])

# CORRECT: key supplied at runtime, nonce cryptographically random
import os
# Assumes the secrets manager injects the key as a hex string
SECRET_KEY = bytes.fromhex(os.environ['ENCRYPTION_KEY'])
nonce = os.urandom(8)                                # unique per message
cipher = AES.new(SECRET_KEY, AES.MODE_CTR, nonce=nonce)

Real Incident — Uber 2016

In 2016, Uber suffered a breach affecting 57 million riders and drivers. The root cause: AWS access credentials were committed to a private GitHub repository in plaintext. Attackers who gained access to the repository found the credentials (searching a codebase for hardcoded secrets is fully automatable), authenticated to AWS, and downloaded the data. The incident cost Uber $148 million in a 2018 settlement with US states. While not AI-generated, the pattern is identical to what AI code assistants now reproduce automatically.

The Nonce Reuse Catastrophe

Nonce reuse in stream cipher modes (CTR, GCM) is particularly dangerous because it is mathematically catastrophic rather than merely weakening. In CTR mode, the keystream is generated as E(key, nonce || counter). If the same nonce is reused with the same key, both messages are XORed with the identical keystream. An attacker who observes both ciphertexts can compute their XOR, obtaining the XOR of the two plaintexts — which is trivially recoverable using frequency analysis or known-plaintext attacks.
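The cancellation described above can be reproduced with a few lines of Python. This is a toy sketch: `toy_keystream` uses SHA-256 in counter mode as a stand-in for E(key, nonce || counter) so that the example needs only the standard library. It is not AES-CTR and must not be used for real encryption; the key, nonce, and messages are made up for the demonstration.

```python
import hashlib


def toy_keystream(key, nonce, length):
    """Stand-in for E(key, nonce || counter): SHA-256 in counter mode (toy only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr_encrypt(key, nonce, plaintext):
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 16
static_nonce = b"\x00" * 8            # the flaw: same nonce for every message
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallo"
c1 = toy_ctr_encrypt(key, static_nonce, p1)
c2 = toy_ctr_encrypt(key, static_nonce, p2)

# The attacker sees only c1 and c2, yet their XOR equals p1 XOR p2.
leak = xor_bytes(c1, c2)
# Knowing (or guessing) one plaintext reveals the other without the key.
recovered_p2 = xor_bytes(leak, p1)
```

The same cancellation holds for any cipher that XORs a keystream into the plaintext, which is why the key never appears in the attacker's computation.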

In AES-GCM, nonce reuse is even worse: it allows an attacker to recover the authentication key, defeating both confidentiality and integrity simultaneously. This is not theoretical: the Forbidden Attack (Joux, 2006; analyzed further by Handschuh and Preneel) shows that GCM nonce reuse leads to practical recovery of the authentication key.

Real Incident — PS3 ECDSA 2010

Sony's PlayStation 3 used ECDSA signatures to verify game code. The implementation reused the same random nonce k for every signature. Fail0verflow demonstrated at CCC 2010 that when nonce k is repeated across two signatures (r, s₁) and (r, s₂) over message hashes z₁ and z₂, the nonce can be recovered as k = (z₁ - z₂) × (s₁ - s₂)⁻¹ mod n, and the private key then follows as d = (s₁ × k - z₁) × r⁻¹ mod n. This allowed arbitrary code signing on the PS3: a catastrophic firmware-level compromise from a single cryptographic implementation error. AI-generated ECDSA code today frequently omits deterministic nonce generation (RFC 6979), reproducing the same flaw.
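The algebra behind this recovery can be checked numerically. ECDSA computes s = k⁻¹ × (z + r × d) mod n, so two signatures sharing k give s₁ - s₂ = k⁻¹ × (z₁ - z₂), which yields k first and then d = (s₁ × k - z₁) × r⁻¹ mod n. The sketch below uses toy values throughout: `n` is a Mersenne prime standing in for a curve's group order, and `r` is an arbitrary placeholder (in real ECDSA it comes from the curve point k × G), since only the fact that both signatures share it matters here.

```python
# Toy parameters (assumptions for illustration, not real curve values).
n = 2**127 - 1                 # prime modulus standing in for the group order
d = 0x1234567890ABCDEF         # the signer's private key
k = 0x0FEDCBA987654321         # the nonce, wrongly reused for every signature
r = 0x1122334455667788         # placeholder for the shared r value


def sign(z):
    """Return the s component of an ECDSA signature over message hash z."""
    return (pow(k, -1, n) * (z + r * d)) % n


z1, z2 = 111111, 222222        # two different message hashes
s1, s2 = sign(z1), sign(z2)

# Attacker's view: r, s1, s2, z1, z2, and n are public.
k_recovered = ((z1 - z2) * pow((s1 - s2) % n, -1, n)) % n
d_recovered = ((s1 * k_recovered - z1) * pow(r, -1, n)) % n
```

Two signatures are enough; no brute force is involved, only modular inverses (Python 3.8+ computes them with `pow(x, -1, n)`).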

Detection Checklist for Auditors

When reviewing AI-generated cryptographic code, apply this checklist:

Key source check: Are keys loaded from environment variables, secrets managers (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault), or hardware security modules? Any literal string used as a key is a finding.

IV/nonce generation check: Is each nonce generated using a cryptographically secure random source (os.urandom(), crypto.randomBytes(), SecureRandom) per message? Hardcoded, sequential, or time-based nonces are findings.

ECDSA nonce check: Does the ECDSA implementation use deterministic k generation per RFC 6979, or does it rely on the RNG? RNG-based nonce generation in ECDSA is a finding without additional safeguards.

Secret scanning integration: Is the repository configured with pre-commit hooks or CI/CD secret scanning (GitGuardian, Gitleaks, GitHub secret scanning)? Absence of scanning is an organizational finding.
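The first two checklist items can be illustrated with a short sketch of what compliant code tends to look like. The environment variable name `ENCRYPTION_KEY`, the hex encoding of the key, and the helper names are assumptions made for this example; a real deployment would follow whatever format its secrets manager provides.

```python
import os

NONCE_LEN = 12  # 96-bit nonce, the size commonly used with AES-GCM


def load_key(env_var="ENCRYPTION_KEY"):
    """Load a hex-encoded AES key from the environment and validate its size."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    key = bytes.fromhex(raw)          # raises ValueError on malformed hex
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys must be 16, 24, or 32 bytes")
    return key


def fresh_nonce():
    """Return a new random nonce; call once per message, never reuse."""
    return os.urandom(NONCE_LEN)
```

Failing loudly when the key is missing or malformed is deliberate: a silent fallback to a default key would recreate the hardcoded-secret finding.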

The Two-Time Pad Rule

Any stream cipher (including AES-CTR and AES-GCM) used to encrypt two messages with the same key and nonce becomes a two-time pad. The security collapses completely — not gradually, but immediately. Auditors must treat any hardcoded or static nonce as a critical finding regardless of the surrounding code's quality.

Lesson 2 Quiz

Hardcoded Keys, IVs, and Secrets · 4 questions

In AES-CTR mode, if the same key and nonce pair is reused for two different messages C1 and C2, what can an attacker compute?

Correct. When key+nonce are reused, both messages are XORed with identical keystreams. An attacker computes C1 XOR C2 = P1 XOR P2 (keystreams cancel). The XOR of two English or structured plaintexts is recoverable through frequency analysis — this is the "two-time pad" attack, identical to breaking a Vernam cipher used twice.

Not correct. In CTR mode nonce reuse, C1 XOR C2 = (P1 XOR KS) XOR (P2 XOR KS) = P1 XOR P2, because the keystream KS cancels out. This gives an attacker the XOR of the two plaintexts, which is practically recoverable — this is the two-time pad attack.

The PlayStation 3 ECDSA private key recovery exploit (demonstrated at CCC 2010) succeeded because of which implementation flaw?

Correct. Sony's PS3 ECDSA implementation used a constant nonce k for every signature. When k repeats, two signatures (r, s₁, z₁) and (r, s₂, z₂) sharing the same r value allow solving for the private key algebraically. Fail0verflow published the attack at 27C3 (CCC 2010), enabling the homebrew community to sign arbitrary code for the PS3.

Not correct. The PS3 ECDSA break came from nonce (k) reuse. With two signatures sharing the same k (and therefore the same r), the private key d can be solved algebraically from the signature equations. Fail0verflow demonstrated this at CCC 2010. RFC 6979 was later published to define deterministic nonce generation that prevents this class of attack.

According to GitGuardian's State of Secrets Sprawl reporting, approximately how many secrets were exposed in public GitHub commits during 2022?

Correct. GitGuardian counted roughly 10 million secrets exposed in public GitHub commits during 2022, a 67% year-over-year increase. That growth coincided with expanded use of AI code completion tools, which tend to generate working code with literal credential values as placeholder examples.

Not correct. GitGuardian counted roughly 10 million secrets exposed in public GitHub commits during 2022, a 67% increase year-over-year. That growth coincided with rising AI code completion adoption, and such tools tend to generate literal credential placeholders in working code.

In AES-GCM, nonce reuse has an additional consequence beyond plaintext recovery. What is it?

Correct. In GCM, the GHASH authentication key H = E(key, 0) is derived once and used for all authentication. When nonces are reused, an attacker can recover H from the authentication tags (the "Forbidden Attack," formalized by Joux 2006). With H recovered, the attacker can forge valid authentication tags for arbitrary ciphertexts — breaking integrity completely in addition to confidentiality.

Not correct. GCM nonce reuse enables the Forbidden Attack: the GHASH key H = E(key, 0) can be recovered algebraically from two ciphertext/tag pairs sharing a nonce. With H in hand, an attacker can compute valid authentication tags for any ciphertext, completely defeating integrity — not just revealing plaintext.

Lab 2: Hunting Hardcoded Secrets

AI Security Auditor · Secrets Detection Simulation

Lab Scenario

You are auditing a fintech startup's codebase that was primarily written using AI code assistants. The engineering team suspects hardcoded secrets and improper IV/nonce handling may exist but has not conducted a formal review. Your task is to interrogate the AI assistant about specific files and patterns, identify all hardcoded secret findings, and recommend remediation.

Complete at least 3 meaningful exchanges to finish this lab.

Try asking: "Show me the encryption module from the payment service" or "What are the three most common ways AI tools introduce hardcoded secrets?"


Module 7 · Lesson 3

TLS Misconfiguration & Certificate Validation Failures

AI-generated TLS code that disables verification, accepts expired certificates, and enables downgrade attacks.

When an AI generates code that disables certificate verification "to fix an SSL error," what attack surface has it just opened?

In February 2014, Apple patched a bug in SecureTransport, the notorious "goto fail" vulnerability (CVE-2014-1266), that had shipped in iOS and OS X. A duplicated goto fail; statement caused the handshake code to skip the final signature check on the server's key exchange and return success anyway. As a result, a TLS connection could appear valid even though the server had never proven possession of the certificate's private key. The bug was present for months in shipping iOS and OS X builds. While not AI-generated, the pattern (a single line change defeating all TLS trust) is precisely what AI code generators reproduce when they suggest verify=False or similar certificate-disabling patterns to resolve SSL errors.

The Most Dangerous Line in AI-Generated Python

When developers encounter SSL certificate errors during development, a common Stack Overflow resolution — heavily represented in training data — is to pass verify=False to the requests library. AI code generators reproduce this suggestion frequently because it appears as a working solution to a common error message. The developer copies it, the SSL error disappears, and the code ships to production with TLS verification permanently disabled.
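Because this pattern is so regular, an auditor can search for it structurally rather than by eye. The sketch below uses Python's standard `ast` module to list every call that passes `verify=False` as a keyword argument; the function name `find_disabled_verification` is hypothetical, and the check is intentionally narrow.

```python
import ast


def find_disabled_verification(source):
    """Return sorted line numbers of calls that pass the keyword verify=False."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "verify"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is False
            ):
                hits.append(node.lineno)
    return sorted(hits)
```

This does not catch every variant: an attribute assignment such as `session.verify = False`, or a value passed through a variable, needs additional rules.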

# CRITICAL: AI-generated "fix" for SSL errors that eliminates all TLS security
import requests
response = requests.get('https://api.example.com/data', verify=False)
# urllib3 emits an InsecureRequestWarning, but the request succeeds with ANY certificate
# An attacker performing MITM presents their own cert and validation is skipped

// Also dangerous: disabling hostname verification in Java
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setHostnameVerifier((hostname, session) -> true);  // NEVER do this

# Correct approach (Python): fix the actual certificate issue
import requests
response = requests.get('https://api.example.com/data',
    verify='/path/to/ca-bundle.crt')  # or fix the cert chain

Real Incident — Equifax 2017 (Contributing Factor)

The Equifax breach (disclosed September 2017, roughly 147 million records) involved multiple failures. One contributing factor identified in Congressional testimony and the post-incident GAO report was a traffic inspection device whose own certificate had expired: the device silently stopped inspecting encrypted traffic. The breach persisted for 76 days partly because encrypted internal traffic was not being inspected. TLS configuration errors, including disabled verification and expired certificates, create blind spots that allow exfiltration to go undetected.

Downgrade Attacks and Protocol Version Pinning

AI-generated server configuration code commonly omits explicit minimum TLS version enforcement, defaulting to library defaults that often include TLS 1.0 and 1.1 for compatibility. This enables protocol downgrade attacks.

POODLE (CVE-2014-3566): Bodo Möller, Thai Duong, and Krzysztof Kotowicz at Google demonstrated in 2014 that an attacker controlling the network could force a TLS session to downgrade to SSL 3.0, then exploit a padding oracle in CBC mode to decrypt cookies one byte at a time. The attack requires roughly 256 requests per byte — practical for session cookie recovery.

BEAST (CVE-2011-3389): Demonstrated by Duong and Rizzo at Ekoparty 2011, BEAST exploited predictable IVs in TLS 1.0's CBC implementation to perform a chosen-plaintext attack against browser cookies, recovering CSRF tokens and session identifiers.
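One way to turn these concerns into a repeatable check is to inspect an `ssl.SSLContext` object before it is used. The sketch below is written for client-side contexts (server contexts legitimately skip peer verification by default); the function name `audit_context` and the wording of the findings are illustrative assumptions.

```python
import ssl


def audit_context(ctx):
    """Return a list of findings for a client-side ssl.SSLContext."""
    findings = []
    if ctx.verify_mode != ssl.CERT_REQUIRED:
        findings.append("certificate verification is not required")
    if not ctx.check_hostname:
        findings.append("hostname checking is disabled")
    # TLSVersion is an IntEnum, so MINIMUM_SUPPORTED and old versions sort lower.
    if ctx.minimum_version < ssl.TLSVersion.TLSv1_2:
        findings.append("minimum TLS version is below 1.2")
    return findings


# A context that passes: library defaults plus an explicit version floor.
hardened = ssl.create_default_context()
hardened.minimum_version = ssl.TLSVersion.TLSv1_2

# A context mirroring the verify=False pattern.
weakened = ssl.create_default_context()
weakened.check_hostname = False       # must be cleared before verify_mode
weakened.verify_mode = ssl.CERT_NONE
```

Calling `audit_context(weakened)` reports both verification findings, while `audit_context(hardened)` returns an empty list.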

# DANGEROUS: AI-generated Flask/SSL configuration without version pinning
app.run(ssl_context='adhoc')               # throwaway self-signed cert; no explicit minimum TLS version

# DANGEROUS: Python ssl module, legacy protocol constant
import ssl
context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)  # deprecated alias; no minimum version set

# CORRECT: Explicit TLS 1.2 minimum, strong cipher suites
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.minimum_version = ssl.TLSVersion.TLSv1_2
context.set_ciphers('ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!MD5:!DSS')

Certificate Pinning — When AI Gets It Wrong

Mobile AI-generated code often includes certificate pinning implementations with subtle flaws: pinning only the leaf certificate (breaking on certificate renewal), implementing pinning but bypassing it on exceptions, or pinning to a self-signed certificate with no rotation plan. The 2015 Superfish adware incident installed a self-signed root certificate on Lenovo laptops, enabling MITM against all HTTPS traffic — functionally identical to what happens when applications bypass standard certificate validation.
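The leaf-only pinning problem is easier to see in code. The following sketch compares a SHA-256 fingerprint of the presented key against a set of pins that includes a backup, so a planned key rotation does not lock clients out. The byte strings are placeholders standing in for DER-encoded public keys obtained during the TLS handshake, and the helper names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(der_bytes):
    """Hex SHA-256 fingerprint of a DER-encoded public key or certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def matches_pin(der_bytes, pinned_fingerprints):
    """True only if the presented key matches one of the pinned fingerprints."""
    presented = fingerprint(der_bytes)
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)


# Placeholder key material (assumption: real code extracts this from the peer).
current_key = b"placeholder-current-public-key"
backup_key = b"placeholder-backup-public-key"
PINS = {fingerprint(current_key), fingerprint(backup_key)}
```

The caller must treat a `False` result, or any exception raised while checking, as a reason to close the connection; falling through to an unpinned connection is the bypass flaw described above.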

AI-Generated Pattern	Attack Enabled	Severity
verify=False / setHostnameVerifier(→true)	Full MITM — any certificate accepted	Critical
No minimum TLS version	POODLE (via downgrade to SSL 3.0), BEAST (TLS 1.0)	High
Weak cipher suites allowed (RC4, DES/3DES in TLS)	SWEET32, RC4 keystream biases	High
Certificate pinning on leaf cert only	Pin bypass on cert renewal; service disruption	Medium
Self-signed cert accepted in production	MITM; no chain of trust	Critical

MITM (Man-in-the-Middle): An attack where an adversary intercepts and potentially modifies communications between two parties who believe they are communicating directly. Disabled certificate verification makes MITM trivially possible.

Certificate pinning: A mechanism where an application hardcodes the expected certificate or public key, refusing connections if the presented certificate does not match, even if it is signed by a trusted CA.

Padding oracle: A side channel that allows an attacker to determine whether decrypted padding is valid, enabling byte-by-byte plaintext recovery. Exploited by POODLE against SSL 3.0 CBC mode.

Lesson 3 Quiz

TLS Misconfiguration & Certificate Validation · 4 questions

The POODLE attack (CVE-2014-3566) forced sessions to downgrade to SSL 3.0. What property of SSL 3.0 CBC mode did it then exploit?

Correct. POODLE exploited a padding oracle in SSL 3.0's CBC mode. SSL 3.0's padding scheme is not fully specified, allowing the decryptor to accept various padding patterns. An attacker controlling the network could manipulate padding bytes, use the receiver's success/failure response as an oracle, and recover plaintext one byte at a time — requiring approximately 256 requests per byte.

Not correct. POODLE exploited a padding oracle in SSL 3.0 CBC mode. SSL 3.0's underspecified padding allowed attackers to manipulate individual padding bytes and use the server's accept/reject response as an oracle, recovering plaintext byte-by-byte. The key prerequisite was forcing a downgrade from TLS to SSL 3.0 — which is why disabling SSL 3.0 entirely is the fix.

Apple's "goto fail" bug (CVE-2014-1266) was caused by which specific code pattern?

Correct. The SecureTransport source contained two consecutive "goto fail;" lines. The second, unconditional goto executed regardless of the verification result, skipping all subsequent signature checks and returning errSecSuccess for any certificate. The bug existed in iOS 6, iOS 7, and OS X 10.9 before being patched in February 2014.

Not correct. "goto fail" was caused by a duplicated line: two consecutive "goto fail;" statements. The second was unconditional — it executed regardless of previous check results, causing the function to return success (errSecSuccess) without completing signature verification. Any certificate was accepted as valid during the affected period.

When auditing AI-generated Python code using the requests library, which parameter indicates TLS certificate verification has been disabled?

Correct. verify=False disables all TLS certificate verification in Python's requests library — the server's certificate is accepted regardless of whether it is signed by a trusted CA, whether the hostname matches, or whether it has expired. This is a critical finding in any production code. The correct fix is to resolve the underlying certificate issue, not to disable verification.

Not correct. The parameter verify=False is the critical finding — it disables all TLS certificate verification in the requests library, making the connection vulnerable to MITM attacks. Any certificate from any attacker is accepted. This is one of the most common security-defeating "fixes" suggested by AI code assistants.

The 2015 Superfish adware pre-installed on Lenovo laptops demonstrated what class of TLS attack?

Correct. Superfish installed a self-signed root certificate authority into Windows' trust store and ran a local MITM proxy. Because the rogue CA was trusted system-wide, Superfish could intercept any HTTPS connection and replace the server's certificate with one signed by its own CA — which browsers accepted as valid. The private key for this CA was the same across all Lenovo machines and was extracted within hours of public disclosure in February 2015.

Not correct. Superfish installed a rogue root certificate authority into the Windows trust store, enabling a local MITM proxy to intercept and decrypt all HTTPS traffic on affected Lenovo machines. Because the rogue CA was trusted, browsers accepted its certificates as valid. The CA's private key was the same on all machines and was extracted from the software within hours of the February 2015 disclosure.

Lab 3: TLS Configuration Review

AI Security Auditor · TLS & Certificate Validation Simulation

Lab Scenario

A development team used an AI assistant to configure TLS for their new API gateway, a mobile application's network layer, and an internal microservice mesh. You suspect the AI introduced certificate verification bypasses, weak protocol versions, and misconfigured cipher suites. Your job is to audit the configurations, identify each vulnerability, and prescribe specific remediation steps.

Complete at least 3 meaningful exchanges to finish this lab.

Start with: "Show me the API gateway's TLS configuration" or "What are the most dangerous TLS patterns AI assistants introduce?"


Module 7 · Lesson 4

Random Number Generation & Key Derivation Failures

When AI uses Math.random() for cryptography and derives keys from passwords without proper salting.

How does a predictable random number generator in a cryptographic context turn a theoretical vulnerability into a practical key recovery?

In 2012, researchers Lenstra, Hughes, Augier, Bos, Kleinjung, and Wachter conducted a large-scale analysis of RSA public keys collected from the internet. They found that 0.2% of RSA keys (over 27,000 keys) shared a prime factor with at least one other key. This meant those RSA keys could be factored using a simple GCD computation in milliseconds, recovering the private key entirely. The root cause, examined in detail by Heninger et al. the same year, was insufficient entropy during key generation on embedded devices: routers, firewalls, and VPN concentrators that generated keys during early boot before sufficient entropy had accumulated. Weak random number generation collapsed the security of approximately 5,000 distinct hosts to effectively zero.
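The shared-prime attack needs nothing more than a greatest common divisor. In the sketch below, three Mersenne primes stand in for RSA primes; they are far too small and too structured for real keys and are used only because their primality is well known. The function name `factor_with_shared_prime` is hypothetical.

```python
import math

# Toy primes (assumption: stand-ins for the large random primes of real RSA).
shared_prime = 2**61 - 1
n1 = shared_prime * (2**31 - 1)   # modulus of device A
n2 = shared_prime * (2**89 - 1)   # modulus of device B, same first prime


def factor_with_shared_prime(a, b):
    """Return (shared, a_cofactor, b_cofactor) if a and b share a prime, else None."""
    g = math.gcd(a, b)
    if g == 1 or g == a or g == b:
        return None
    return g, a // g, b // g
```

No factoring algorithm is involved: `math.gcd` runs in time polynomial in the bit length, which is why a single repeated prime breaks both keys at once.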

Non-Cryptographic Randomness in Crypto Contexts

AI code generators conflate general-purpose random number functions with cryptographically secure random number generators (CSPRNGs) because both categories appear in the same training contexts. The critical difference:

General-purpose PRNGs (Math.random() in JavaScript, random.random() in Python, java.util.Random) use algorithms like linear congruential generators or Mersenne Twister. These are fast, statistically uniform, and predictable once the seed or enough consecutive output is observed.

CSPRNGs (window.crypto.getRandomValues() in JS, os.urandom() or secrets module in Python, java.security.SecureRandom) draw entropy from the operating system and are computationally infeasible to predict even with knowledge of previous outputs.
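The difference is easy to observe in Python. In this sketch, two `random.Random` instances seeded with the same value (the seed 2024 is arbitrary) produce the same "secret" token, which is what an attacker who learns or guesses the seed can exploit; the `secrets` module has no seed to learn. The helper name `weak_token` is hypothetical.

```python
import random
import secrets


def weak_token(rng):
    """Build a 32-character hex token from a general-purpose PRNG (do not use)."""
    return "".join(rng.choices("0123456789abcdef", k=32))


# The victim's generator and the attacker's copy share a seed.
victim_rng = random.Random(2024)
attacker_rng = random.Random(2024)
actual = weak_token(victim_rng)
predicted = weak_token(attacker_rng)   # identical to the victim's token

# CSPRNG alternative: 16 random bytes rendered as 32 hex characters.
strong = secrets.token_hex(16)
```

Determinism is a feature of `random` for simulations and tests, which is exactly why it is the wrong tool for tokens, keys, and nonces.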

// CRITICAL: AI uses Math.random() for token generation (JavaScript)
function generateToken() {
  let token = '';
  for(let i=0; i<32; i++) {
    token += Math.floor(Math.random() * 16).toString(16);
  }
  return token;  // Predictable — Math.random() is NOT a CSPRNG
}

// CORRECT: Web Crypto API — cryptographically secure
function generateToken() {
  const bytes = new Uint8Array(32);
  window.crypto.getRandomValues(bytes);
  return Array.from(bytes, b => b.toString(16).padStart(2,'0')).join('');
}

# Python: AI uses random module for session IDs
import random
session_id = ''.join(random.choices('abcdef0123456789', k=32))  # WRONG

# CORRECT: secrets module
import secrets
session_id = secrets.token_hex(32)
      

Real Incident — Debian OpenSSL 2006–2008

In 2006, a Debian maintainer removed two lines from the OpenSSL PRNG seeding code during a valgrind audit, believing them to be uninitialized memory accesses. Those two lines were the primary source of entropy. For two years, Debian and Ubuntu systems generated all SSL/SSH keys using a PRNG seeded only with the process ID — a 15-bit value with at most 32,768 possible seeds. When the bug was discovered in May 2008 (CVE-2008-0166), all keys generated on affected systems were compromised. Pre-generated tables of all possible keys were published within hours. Millions of deployed SSH and SSL keys required emergency replacement.
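A seed space of 32,768 values can be searched exhaustively in well under a second. The sketch below illustrates the principle with Python's `random` module standing in for the crippled generator; it does not reproduce OpenSSL's actual PRNG or key formats, and the function names and the example process ID are made up.

```python
import random

PID_SPACE = 2**15   # 32,768 possible process IDs, as in the incident


def keygen_from_pid(pid):
    """Toy key generator whose only entropy is the process ID."""
    rng = random.Random(pid)
    return "%032x" % rng.getrandbits(128)


def recover_pid(observed_key):
    """Enumerate every possible seed until the observed key is reproduced."""
    for pid in range(PID_SPACE):
        if keygen_from_pid(pid) == observed_key:
            return pid
    return None


leaked_public_value = keygen_from_pid(31337)
```

The output looks like a 128-bit secret, but its real strength is 15 bits: `recover_pid(leaked_public_value)` finds the seed, and therefore every value the generator will ever produce.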

Key Derivation Function Failures

AI-generated code frequently makes critical errors in key derivation — the process of converting a password or passphrase into a cryptographic key. The two most common patterns:

Direct hashing without KDF: Using SHA-256(password) as a key skips the work factor and salt, enabling offline dictionary attacks at GPU speed (billions of attempts per second). This is distinct from but related to the password storage problem — here the attacker recovers the encryption key rather than verifying a password.

PBKDF2 with insufficient iterations: PBKDF2 is a valid KDF, but AI generators frequently copy the iteration count from old documentation examples, often 1,000, rather than the 600,000 iterations that the OWASP Password Storage Cheat Sheet (2023) recommends for PBKDF2-HMAC-SHA256. The 1,000 figure traces back to the minimum stated in NIST SP 800-132 (2010), which predates large-scale GPU cracking. At 1,000 iterations, PBKDF2 provides minimal protection over direct hashing.

import hashlib
import os

# password: the user's passphrase (str), supplied by the caller

# DANGEROUS: Direct SHA-256 as encryption key (AI pattern)
key = hashlib.sha256(password.encode()).digest()   # no salt, no work factor

# DANGEROUS: PBKDF2 with 1000 iterations (AI uses doc example count)
salt = os.urandom(16)   # random per-user salt, stored alongside the ciphertext
key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 1000)

# CORRECT: Argon2id (RFC 9106; OWASP's first choice) or PBKDF2 at adequate iterations
from argon2.low_level import hash_secret_raw, Type   # third-party package: argon2-cffi
key = hash_secret_raw(
    password.encode(), salt,
    time_cost=3, memory_cost=65536,     # memory_cost is in KiB: 64 MiB RAM, 3 passes
    parallelism=4, hash_len=32,
    type=Type.ID
)
# Or PBKDF2 at the OWASP Password Storage Cheat Sheet (2023) recommended count:
key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600000)
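A KDF is only useful if the same key can be re-derived later, which means the salt and iteration count have to be stored next to the ciphertext. The following sketch uses only the standard library; the record layout and function names are assumptions made for illustration, not a prescribed format.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # PBKDF2-HMAC-SHA256 count discussed above

def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

def new_key_record(password: str):
    """Create a key plus the public parameters needed to re-derive it."""
    salt = os.urandom(16)  # unique per user; not secret
    params = {"kdf": "pbkdf2-hmac-sha256", "salt": salt.hex(), "iterations": ITERATIONS}
    return derive_key(password, salt), params

def rederive(password: str, params: dict) -> bytes:
    """Re-derive the key from the stored parameters."""
    return derive_key(password, bytes.fromhex(params["salt"]), params["iterations"])

key, params = new_key_record("correct horse battery staple")
print(hmac.compare_digest(key, rederive("correct horse battery staple", params)))  # True
```

Storing the iteration count in the record, rather than hardcoding it at the call site, lets the count be raised later without breaking existing data.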
      

Real Incident — Lavabit 2013 / Encrypted Email Key Derivation

When Lavabit (Edward Snowden's email provider) was ordered to produce SSL private keys by the US government in 2013, founder Ladar Levison complied by providing the 2,560-character key printed in a 4-point font on paper — arguably compliant but practically unusable. The incident highlighted the importance of per-user key derivation: if each user's data is encrypted with a key derived from their password using a proper KDF, even disclosure of server-side keys does not compromise content if passwords remain unknown. Systems using server-side keys derived without per-user salts collapse this model, making the server the single point of trust and failure.

Entropy Starvation in Containerized Environments

A modern variant of the Debian PRNG bug: AI-generated Docker containers and cloud functions that generate cryptographic keys at startup may encounter entropy starvation, where the kernel entropy pool they draw from has not yet been initialized. Containers share the host kernel's pool, so the risk is highest on freshly booted VMs and minimal instances. Reading /dev/urandom directly never blocks and can return weak output before the pool is initialized; on modern Linux the getrandom() system call is the safer interface because it blocks until initialization is complete. Beyond that, the environment should have access to a hardware RNG source (virtio-rng on VM hosts, or the host's /dev/hwrng via device mapping). AI-generated Kubernetes configurations rarely include these provisions.
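A startup check along these lines can make the failure loud instead of silent. This is a sketch that assumes Linux semantics: os.getrandom() with os.GRND_NONBLOCK raises BlockingIOError when the kernel pool is not yet initialized. On platforms without os.getrandom the sketch simply assumes the CSPRNG is ready, which is itself an assumption to revisit. The function names are hypothetical.

```python
import os

def csprng_ready() -> bool:
    """Return True if the kernel CSPRNG reports it is initialized."""
    if not hasattr(os, "getrandom"):
        # Non-Linux platform: this sketch cannot probe, so it assumes readiness.
        return True
    try:
        os.getrandom(1, os.GRND_NONBLOCK)  # fails fast instead of blocking
        return True
    except BlockingIOError:
        return False

def generate_key(num_bytes: int = 32) -> bytes:
    """Generate key material, refusing to run on an uninitialized pool."""
    if not csprng_ready():
        raise RuntimeError("kernel entropy pool not initialized; refusing to generate key")
    return os.urandom(num_bytes)

print(len(generate_key()))  # 32
```

Failing at startup lets an orchestrator restart or delay the workload, which is preferable to silently issuing keys from a weak state.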

Pattern	Vulnerability	Severity
Math.random() / random.random() for tokens	Predictable tokens; session hijacking if seed observable	Critical
sha256(password) as encryption key	No work factor; offline brute-force at GPU speed	Critical
PBKDF2 with ≤10,000 iterations	Inadequate work factor; vulnerable to GPU attacks	High
No salt in password hashing	Rainbow table attacks; identical passwords share hashes	Critical
Key generation at container startup without entropy check	Entropy starvation; predictable keys (Debian pattern)	High
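Several rows of this table can be caught mechanically. The sketch below is a deliberately small, regex-based scanner for three of the patterns. The rule set, the severity labels, and the 10,000-iteration cutoff mirror the table, but the regular expressions themselves are assumptions for illustration and will miss many real-world variants.

```python
import re

MAX_WEAK_ITERATIONS = 10_000  # cutoff taken from the table row above

RULES = [
    ("Critical", "non-CSPRNG used for a token",
     re.compile(r"Math\.random\(|\brandom\.(random|choices|choice|randint)\(")),
    ("Critical", "raw hash of password used as key",
     re.compile(r"sha256\(\s*password")),
]
# Captures the final integer argument of a pbkdf2_hmac(...) call on one line.
PBKDF2_CALL = re.compile(r"pbkdf2_hmac\(.*,\s*(\d[\d_]*)\s*\)")

def scan(source: str) -> list:
    """Return (line_number, severity, message) tuples for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for severity, message, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, severity, message))
        match = PBKDF2_CALL.search(line)
        if match and int(match.group(1).replace("_", "")) <= MAX_WEAK_ITERATIONS:
            findings.append((lineno, "High", "PBKDF2 iteration count too low"))
    return findings

sample = (
    "sid = ''.join(random.choices('abcdef0123456789', k=32))\n"
    "key = hashlib.sha256(password.encode()).digest()\n"
    "key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 1000)\n"
    "key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600000)\n"
)
for finding in scan(sample):
    print(finding)
```

A scanner like this is a triage aid, not a substitute for review: it cannot tell whether a random.choices() call feeds a security token or a test fixture.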

CSPRNG: Cryptographically Secure Pseudo-Random Number Generator. An RNG whose output is computationally indistinguishable from true randomness; in particular, knowing previous outputs does not allow predicting future outputs.

KDF (Key Derivation Function): A function that derives one or more cryptographic keys from a master key, password, or secret. Proper KDFs include a salt (to prevent rainbow tables) and a work factor (to slow brute-force attacks).

Argon2id: The recommended variant of Argon2, the winner of the Password Hashing Competition (2015). Argon2id combines memory-hardness (resisting GPU/ASIC attacks) with resistance to side-channel attacks. It is specified in RFC 9106 (2021) and is the first-choice algorithm in the OWASP Password Storage Cheat Sheet.

Entropy starvation: A condition where a system's random number pool contains insufficient unpredictable data at the time cryptographic material is generated, resulting in predictable or weakly random keys.

Lesson 4 Quiz

Random Number Generation & Key Derivation · 4 questions

The 2008 Debian OpenSSL vulnerability (CVE-2008-0166) reduced RSA key generation to how many possible seeds?

Correct. With the entropy seeding lines removed, the OpenSSL PRNG was seeded only from the process ID — a value of at most 32,767 on Linux (15 bits, PID_MAX_DEFAULT). All possible SSL and SSH keys generated on Debian/Ubuntu systems during the two-year vulnerable period could be pre-computed and stored in a lookup table. The table was published within hours of the May 2008 disclosure.

Not correct. The Debian OpenSSL bug left only the process ID (PID) as the PRNG seed — at most 32,767 distinct values on Linux (15-bit PID space). This meant all possible keys could be pre-computed. Within hours of public disclosure in May 2008, complete lookup tables were published and all affected keys considered compromised.

The 2012 Lenstra et al. study found that ~0.2% of internet RSA keys shared a prime factor with another key. What was the root cause?

Correct. Embedded devices (routers, firewalls, VPN appliances) generated RSA keys during early boot — often the first boot after flashing — before the kernel entropy pool had accumulated sufficient randomness. Multiple devices with insufficient entropy could independently generate keys sharing a prime factor. A simple GCD computation against all collected public keys revealed these shared primes, allowing private key recovery in milliseconds.

Not correct. The cause was entropy starvation, analyzed in depth by Heninger et al. in "Mining Your Ps and Qs" (2012): embedded devices generated RSA keys during early boot before the OS entropy pool was seeded with sufficient randomness. Devices with identical or near-identical entropy states could generate keys sharing a prime. Finding shared primes via batch GCD allowed trivial private key recovery.
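The shared-prime check is simple enough to show in a few lines. This sketch uses toy moduli built from small primes and a pairwise GCD loop; the published studies used a much faster batch-GCD algorithm based on product trees, which is not reproduced here. The function name is an assumption for illustration.

```python
from itertools import combinations
from math import gcd

def find_shared_primes(moduli):
    """Return {(i, j): p} for each pair of moduli sharing a nontrivial factor p."""
    hits = {}
    for (i, n1), (j, n2) in combinations(enumerate(moduli), 2):
        p = gcd(n1, n2)
        if 1 < p < min(n1, n2):
            hits[(i, j)] = p
    return hits

# Toy "RSA moduli": the first two were generated with the same prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
shared = find_shared_primes(moduli)
print(shared)  # {(0, 1): 101}

# Knowing one prime factor immediately yields the other.
p = shared[(0, 1)]
print(moduli[0] // p, moduli[1] // p)  # 103 107
```

Neither key was weak in isolation; the break comes entirely from comparing public keys against each other, which is why it went unnoticed until someone collected keys at internet scale.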

According to the OWASP Password Storage Cheat Sheet (2023), what is the recommended minimum iteration count for PBKDF2-HMAC-SHA256 used for key derivation from passwords?

Correct. The OWASP Password Storage Cheat Sheet recommends 600,000 iterations for PBKDF2-HMAC-SHA256 as of 2023. AI code generators frequently use 1,000 iterations (the minimum from NIST SP 800-132, published in 2010, and common in old documentation examples) or 10,000 iterations. Both are far below this threshold and provide negligible protection against GPU-accelerated dictionary attacks. Any PBKDF2 usage under 100,000 iterations should be flagged as a finding.

Not correct. The OWASP Password Storage Cheat Sheet (2023) recommends 600,000 iterations for PBKDF2-HMAC-SHA256. AI generators commonly use 1,000 iterations from old documentation examples, which is only 0.17% of the recommended work. This is a consistently present finding in AI-generated key derivation code.

What is the correct JavaScript API for generating cryptographically secure random bytes in a browser environment?

Correct. window.crypto.getRandomValues() is the Web Crypto API's CSPRNG, drawing entropy from the browser's interface to the OS entropy source. It is available in all modern browsers and in Node.js via the globalThis.crypto object. Math.random() is a non-cryptographic PRNG (xorshift128+ in current major engines) whose internal state can be reconstructed from a handful of observed outputs, so it must never be used for security tokens, session IDs, nonces, or key material.

Not correct. The correct browser CSPRNG API is window.crypto.getRandomValues(new Uint8Array(n)). Math.random() is a non-cryptographic generator: given a few consecutive outputs, its state, and therefore all past and future outputs, is recoverable. Hashing Math.random() output does not add cryptographic security, because a hash cannot add unpredictability that the underlying generator lacks.

Lab 4: RNG & Key Derivation Audit

AI Security Auditor · Randomness & KDF Review Simulation

Lab Scenario

A cloud-native SaaS application was scaffolded almost entirely by an AI code assistant. You need to audit its random number generation across all layers (frontend JavaScript, backend Python, infrastructure Terraform) and its key derivation logic for user data encryption. Identify every CSPRNG failure, every KDF misconfiguration, and every missing salt — then prescribe specific fixes.

Complete at least 3 meaningful exchanges to finish this lab.

Start with: "Show me the session token generation code" or "Audit the encryption key derivation from user passwords."

RNG & KDF Audit Assistant

Lab 4

Welcome to the randomness and key derivation audit lab. I can show you the AI-generated code for session token generation, encryption key derivation, password hashing, and infrastructure secrets — across the JavaScript frontend, Python backend, and Terraform configurations. What would you like to audit first?

Module 7 Test

Cryptography Mistakes · 15 questions · 80% to pass

1. Which collision attack against MD5 was published by Wang et al. in 2004, establishing MD5 as cryptographically broken for security-critical uses?

Correct. Wang et al.'s 2004 paper demonstrated practical collision finding using differential cryptanalysis: two different inputs producing the same MD5 hash, found in about an hour on an IBM P690 system. Follow-up work soon reduced this to minutes or less on commodity hardware. This established MD5 as broken for any collision-resistance-dependent application.

Not correct. Wang et al. (2004) demonstrated differential cryptanalysis producing MD5 collisions in about an hour on an IBM P690 system, and later refinements made collisions cheap on a standard PC. This was the critical break establishing MD5 as unusable for any purpose requiring collision resistance, including digital signatures and certificate integrity.

2. The EFF's Deep Crack machine demonstrated the practical vulnerability of DES in 1998. What was its key-recovery time?

Correct. The EFF's Deep Crack, built for approximately $250,000, recovered a 56-bit DES key in 56 hours in the 1998 DES Challenge II-2. This definitively proved DES's 56-bit key was computationally insufficient for any security application. NIST announced the retirement of 3DES in 2017 and formally deprecated it in NIST SP 800-131A Rev. 2 (2019).

Not correct. Deep Crack required 56 hours to recover a DES key using purpose-built ASIC hardware costing ~$250,000. This definitively ended DES's viability. Modern FPGA-based cracking rigs can exhaust the DES keyspace in roughly a day, and NIST formally deprecated even 3DES in SP 800-131A Rev. 2 (2019).

3. An AI code assistant generates cipher = AES.new(key, AES.MODE_ECB). The primary security failure is that ECB mode:

Correct. ECB is deterministic and stateless — each 16-byte block is encrypted independently. Identical plaintext blocks always produce identical ciphertext blocks. The structural information in the data survives into ciphertext, which the "ECB penguin" demonstration shows visually: a bitmap encrypted with ECB still shows the recognizable image outline.

Not correct. ECB's fundamental flaw is determinism: identical plaintext blocks → identical ciphertext blocks. This reveals data structure (the "ECB penguin" problem). ECB provides no semantic security — an attacker can identify repeated blocks and infer plaintext patterns.

4. In AES-GCM, repeating a nonce with the same key allows an attacker to execute the Forbidden Attack. What does this recover?

Correct. GCM's authentication uses GHASH with key H = E(key, 0). When nonces repeat, two authentication equations over the same H can be solved for H. With H recovered, forging valid authentication tags for arbitrary ciphertexts becomes trivial — breaking both confidentiality (via keystream recovery) and integrity (via tag forgery) simultaneously.

Not correct. The Forbidden Attack recovers the GHASH key H = E(key, 0) from two messages sharing a nonce. With H, an attacker can forge valid authentication tags for any ciphertext, completely breaking GCM's integrity guarantee — not just recovering plaintext.

5. The 2016 Uber breach resulted in a $148 million settlement. What was the root cryptographic/security cause?

Correct. Uber's 2016 breach occurred because AWS credentials were stored in plaintext in a private GitHub repository. Attackers who gained access to that repository used the credentials to authenticate to AWS and exfiltrate data on 57 million users. The company paid $148 million in a 2018 settlement and its CSO was later convicted of obstruction of justice for attempting to conceal the breach.

Not correct. The Uber 2016 breach: AWS credentials were committed to a GitHub repository in plaintext, found by attackers who gained access to the repository, and used to authenticate to AWS and download 57 million user records. The $148M settlement came in 2018. This is the canonical hardcoded credentials incident that AI code generation patterns reproduce.

6. RFC 7465 (2015) prohibited RC4 in TLS because of which fundamental property?

Correct. RC4's keystream has known statistical biases — particularly in the first bytes (Fluhrer, Mantin, Shamir, 2001) and cumulative biases across long keystreams. Given sufficient TLS sessions encrypted with the same key material (as in HTTPS cookies), these biases allow byte-by-byte recovery of repeated plaintext, demonstrated in the RC4NOMORE attack (Vanhoef & Piessens, 2015) requiring approximately 75 hours of network observation.

Not correct. RFC 7465 prohibited RC4 because of statistical biases in its keystream. The RC4NOMORE attack (Vanhoef & Piessens, 2015) showed these biases allow byte-by-byte plaintext recovery given sufficient HTTPS sessions — approximately 75 hours of traffic capture for cookie recovery.

7. An auditor finds requests.get(url, verify=False) in production Python code. Which attack does this directly enable?

Correct. verify=False disables all TLS certificate validation in the Python requests library. An attacker performing a MITM attack can present any certificate — expired, self-signed, for the wrong hostname, or from a rogue CA — and the client will accept it and continue the connection, delivering all traffic to the attacker.

Not correct. verify=False in requests disables TLS certificate validation entirely. A MITM attacker can present any certificate and the connection proceeds normally, delivering all plaintext to the attacker. This is one of the most impactful single-parameter security failures possible in Python network code.
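For comparison, the safe configuration needs no special effort. This standard-library sketch shows that ssl.create_default_context() already enables certificate and hostname verification; the helper name and the TLS 1.2 floor are assumptions added for illustration. With the requests library, the equivalent is simply leaving verify at its default of True, or pointing it at a CA bundle path.

```python
import ssl

def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context with verification left enabled (hypothetical helper)."""
    ctx = ssl.create_default_context()            # loads the system trust store
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx

ctx = strict_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: certificate chain is validated
print(ctx.check_hostname)                    # True: hostname must match the certificate
```

In an audit, any code that sets verify_mode to ssl.CERT_NONE or check_hostname to False on such a context deserves the same severity as verify=False.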

8. The BEAST attack (2011) against TLS 1.0 exploited which specific property of TLS 1.0's CBC implementation?

Correct. TLS 1.0 used the last ciphertext block of one record as the IV for the next record. Duong and Rizzo demonstrated at Ekoparty 2011 that this predictable IV allows a chosen-plaintext attack: by controlling some plaintext (via script injection) and observing ciphertext, an attacker can recover browser cookies byte-by-byte. TLS 1.1+ fixed this by using random IVs.

Not correct. BEAST exploited TLS 1.0's CBC chained IV — the last ciphertext block of one record became the IV of the next. This predictability allowed Duong and Rizzo to mount a chosen-plaintext attack recovering browser session cookies. TLS 1.1 fixed this with random per-record IVs.

9. The PS3 ECDSA private key was recovered because Sony reused nonce k. Which arithmetic relationship makes this recovery possible?

Correct. In ECDSA, r = (k·G)_x mod n. If k is reused, r is the same in both signatures. With two equations s₁ = k⁻¹(z₁ + r·d) mod n and s₂ = k⁻¹(z₂ + r·d) mod n sharing k and d, subtracting them eliminates d and gives k = (z₁ − z₂)·(s₁ − s₂)⁻¹ mod n. Substituting k back into either equation yields the private key d = (s₁·k − z₁)·r⁻¹ mod n. Fail0verflow demonstrated this at the Chaos Communication Congress (27C3) in December 2010.

Not correct. When k is reused, r = (k·G)_x is the same in both signatures. Two equations with shared k and r give a solvable linear system for the private key d. This is pure algebra — no complex cryptanalysis required once nonce reuse is established.
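The algebra can be checked with plain modular arithmetic. In this sketch the modulus is the secp256k1 group order, and r is supplied as a stand-in value rather than computed from k·G, since the recovery uses only the signing equation. All numeric inputs are made up for illustration, and pow(x, -1, n) requires Python 3.8 or later.

```python
# Order of the secp256k1 group, used here as a convenient prime modulus n.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def sign_s(z: int, d: int, k: int, r: int) -> int:
    """ECDSA s-value: s = k^-1 * (z + r*d) mod n (r is given, not derived from k*G)."""
    return pow(k, -1, N) * (z + r * d) % N

def recover_from_nonce_reuse(z1: int, s1: int, z2: int, s2: int, r: int):
    """Recover (k, d) from two signatures that share the same nonce k."""
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    d = (s1 * k - z1) * pow(r, -1, N) % N
    return k, d

d = 0x1F2E3D4C5B6A79881726354433221100   # hypothetical private key
k = 0x0BADC0FFEE0DDF00D                  # nonce reused for both messages
r = 0x123456789ABCDEF0FEDCBA9876543210   # stand-in for (k*G)_x mod n
z1, z2 = 0xAAAA1111, 0xBBBB2222          # two different message hashes

s1, s2 = sign_s(z1, d, k, r), sign_s(z2, d, k, r)
print(recover_from_nonce_reuse(z1, s1, z2, s2, r) == (k, d))  # True
```

The recovery needs only values that appear in the two public signatures and message hashes, which is why a single repeated nonce is enough to lose the key.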

10. Argon2id is preferred over bcrypt for new password hashing implementations because Argon2id:

Correct. Argon2id is memory-hard — it requires a configurable amount of RAM (e.g., 64MB) per hash computation. GPU and ASIC hardware can parallelize CPU-bound operations cheaply, but memory bandwidth is a genuine hardware constraint that levels the playing field between defenders (commodity servers) and attackers (GPU clusters). Argon2 won the 2015 Password Hashing Competition specifically for this property.

Not correct. Argon2id's key advantage is memory-hardness — it requires large amounts of RAM per hash. GPU clusters that can crack bcrypt at millions of attempts per second are constrained by memory bandwidth when attacking Argon2id. This property, combined with its PHC win in 2015, makes it the current recommendation.

11. GitGuardian's 2022 report found secrets in what proportion of public GitHub repositories?

Correct. GitGuardian's 2023 State of Secrets Sprawl report found secrets (API keys, credentials, certificates) in 1 in every 10 public GitHub repositories. The 2022 report documented 10 million individual secret exposures, with the trend accelerating alongside AI code assistant adoption — as AI generators produce working example code with literal credential values.

Not correct. GitGuardian found secrets in 1 in 10 public GitHub repositories — a striking proportion reflecting how normalized hardcoded credentials have become, accelerated by AI code generators that produce working snippets with literal placeholder credentials.

12. When auditing AI-generated Java TLS code, which HostnameVerifier implementation is a critical finding?

Correct. A HostnameVerifier returning true unconditionally accepts any hostname regardless of what the certificate says — functionally identical to verify=False in Python. An attacker intercepting the connection can present any certificate for any hostname and the verification succeeds. This is a critical MITM-enabling finding in AI-generated Java code.

Not correct. The lambda (hostname, session) -> true is the critical finding — it unconditionally accepts any hostname, making TLS hostname verification completely non-functional. Any MITM certificate for any domain is accepted. This is the Java equivalent of Python's verify=False.

13. The Lenstra et al. 2012 RSA weak-key study found approximately how many internet RSA keys shared a prime factor with at least one other key?

Correct. The Lenstra et al. study ("Ron was wrong, Whit is right", 2012) analyzed millions of RSA public keys from the internet and found roughly 27,000, approximately 0.2%, that shared a prime factor with at least one other key. Those keys were fully compromised via simple GCD computation. The contemporaneous study by Heninger et al. ("Mining Your Ps and Qs", 2012) traced such keys to hosts, mostly embedded devices, that generated RSA keys with insufficient entropy during boot.

Not correct. Lenstra et al. found roughly 27,000 RSA keys (~0.2%) sharing prime factors, all trivially factorable via GCD computation. The root cause was entropy starvation during embedded device boot, allowing multiple devices to independently generate RSA keys with the same prime component.

14. NIST formally deprecated 3DES (Triple-DES) in which publication and year?

Correct. NIST SP 800-131A Revision 2 (March 2019) formally deprecated 3DES for all uses, with the deprecation process beginning in 2017. 3DES was vulnerable to the SWEET32 birthday attack (Bhargavan and Leurent, 2016) which exploited the 64-bit block size to recover plaintext from long-lived TLS sessions using collision probabilities.

Not correct. NIST SP 800-131A Rev. 2 (2019) formally deprecated 3DES, with the process beginning in 2017. The SWEET32 attack (2016) had demonstrated that 3DES's 64-bit block size was vulnerable to birthday-bound collisions in long TLS sessions, making its continued use unjustifiable.

15. An AI-generated password reset token uses token = str(int(time.time() * 1000))[-6:]. This is vulnerable because:

Correct. A 6-digit token derived from millisecond timestamp has approximately 1,000 possible values per second. An attacker who knows roughly when a password reset was requested (e.g., from server-side timing or the victim's confirmation) can enumerate all plausible tokens within seconds using a simple loop. Password reset tokens must be generated from a CSPRNG with at least 128 bits of entropy (e.g., secrets.token_urlsafe(32)).

Not correct. The token space is approximately 1,000 values per second of uncertainty — trivially enumerable. An attacker who knows the approximate request time can try all plausible timestamps and crack the token in seconds. Password reset tokens must use a CSPRNG with at least 128 bits of entropy.
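The enumeration is short enough to write out. The sketch below reproduces the flawed token derivation and builds the set of candidates an attacker would try, assuming they know the request time to within a couple of seconds. The timestamp, the window size, and the function names are made-up values for illustration.

```python
import secrets

def weak_reset_token(timestamp_ms: int) -> str:
    """The flawed pattern: last six digits of a millisecond timestamp."""
    return str(timestamp_ms)[-6:]

def candidate_tokens(approx_ms: int, window_ms: int) -> set:
    """Every token the server could have issued within +/- window_ms of the guess."""
    return {weak_reset_token(t) for t in range(approx_ms - window_ms, approx_ms + window_ms + 1)}

issued_at = 1_700_000_123_456               # hypothetical server clock reading (ms)
token = weak_reset_token(issued_at)
guesses = candidate_tokens(issued_at + 1_500, 2_000)  # attacker's estimate is 1.5 s off
print(token, token in guesses, len(guesses))  # 123456 True 4001

def strong_reset_token() -> str:
    """CSPRNG-backed replacement: 32 random bytes, URL-safe encoded."""
    return secrets.token_urlsafe(32)
```

Roughly four thousand guesses is within reach of a single scripted burst of requests, so rate limiting alone does not rescue this design; the token itself has to come from a CSPRNG.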