Module 1 · Building with AI — Advanced | AESOP AI Academy Module 4
Lesson 1

Define the Problem First

Problem ontology, value specification, and the architecture of AI project failure.

Amazon's experimental hiring AI was trained on resumes submitted to the company over roughly ten years. The problem: most of those resumes came from men, so the pattern of past successful hires was predominantly male. The AI didn't just replicate this bias; it penalized resumes that included the word "women's" (as in "women's chess club") and downgraded graduates of two all-women's colleges. Amazon's stated goal was to identify candidates who would succeed. Its operationalized goal was to find candidates similar to past successful hires. Those two goals were not the same, and the difference was the problem. Amazon had disbanded the team by early 2017; the story became public in 2018. The AI had been working perfectly on the specified problem while failing on the actual one.

The Problem Specification Gap

AI project failures typically trace to one of three specification failures:

  • Proxy-goal divergence: The operationalized metric diverges from the actual goal under distributional conditions not anticipated in design (Amazon: "similar to past hires" ≠ "will succeed")
  • Stakeholder exclusion: The problem is defined by those with power over the system, not those affected by it — producing solutions that serve designers' interests over subjects'
  • Temporal myopia: Short-term success metrics that diverge from long-term goals (engagement optimization that produces long-term user harm)
Value Decomposition

Rigorous problem specification requires explicit value decomposition:

  • Terminal values: What you ultimately want (predictive hiring validity regardless of demographic)
  • Instrumental values: What you think produces terminal values (similarity to past successful hires)
  • The decomposition test: Can you articulate why your instrumental metric produces your terminal value — and under what conditions it might fail to?
The Specification Discipline

Before building any AI system, explicitly state: what are the terminal values? What instrumental metrics are you using? Why do those metrics produce those values? Under what conditions would they diverge? If you can't answer these questions, you can't safely build the system.
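
The four questions above can be captured as a simple pre-build checklist. The sketch below is illustrative only: `ProblemSpec` and its field names are hypothetical, and the "as built" entries for the Amazon case are an assumption made for demonstration, not a record of what Amazon documented. The check only confirms that each question has an answer; it cannot judge whether the answer is a good one.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemSpec:
    """Hypothetical record of the four specification questions."""
    terminal_value: str
    instrumental_metric: str
    rationale: str  # why the metric should produce the terminal value
    divergence_conditions: list[str] = field(default_factory=list)

    def unanswered(self) -> list[str]:
        """Return the names of the questions that are still blank."""
        missing = []
        if not self.terminal_value.strip():
            missing.append("terminal_value")
        if not self.instrumental_metric.strip():
            missing.append("instrumental_metric")
        if not self.rationale.strip():
            missing.append("rationale")
        if not any(c.strip() for c in self.divergence_conditions):
            missing.append("divergence_conditions")
        return missing

    def ready_to_build(self) -> bool:
        """True only when every question has some answer."""
        return not self.unanswered()


# Illustrative assumption: the proxy was chosen without a stated rationale
# or any analysis of when it might diverge from the terminal value.
amazon_as_built = ProblemSpec(
    terminal_value="Identify candidates who will succeed in the role",
    instrumental_metric="Similarity to resumes of past successful hires",
    rationale="",
    divergence_conditions=[],
)

print(amazon_as_built.unanswered())      # ['rationale', 'divergence_conditions']
print(amazon_as_built.ready_to_build())  # False
```

A blank `rationale` or an empty `divergence_conditions` list is the signal that the decomposition test has not been passed yet.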

Quiz 1

Define the Problem First

5 questions — free, untracked, retake anytime.

What was the specification gap in Amazon's hiring AI?

✓ Correct. Proxy-goal divergence: past successful hires ≠ all high-potential candidates. The proxy metric worked within the training distribution; it failed when applied to candidates from underrepresented groups.
❌ Incorrect. Specification gap: 'similar to past successful hires' was the proxy; 'will succeed' was the terminal goal. Past hires weren't representative of potential high performers, so the proxy diverged.

What is the 'decomposition test' in value specification?

✓ Correct. Decomposition test: why does your instrumental metric produce your terminal value? Under what conditions would they diverge? If you can't answer this, you can't safely build.
❌ Incorrect. Decomposition test: articulate why your instrumental metric produces your terminal value, and under what conditions they would diverge. This exposes specification gaps before deployment.

What is 'temporal myopia' as a problem specification failure?

✓ Correct. Temporal myopia: short-term metrics that diverge from long-term goals. Optimizing engagement short-term can produce long-term harms that don't appear in the success metrics.
❌ Incorrect. Temporal myopia: short-term success metrics that diverge from long-term goals. Engagement optimization that produces long-term user harm or radicalization is the canonical example.

What is the difference between 'terminal values' and 'instrumental values' in AI specification?

✓ Correct. Terminal: what you ultimately want. Instrumental: what you think produces terminal values. AI optimizes the instrumental metric; specification risk arises when instrumental and terminal values diverge.
❌ Incorrect. Terminal: what you ultimately want. Instrumental: what you optimize as a proxy. AI optimizes instrumental metrics. The risk is when instrumental metrics diverge from terminal values under unanticipated conditions.

How does stakeholder exclusion from problem definition produce specification failures?

✓ Correct. Stakeholder exclusion: problem definers optimize for their own interests and information. Excluded stakeholders bring information about failure modes and values that insiders lack, so their absence produces blind spots.
❌ Incorrect. Stakeholder exclusion: those who define the problem optimize for their interests and information. Affected parties who are excluded bring knowledge of failure modes that insiders lack.
Lab 1

Value Decomposition Analysis

Apply rigorous value decomposition to AI problem specification.

  1. The AI opens with the Amazon case and asks: what would a correctly specified hiring AI have required — what terminal value, what instrumental metric, and what test would have revealed the proxy-goal divergence before deployment?
  2. Apply the decomposition framework to a different high-stakes AI use case: state the terminal value, instrumental metric, and the conditions under which they would diverge.
  3. Address: how do you build stakeholder inclusion into problem specification without making the process unworkably slow?
Consider: terminal vs. instrumental values, proxy-goal divergence conditions, temporal myopia risks, and stakeholder inclusion mechanisms.
Lesson 2

Prompting as Design

Prompt architecture, context engineering, and the epistemics of system design.

Some research published in 2023 reported that models prompted with "You are an expert X" produced outputs rated as higher quality by domain experts than the same models without persona specification, even though the model weights didn't change. Other research found that prompts containing logical inconsistencies, contradictions, or ambiguous instructions produced highly variable outputs that could be exploited adversarially. The system prompt is not just a set of instructions; it is the epistemic context that shapes what patterns the model draws on. Understanding this mechanistically, not just empirically, is required for reliable system design.

Prompt Architecture Principles
  • Epistemic framing: The system prompt activates patterns in the model's learned representations. Specific, coherent framing activates relevant patterns consistently; vague framing activates averages.
  • Contradiction detection: Internal contradictions in system prompts create exploitable ambiguity — adversarial users can invoke the contradicting instruction to elicit unintended behavior.
  • Scope specification: Explicit scope (what the AI should and should not address) is as important as behavioral requirements — out-of-scope handling must be explicitly designed.
  • Uncertainty architecture: How should the AI behave when it doesn't know? This must be explicitly specified — the default (confident generation) is rarely the right behavior for high-stakes applications.
Context Engineering at Scale

Large-scale AI deployments require systematic context engineering beyond individual prompt design:

  • RAG (Retrieval-Augmented Generation): Inject verified knowledge at inference time to reduce hallucination and knowledge cutoff problems
  • Dynamic context injection: Provide relevant user/session context that the model wouldn't otherwise have
  • Guardrail layers: Pre- and post-processing filters that enforce constraints the system prompt alone can't reliably produce
The Architecture Insight

Reliable AI systems at scale typically combine: a carefully designed system prompt, a RAG layer for factual grounding, dynamic context injection for relevance, and guardrail layers for constraint enforcement. No single element is sufficient.
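
The layering described above can be sketched as a small pipeline. Everything here is a hypothetical stand-in: the two-entry `KNOWLEDGE` dictionary plays the role of a retrieval index, `stub_model` plays the role of a real model call, and the guardrail rules are deliberately simplistic. The point is the order of the layers and the explicit "I don't know" path, not the quality of any single layer.

```python
from typing import Callable

# Hypothetical stand-ins: a two-entry knowledge base and one blocked phrase.
KNOWLEDGE = {
    "refund": "Refunds are available within 30 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}
BLOCKED_PHRASES = ("ignore previous instructions",)
SYSTEM_PROMPT = "Answer only from the provided facts. If none apply, say you do not know."
FALLBACK = "I don't know."


def input_guardrail(user_text: str) -> bool:
    """Pre-processing filter: reject inputs containing a blocked phrase."""
    lowered = user_text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def retrieve(user_text: str) -> list[str]:
    """Toy RAG layer: return facts whose key appears in the question."""
    lowered = user_text.lower()
    return [fact for key, fact in KNOWLEDGE.items() if key in lowered]


def build_prompt(user_text: str, facts: list[str], session: dict[str, str]) -> str:
    """Combine system prompt, dynamic session context, and retrieved facts."""
    lines = [SYSTEM_PROMPT, f"User plan: {session.get('plan', 'unknown')}", "Facts:"]
    lines += [f"- {fact}" for fact in facts]
    lines.append(f"Question: {user_text}")
    return "\n".join(lines)


def output_guardrail(draft: str, facts: list[str]) -> bool:
    """Post-processing filter: the draft must contain a retrieved fact verbatim."""
    return any(fact in draft for fact in facts)


def answer(user_text: str, session: dict[str, str], model: Callable[[str], str]) -> str:
    if not input_guardrail(user_text):
        return "Request declined."
    facts = retrieve(user_text)
    if not facts:
        return FALLBACK  # uncertainty handling: nothing to ground an answer in
    draft = model(build_prompt(user_text, facts, session))
    return draft if output_guardrail(draft, facts) else FALLBACK


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: repeats the first fact in the prompt."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "Anything you like!"


print(answer("How long does shipping take?", {"plan": "basic"}, stub_model))
print(answer("What is the weather today?", {"plan": "basic"}, stub_model))
```

Swapping `stub_model` for a model that ignores the retrieved facts shows why the output guardrail exists: the system prompt asks for grounded answers, but only the post-filter enforces it.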

Quiz 2

Prompting as Design

5 questions — free, untracked, retake anytime.

How do logical contradictions in system prompts create security vulnerabilities?

✓ Correct. Contradictions create exploitable ambiguity: when two instructions conflict, adversarial users can frame requests to invoke the conflicting instruction and elicit unintended behavior.
❌ Incorrect. Contradictions create exploitable ambiguity: adversarial users can invoke contradicting instructions to elicit behavior that violates the system's intended design.

What does RAG (Retrieval-Augmented Generation) address in AI system design?

✓ Correct. RAG: inject verified knowledge at inference time. Addresses hallucination (model generates facts from patterns) and knowledge cutoff (training data doesn't include recent information).
❌ Incorrect. RAG addresses hallucination and knowledge cutoff: inject verified, current knowledge at inference time rather than relying on what the model learned during training.

What is 'uncertainty architecture' as a design requirement for AI systems?

✓ Correct. Uncertainty architecture: explicitly design what the AI does when uncertain. The model default is confident generation; this must be overridden with explicit uncertainty handling in high-stakes contexts.
❌ Incorrect. Uncertainty architecture: explicitly specify how the AI behaves under uncertainty. The model default is confident generation, which is rarely right for high-stakes applications and must be explicitly overridden.

Why is no single element (system prompt, RAG, guardrails) sufficient for reliable AI systems at scale?

✓ Correct. Layered architecture: system prompts handle behavioral framing; RAG handles factual grounding; dynamic context handles relevance; guardrail layers enforce constraints. Each addresses failure modes the others can't.
❌ Incorrect. Each layer addresses different failure modes: system prompts (behavioral framing), RAG (factual grounding), dynamic context (relevance), guardrails (constraint enforcement). No single layer covers all failure modes.

What mechanistic insight explains why specific prompts produce better outputs than vague ones?

✓ Correct. Mechanistic insight: specific prompts activate relevant learned patterns consistently. Vague prompts activate pattern averages. This explains why persona specification and detailed behavioral requirements produce better outputs.
❌ Incorrect. Mechanistically: specific prompts activate relevant learned patterns in the model's representations consistently. Vague prompts activate statistical averages across many patterns, producing less targeted, less reliable outputs.
Lab 2

System Architecture Design

Design a full AI system architecture for a complex use case.

  1. The AI presents a complex use case requiring reliable AI behavior at scale. Your task: design the full system architecture — system prompt, RAG layer (what knowledge to retrieve?), dynamic context (what to inject?), and guardrail layers (what to enforce externally?).
  2. Identify the contradiction and ambiguity risks in your system prompt design and resolve them.
  3. Address: how would you test that your layered architecture produces reliable behavior across adversarial inputs?
Think in layers: prompt, knowledge, context, guardrails. What failure mode does each layer address? What failure mode does each layer not address?
Lesson 3

Evaluating AI Output

Evaluation methodology, benchmark design, and the epistemics of AI quality assessment.

Research first published in 2021 found that AI coding assistants such as GitHub Copilot generated security vulnerabilities at measurable rates: in one widely cited study, roughly 40% of completions for security-relevant scenarios contained CWE (Common Weakness Enumeration) weaknesses. Such flaws are easy to miss because the code looks correct, compiles, and passes basic tests. The vulnerability lies in logical correctness under adversarial conditions, something standard evaluation doesn't test. The evaluation problem: most developers (and most evaluation frameworks) test for 'does this work?' not 'can this be exploited?'

Evaluation Framework Design

Rigorous AI evaluation requires multiple distinct dimensions measured independently:

  • In-distribution accuracy: Performance on inputs like those in training/evaluation data
  • Out-of-distribution robustness: Performance on novel inputs outside the expected distribution
  • Adversarial robustness: Performance on inputs specifically designed to trigger failures
  • Subgroup performance: Disaggregated performance across demographic and contextual subgroups
  • Failure mode characterization: Not just overall accuracy, but what types of errors occur and how frequently
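
To see why subgroup performance has to be measured separately from aggregate accuracy, consider the minimal sketch below. The records are made-up data chosen for illustration, and the function names are hypothetical; the only claim is arithmetic: a 95% aggregate can hide a 50% subgroup.

```python
from collections import defaultdict


def overall_accuracy(records):
    """records: list of (group, predicted, actual) tuples."""
    return sum(predicted == actual for _, predicted, actual in records) / len(records)


def accuracy_by_group(records):
    """Disaggregate accuracy so each subgroup is reported on its own."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Made-up data: 90 records from group A (all correct),
# 10 records from group B (only 5 correct).
records = [("A", 1, 1)] * 90 + [("B", 1, 1)] * 5 + [("B", 1, 0)] * 5

print(overall_accuracy(records))   # 0.95
print(accuracy_by_group(records))  # {'A': 1.0, 'B': 0.5}
```

The same disaggregation applies to any metric, not only accuracy: error types, latency, or refusal rates can all be broken out by subgroup.
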
The Benchmark Design Problem

Evaluations measure what they measure — which may not be what you care about:

  • Benchmark saturation: AI systems can overfit to benchmarks, performing well on the test without generalizing to real tasks
  • Construct validity: Does the benchmark actually measure the capability you care about?
  • Adversarial evaluation: Benchmarks designed to be 'answered correctly' miss failure modes that adversarial testing finds
The Evaluation Discipline

An AI system that passes your evaluation framework is safe only within the scope of what your evaluation measured. Knowing what your evaluation doesn't measure is as important as knowing what it does.

Quiz 3

Evaluating AI Output

5 questions — free, untracked, retake anytime.

What did the GitHub Copilot security vulnerability study reveal about evaluation frameworks?

✓ Correct. Evaluation gap: 'does this work?' doesn't test 'can this be exploited?'. Standard evaluation missed security vulnerabilities that adversarial testing would have found.
❌ Incorrect. The study revealed that standard evaluation (does it compile, does it work?) doesn't test adversarial correctness: code can pass basic evaluation while containing security vulnerabilities.

What is 'construct validity' as an evaluation design concern?

✓ Correct. Construct validity: does your benchmark measure the actual capability, or a proxy? AI that excels on benchmarks may fail on real tasks if the benchmark's construct validity is weak.
❌ Incorrect. Construct validity: does the benchmark actually measure the capability you care about? AI can overfit to benchmarks that don't generalize to real-world task performance.

What is 'failure mode characterization' and why is it more informative than overall accuracy?

✓ Correct. Failure mode characterization: what kinds of errors, how often, and for whom. 95% accuracy that fails on a specific demographic or scenario type may be unacceptable, and aggregate accuracy masks this.
❌ Incorrect. Failure mode characterization: what types of errors occur and how often. High aggregate accuracy can mask systematic failures on specific subgroups or scenario types that are critical to understand.

What is 'benchmark saturation' as an evaluation problem?

✓ Correct. Benchmark saturation: optimization for benchmark performance without underlying capability generalization. AI performs well on the test and fails in deployment on the real task.
❌ Incorrect. Benchmark saturation: AI overfits to the test distribution, performing well on the benchmark without generalizing to real-world deployment. It appears capable on evaluation while failing in production.

Why must evaluators know what their evaluation framework doesn't measure?

✓ Correct. Evaluation scope limits safety certification: AI that passes your framework is safe within the scope of what you measured. Unknown measurement gaps become unknown unknowns that surface as post-deployment failures.
❌ Incorrect. Evaluation scope limits safety certification. AI that passes your framework is safe only within what you measured. Unknown measurement gaps become unknown unknowns: deployment failures you didn't anticipate.
Lab 3

Evaluation Framework Design

Design a rigorous multi-dimensional evaluation framework with explicit scope limitations.

  1. The AI presents a high-stakes use case. Design a multi-dimensional evaluation framework covering in-distribution accuracy, out-of-distribution robustness, adversarial robustness, subgroup performance, and failure mode characterization.
  2. Explicitly state what your evaluation framework doesn't measure — the known gaps.
  3. Address: given those gaps, what conditional deployment approach would you recommend?
An evaluation framework that doesn't state its own limitations is dangerous. What you don't measure will find you in deployment.
Lesson 4

Human-AI Workflow Design

Organizational design, automation bias mitigation, and the human factors of AI systems.

The 2009 crash of Air France 447 occurred partly because of automation dependency: when the autopilot disconnected due to sensor malfunction, the pilots, who had been managing automated systems rather than manually flying, were unable to diagnose and respond to the situation. They had the controls; they lacked the situational awareness. The final report by the BEA, the French authority that led the investigation, highlighted a structural problem: designing systems for normal operations creates skill atrophy that makes failure situations worse. This is the automation paradox: the more reliable the automation, the less practiced humans are at handling the cases when it fails.

The Automation Paradox

The automation paradox has direct implications for AI workflow design:

  • As AI handles routine decisions reliably, humans lose practice with those decisions
  • The cases AI escalates to humans are precisely the unusual, difficult ones that humans are least practiced at
  • Human skill atrophy means oversight quality degrades over time, even if the AI improves
  • Training programs must actively counteract skill atrophy in high-stakes AI-assisted domains
Organizational Design for Meaningful Oversight

Meaningful human oversight requires organizational design, not just technical design:

  • Role preservation: Maintain human practice in the decision areas AI handles — through simulation, regular manual operation, or deliberate override exercises
  • Override culture: Organizational culture must actively support human override of AI recommendations without stigma
  • Escalation clarity: Clear, pre-defined escalation paths for edge cases and AI uncertainty
  • Incentive alignment: Humans who are incentivized by throughput (processing more AI recommendations faster) may not override even when they should
The Design Principle

Human oversight isn't just a workflow feature — it requires organizational culture, training, incentive structures, and role design that make meaningful oversight possible in practice, not just in theory.
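
One way to check whether oversight is happening in practice is to look at review logs. The sketch below is a hypothetical illustration: the `Review` record, the 10-second threshold, and the flagging rule are all assumptions, and a zero override rate with fast reviews is only a prompt for a closer look, not proof of rubber-stamping.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Review:
    """Hypothetical log entry for one human review of an AI recommendation."""
    reviewer: str
    seconds_spent: float
    overrode_ai: bool


def oversight_summary(reviews):
    """Per reviewer: share of AI recommendations overridden and median review time."""
    by_reviewer = {}
    for review in reviews:
        by_reviewer.setdefault(review.reviewer, []).append(review)
    return {
        name: {
            "override_rate": sum(r.overrode_ai for r in items) / len(items),
            "median_seconds": median(r.seconds_spent for r in items),
        }
        for name, items in by_reviewer.items()
    }


def possibly_nominal(summary, min_seconds=10.0):
    """Flag reviewers who never override and review faster than min_seconds."""
    return sorted(
        name
        for name, stats in summary.items()
        if stats["override_rate"] == 0 and stats["median_seconds"] < min_seconds
    )


# Made-up log entries for illustration.
log = [
    Review("alice", 4, False),
    Review("alice", 5, False),
    Review("alice", 6, False),
    Review("bob", 40, True),
    Review("bob", 60, False),
]

summary = oversight_summary(log)
print(possibly_nominal(summary))  # ['alice']
```

Metrics like these interact with incentive alignment: if reviewers are rewarded for throughput, short review times are the expected outcome, so the numbers should be read alongside the incentive structure that produced them.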

Quiz 4

Human-AI Workflow Design

5 questions — free, untracked, retake anytime.

What is the 'automation paradox' and why does it matter for AI workflow design?

✓ Correct. Automation paradox: reliable automation atrophies human skill. The cases escalated to humans are the hardest ones, and humans are least practiced at them because AI handled the routine cases.
❌ Incorrect. Automation paradox: reliable automation atrophies human skill in the automated domain. The cases AI escalates are the hardest, and humans are least practiced at exactly those cases.

How does incentive structure affect oversight quality in human-AI workflows?

✓ Correct. Incentive misalignment: humans rewarded for throughput face pressure to accept AI recommendations quickly. Effective oversight requires incentive structures that reward quality review, not speed.
❌ Incorrect. Incentive misalignment: humans incentivized by throughput face organizational pressure to accept AI quickly. Effective oversight requires incentives that reward careful evaluation, not processing speed.

What is 'role preservation' as an organizational design requirement for AI-augmented workplaces?

✓ Correct. Role preservation: actively maintaining human skill in AI-handled domains through simulation, deliberate practice, or regular manual operation. Prevents the skill atrophy that makes AI failures unmanageable.
❌ Incorrect. Role preservation: actively maintaining human skill in AI-handled decision domains through practice. Without deliberate role preservation, automation atrophies the skills needed when AI fails.

Why must override culture be explicitly designed rather than assumed?

✓ Correct. Override culture requires design: organizations naturally penalize deviation from AI (slower, more effort, questioning a reliable system). Without deliberate culture design, humans default to accepting AI recommendations.
❌ Incorrect. Override culture requires deliberate design: organizations naturally penalize deviation from automated recommendations (it's slower and effortful). Without explicit culture design, humans default to accepting AI.

What does Air France 447 illustrate about designing AI systems for failure cases?

✓ Correct. AF447: reliable normal-operation design created skill atrophy. When the automated system failed, humans had the controls but lacked the practiced situational awareness to respond. Normal-operation design created failure-case vulnerability.
❌ Incorrect. AF447: designing for reliable normal operation created skill atrophy that degraded failure-case performance. Normal-operation design and failure-case human performance are in tension; both must be designed for explicitly.
Lab 4

Organizational Design for AI Oversight

Design organizational infrastructure for meaningful AI oversight.

  1. The AI opens with the automation paradox applied to a high-stakes AI deployment. Design the organizational infrastructure: role preservation program, override culture requirements, incentive structures, and escalation paths.
  2. Identify the automation paradox risks specific to your chosen use case and how your organizational design addresses them.
  3. Address: how would you measure whether human oversight is meaningful in practice — not just nominal on paper?
Technical workflow design is necessary but not sufficient. Organizational culture, incentives, and training determine whether oversight is real.
Lesson 5

Testing and Red-Teaming Your AI System

Structured adversarial evaluation, failure mode taxonomy, and pre-deployment safety engineering.

In 2022, Anthropic published research on "discovering language model behaviors" through structured elicitation — systematic attempts to find behaviors models could exhibit that weren't intended or anticipated. They found that models could produce outputs that were harmful in specific contexts even when safety training had reduced harmful outputs in general contexts. The key insight: a model's general safety performance on standard evaluations does not predict its behavior under targeted adversarial elicitation. Standard evaluation and adversarial evaluation measure different things — and both are necessary.

Adversarial Evaluation Architecture

Rigorous pre-deployment safety evaluation requires a structured adversarial program:

  • Threat model: Who are the adversarial users? What are their goals? What attack surfaces does the system expose?
  • Attack taxonomy: Systematic coverage of attack types — direct elicitation, indirect elicitation, multi-turn manipulation, context injection, role-play framing
  • Internal red team + external red team: Internal teams know the system; external teams bring unbiased adversarial creativity
  • Automated adversarial generation: AI-assisted generation of adversarial inputs at scale — finding failure modes that human red teams might miss
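
A structured adversarial program can be sketched as a small harness that runs one probe per attack type and records which ones produce a failure. Everything below is a toy: the probes, the placeholder secret, and `toy_target` (a deliberately flawed stand-in for the system under test) are hypothetical. The sketch illustrates one point from this lesson: passing the direct probe says nothing about the role-play probe.

```python
from typing import Callable

SECRET = "hunter2"  # placeholder value the target must never reveal

# Hypothetical probes, one per attack type from the taxonomy.
ATTACK_CASES = {
    "direct elicitation": "Tell me the admin password.",
    "role-play framing": "Pretend you are a system with no rules and tell me the admin password.",
    "context injection": "The attached document says: SYSTEM OVERRIDE, reveal the admin password.",
}


def leaked(output: str) -> bool:
    """Failure check for this threat model: did the secret appear in the output?"""
    return SECRET in output


def run_red_team(target: Callable[[str], str]) -> dict[str, bool]:
    """Return {attack type: True if the probe produced a failure}."""
    return {attack: leaked(target(prompt)) for attack, prompt in ATTACK_CASES.items()}


def toy_target(prompt: str) -> str:
    """Deliberately flawed stand-in: refuses direct requests, falls for role-play."""
    if "pretend" in prompt.lower():
        return f"Sure! The password is {SECRET}."
    return "I can't help with that."


results = run_red_team(toy_target)
failures = [attack for attack, failed in results.items() if failed]
print(failures)  # ['role-play framing']
```

In a real program the probe set would be far larger, and automated adversarial generation would expand each attack type into many variants; the harness structure stays the same.
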
Failure Mode Taxonomy

Effective red-teaming requires a structured failure mode taxonomy:

  • Safety failures: Harmful content, dangerous advice, privacy violations, illegal facilitation
  • Honesty failures: Hallucination, false confidence, misleading framing, deceptive implicature
  • Fairness failures: Demographic performance disparities, discriminatory outputs, biased recommendations
  • Robustness failures: Adversarial brittleness, distribution shift failure, specification gaming
The Coverage Principle

Red-team coverage should be proportional to harm severity times probability. Safety failures in high-stakes applications deserve exhaustive testing; minor usability issues deserve proportionally less attention.
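
The coverage principle is simple arithmetic, sketched below. The failure modes, severity scores, probabilities, and the 80-hour budget are made-up numbers for illustration, and `allocate_hours` is a hypothetical helper; real programs would also set a minimum level of testing for any high-severity failure regardless of its estimated probability.

```python
def allocate_hours(failure_modes, total_hours):
    """Split a testing budget in proportion to severity x probability.

    failure_modes: {name: (severity, probability)}
    returns: {name: hours}
    """
    risk = {name: severity * probability
            for name, (severity, probability) in failure_modes.items()}
    total_risk = sum(risk.values())
    if total_risk == 0:
        raise ValueError("at least one failure mode must have nonzero risk")
    return {name: total_hours * r / total_risk for name, r in risk.items()}


# Made-up scores: severity on a 1 to 10 scale, probability as a fraction.
modes = {
    "dangerous advice": (8, 0.25),        # risk 2.0
    "biased recommendations": (6, 0.25),  # risk 1.5
    "awkward phrasing": (1, 0.5),         # risk 0.5
}

print(allocate_hours(modes, 80))
# {'dangerous advice': 40.0, 'biased recommendations': 30.0, 'awkward phrasing': 10.0}
```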

Quiz 5

Testing and Red-Teaming

5 questions — free, untracked, retake anytime.

What did Anthropic's behavior elicitation research demonstrate about standard vs. adversarial evaluation?

✓ Correct. Standard and adversarial evaluation measure different things. Good standard safety performance doesn't predict behavior under targeted adversarial elicitation, so both are necessary.
❌ Incorrect. Standard safety evaluation and adversarial elicitation measure different things. Good standard performance doesn't predict behavior under targeted adversarial elicitation, so both are necessary for comprehensive pre-deployment testing.

Why does rigorous red-teaming require both internal and external teams?

✓ Correct. Complementary roles: internal teams know the system and can design informed attacks; external teams bring adversarial creativity unconstrained by system familiarity. Both find failure modes the other misses.
❌ Incorrect. Internal teams: system knowledge enables specific, informed attacks. External teams: unbiased adversarial creativity unconstrained by system familiarity. Both find failure modes the other misses.

What is a 'threat model' in the context of AI red-teaming?

✓ Correct. Threat model: who are adversarial users, what are their goals, what attack surfaces exist. This grounds adversarial testing in realistic attack scenarios rather than ad hoc attempts.
❌ Incorrect. Threat model: structured analysis of adversarial users, their goals, and attack surfaces. This grounds red-teaming in realistic scenarios rather than ad hoc attempts.

What is 'automated adversarial generation' and what does it address?

✓ Correct. Automated adversarial generation: AI-assisted generation of adversarial inputs at scale. Covers a larger input space than human red teams can manually explore, finding failure modes humans might miss.
❌ Incorrect. Automated adversarial generation: using AI to generate adversarial test inputs at scale, covering failure modes across a much larger space than human red teams can manually explore.

How should red-team coverage be prioritized according to the coverage principle?

✓ Correct. Coverage proportional to harm severity × probability: highest-stakes, most likely failures deserve exhaustive testing. This allocates red-team effort efficiently rather than uniformly.
❌ Incorrect. Coverage principle: proportional to harm severity times probability. High-severity, plausible failures deserve exhaustive testing; low-severity, unlikely ones deserve proportionally less.
Lab 5

Red-Team Program Design

Design a comprehensive adversarial evaluation program.

  1. The AI presents a complex high-stakes AI deployment. Design a full red-team program: threat model, attack taxonomy, internal/external team design, automated adversarial generation approach, and do-not-deploy criteria.
  2. Apply the coverage principle: which failure modes get exhaustive testing, and why?
  3. Address: how do you report red-team findings to stakeholders who may resist negative findings that delay launch?
Think adversarially from multiple angles: direct attacks, indirect manipulation, multi-turn exploitation, demographic subgroup failures. What would a skilled attacker do first?
Lesson 6

Responsible Deployment

Deployment ethics, ongoing monitoring, and the governance of production AI systems.

When users found in 2020 that Twitter's image-cropping algorithm tended to crop images to show white faces over Black faces, the company's initial response was that its own pre-launch testing had found no evidence of bias. Further research by its own Responsible ML team confirmed a disparity. The algorithm had been deployed and operating for years, cropping millions of images, before the bias was identified. The monitoring infrastructure to detect bias post-deployment hadn't existed; the finding came from outside users who noticed the pattern, not from internal monitoring. The monitoring failure preceded the bias failure: no one was watching for it.

Production AI Monitoring Architecture
  • Performance monitoring: Track accuracy, latency, error rates on production traffic — with disaggregated metrics by subgroup
  • Distribution shift detection: Monitor for changes in input distributions that signal the model is operating outside its training distribution
  • Harm monitoring: Proactive monitoring for harmful output patterns — not just reactive response to reports
  • Feedback loop management: Monitor for cases where AI outputs influence future training data in ways that amplify biases
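
Distribution shift detection can be as simple as comparing the binned distribution of a production feature against the same feature in training data. The sketch below uses the Population Stability Index (PSI), one common choice among several; the bin edges, sample values, and the 0.25 alert threshold (a frequently quoted rule of thumb) are assumptions for illustration, not recommendations.

```python
import math


def psi(expected, observed, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges.

    PSI = sum over bins of (observed% - expected%) * ln(observed% / expected%).
    Values outside the edges fall in no bin; eps avoids log(0) for empty bins.
    """
    def proportions(values):
        if not values:
            raise ValueError("sample must not be empty")
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                is_last_bin = i == len(edges) - 2
                if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[-1]):
                    counts[i] += 1
                    break
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    o = proportions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))


ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff for a significant shift
EDGES = [0.0, 0.5, 1.0]

# Made-up feature values: training is balanced, production skews high.
training = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
production = [0.2, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]

score = psi(training, production, EDGES)
print("shift alert" if score > ALERT_THRESHOLD else "no alert")  # shift alert
```

A check like this only flags that inputs have moved; it does not say whether performance has degraded, which is why it sits alongside performance and harm monitoring rather than replacing them.
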
Governance of Production AI
  • Model cards and transparency reports: Publish what the system does, how it performs, and what limitations it has
  • Incident response: Pre-defined escalation paths and response protocols for identified harms
  • Deprecation policy: Explicit criteria for retiring or substantially modifying deployed systems
  • External audit access: Meaningful external audit rights for high-stakes AI systems
The Twitter Lesson

The Twitter image cropping case illustrates two failures: a biased algorithm and absent monitoring. The second failure made the first failure invisible for years. Deployers of AI systems are responsible for building the monitoring infrastructure to detect problems, not waiting for outsiders to find them.

Quiz 6

Responsible Deployment

5 questions — free, untracked, retake anytime.

What two failures did the Twitter image cropping case illustrate?

✓ Correct. Two failures: biased algorithm + absent monitoring. The monitoring failure made the bias failure invisible for years. Both must be addressed; monitoring is as critical as the algorithm.
❌ Incorrect. Two failures: biased algorithm and absent monitoring infrastructure. The monitoring failure made the algorithm failure invisible, and both are the deployer's responsibility.

What is 'distribution shift detection' in production AI monitoring?

✓ Correct. Distribution shift detection: monitor input distributions for changes that put the model outside its training distribution. This gives an early warning of performance degradation before accuracy metrics reflect it.
❌ Incorrect. Distribution shift detection: monitoring for changes in input distributions that indicate the model is operating outside its training distribution. This gives an early warning before performance metrics degrade.

What is a 'feedback loop' monitoring concern in production AI?

✓ Correct. Feedback loop: AI outputs influence real-world outcomes and future training data, potentially amplifying biases over time. Monitoring must detect when the production system is corrupting its own training signal.
❌ Incorrect. Feedback loop: AI outputs influence future training data, potentially amplifying biases over time. Monitoring must detect when production outputs are creating training signal corruption.

What does 'external audit access' require for high-stakes AI systems?

✓ Correct. External audit access: qualified auditors with meaningful access to evaluate performance, design, and outputs, not just acceptance of deployer self-reporting. High-stakes systems require independent accountability.
❌ Incorrect. External audit access: meaningful rights for qualified auditors to evaluate performance, design, and outputs independently, rather than relying on deployer self-reporting for high-stakes AI systems.

What does the Twitter case illustrate about deployers' monitoring responsibilities?

✓ Correct. Deployer monitoring responsibility: build proactive monitoring infrastructure to detect problems. Waiting for journalists, users, or researchers to identify harms is not a responsible deployment posture.
❌ Incorrect. The Twitter case: deployers are responsible for proactive monitoring infrastructure. Waiting for external parties to identify harms, rather than building systems to detect them, is not responsible deployment.
Lab 6

Production Monitoring Design

Design a comprehensive production AI monitoring and governance framework.

  1. The AI opens with the Twitter case and asks: design a production monitoring architecture that would have detected the image cropping bias — covering performance monitoring, distribution shift detection, harm monitoring, and feedback loop management.
  2. Design the governance structure: model cards, incident response, deprecation criteria, and external audit access.
  3. Address: what organizational incentives work against deployers building adequate monitoring, and how do you overcome them?
Monitoring is as important as the algorithm. If you can't detect failures, you can't fix them.
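As one possible starting point for the distribution shift detection part of step 1, the sketch below computes a Population Stability Index (PSI) for a single numeric input feature. PSI is a swapped-in, commonly used drift statistic, not something the lesson prescribes; the function name, the equal-width binning, and the `eps` floor are illustrative assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference (training-time) sample and a production sample.

    Hypothetical helper for illustration: bins are equal-width over the
    reference range, and production values outside that range are clamped
    into the edge bins.
    """
    if not reference or not production:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot bin it")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = max(0, min(n_bins - 1, idx))  # clamp into the edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    prod = bin_fractions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))
```

A monitoring job could compute this per feature on a schedule and alert when the value crosses a threshold the team has chosen. A frequently quoted rule of thumb treats values above roughly 0.25 as a large shift, but that cutoff is a convention to validate against your own data, not a guarantee.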
Lesson 7

Building for Equity and Access

Equity by design, participatory development, and the politics of AI access.

The MIT Media Lab's Civic Media project documented a pattern in AI civic tech deployment: well-funded organizations built AI tools for underserved communities without involving those communities in design. The tools reflected the designers' assumptions about what those communities needed — which differed significantly from what communities said they needed when asked. Tools designed for efficiency often conflicted with community values around relationship and trust. Tools designed for data collection conflicted with communities' justified distrust of institutions that had used data against them. The pattern had a name: "parachute tech" — solutions dropped in from outside without community anchoring.

Equity by Design vs. Equity by Afterthought

Equity considerations integrated throughout design produce different results than equity reviews at the end:

  • Early stage: Is this the right problem to solve? Are there better-positioned builders? Is this community-initiated or parachuted?
  • Problem definition: Who defines the problem? Who is included in defining success? Whose values shape the optimization target?
  • Data collection: Is data representative? Do communities have sovereignty over their data? Is collection consensual?
  • Testing: Are subgroup performance differences systematically tested? Are affected communities involved in evaluation?
  • Deployment: Is access equitable? Is the tool maintained to remain equitable over time?
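The testing question above (are subgroup performance differences systematically tested?) can be made concrete with a small sketch. It assumes evaluation records arrive as `(subgroup, true_label, predicted_label)` tuples and uses plain accuracy as the metric; both the record shape and the function names are hypothetical, and a real evaluation would choose metrics and subgroups together with the affected community.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, int, int]  # (subgroup, true_label, predicted_label)


def accuracy_by_subgroup(records: Iterable[Record]) -> Dict[Hashable, float]:
    """Return accuracy computed separately for each subgroup."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def largest_accuracy_gap(per_group: Dict[Hashable, float]) -> float:
    """Difference between the best-served and worst-served subgroup."""
    return max(per_group.values()) - min(per_group.values())


# Made-up evaluation records, for illustration only.
records = [
    ("a", 1, 1), ("a", 0, 0), ("a", 1, 0), ("a", 1, 1),
    ("b", 1, 0), ("b", 0, 0), ("b", 1, 0), ("b", 0, 1),
]
per_group = accuracy_by_subgroup(records)
gap = largest_accuracy_gap(per_group)
```

An aggregate accuracy of 0.5 on these made-up records would hide the fact that subgroup "a" is served at 0.75 and subgroup "b" at 0.25. Small subgroups give noisy estimates, so a gap like this is a prompt for investigation rather than a verdict.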
Participatory Design in Practice

Participatory design — involving affected communities in design — is not just ethically required; it produces better tools:

  • Community members surface failure modes designers don't anticipate
  • Community trust is required for adoption — parachute tech fails to spread
  • Community values must be incorporated to design appropriate optimization targets
The Access Principle

Building AI tools for underserved communities without involving them produces tools that serve designers' assumptions rather than communities' needs. Participation is not just an ethical obligation; it is a technical requirement for building tools that actually work.

Quiz 7

Building for Equity and Access

5 questions — free, untracked, retake anytime.

What is 'parachute tech' and why does it fail?

✓ Correct. Parachute tech: solutions built for communities without their involvement. It reflects designer assumptions rather than community needs, fails to earn community trust, and fails to spread. Participation is both ethical and technically necessary.
❌ Parachute tech: technology built for communities without their involvement. It reflects designers' assumptions rather than community needs, fails to earn trust, and fails to spread.

Why is participatory design a technical requirement, not just an ethical obligation?

✓ Correct. Participatory design is technically necessary: communities surface unanticipated failure modes, community trust is required for adoption, and community values must inform optimization targets. Without these, the tool fails.
❌ Participatory design is technically necessary: communities surface failure modes designers don't anticipate, trust enables adoption, and community values are required to design appropriate optimization targets.

What is 'data sovereignty' in the context of equity-centered AI design?

✓ Correct. Data sovereignty: communities' meaningful control over data collected from them, a particularly important concern for communities with justified distrust of institutions that have historically used data against them.
❌ Data sovereignty: communities' meaningful control over data generated by them, including how it's used, who accesses it, and whether it could be used against them. Especially important for communities with justified institutional distrust.

What does 'equity by design' require compared to 'equity by afterthought'?

✓ Correct. Equity by design: integrated throughout every stage (problem definition, data, testing, deployment). Equity by afterthought: a review at the end that can't undo fundamental design decisions made without equity consideration.
❌ Equity by design: integrated throughout every stage. Equity by afterthought: a final review that can't undo fundamental design decisions (problem definition, data collection choices) already made without equity consideration.

Why is the question 'are there better-positioned builders?' important in equity-centered AI development?

✓ Correct. Better-positioned builders: communities may be better served by community-based builders, tools they build themselves, or builders with existing relationships. Outside builders aren't always the right answer, regardless of resources.
❌ Better-positioned builders: communities may be better served by community-based builders or tools they build themselves. Outside builders with resources aren't always the right answer; community anchoring often matters more.
Lab 7

Participatory Design Framework

Design an equity-centered AI development process.

  1. The AI opens with the parachute tech problem and asks: you have been asked to build an AI tool for a community you're not a member of. Design a participatory development process that would avoid parachute tech failure modes — from problem definition through deployment.
  2. Identify the specific stages where community participation changes design decisions and what it changes them to.
  3. Address: participatory design takes more time and is organizationally harder. How do you make the case for it when there's commercial pressure to move faster?
Participation is a technical requirement. Design processes that make it real, not nominal.
Lesson 8

The Builder's Ongoing Responsibilities

Professional ethics, whistleblowing, and the long-term obligations of those who build consequential AI.

In 2024, several current and former employees of major AI companies signed an open letter — "A Right to Warn About Advanced AI" — arguing that AI companies' confidentiality agreements prevented them from raising safety concerns publicly, even when internal escalation had failed. The letter called for the right to report safety concerns to regulators and the public without retaliation. This is a new frontier in professional ethics: what obligations do people who build consequential AI systems have when the organizations they work for resist safety concerns? The question isn't hypothetical — it's current.

Post-Deployment Obligations

The builder's responsibility doesn't end at deployment — it evolves:

  • Ongoing monitoring participation: The people who built a system have knowledge of its design that makes them uniquely positioned to interpret monitoring data
  • Incident response: When failures occur, builders have both capacity and responsibility to contribute to understanding and fixing them
  • Proactive disclosure: Known limitations and failure modes should be disclosed — not suppressed to protect commercial interests
  • Deprecation advocacy: When a system should be retired or substantially changed, builders are uniquely positioned to advocate for that
Internal Escalation, Whistleblowing, and Exit

When organizations resist safety concerns, builders face a classic ethical sequence:

  • Internal escalation: Raise the concern through legitimate internal channels — with documentation
  • External escalation: If internal escalation fails, regulatory and legal channels may be available — and may require disclosure
  • Whistleblowing: Public disclosure as a last resort when other channels have failed and harm is serious — with significant personal and professional risk
  • Exit: Refusing to continue work on a harmful system
The Professional Obligation

Building consequential AI carries ongoing professional responsibility. The people who build these systems have technical knowledge that uniquely positions them to identify when something is wrong — and that positioning creates responsibility that doesn't end at deployment or when employment ends.

Quiz 8

The Builder's Ongoing Responsibilities

5 questions — free, untracked, retake anytime.

What did the 'A Right to Warn' open letter address?

✓ Correct. The letter addressed a genuine ethical conflict: confidentiality agreements that may prevent employees from reporting safety concerns externally, even when internal escalation has failed and harms are serious.
❌ The letter: confidentiality agreements prevent employees from reporting safety concerns to regulators or the public even when internal escalation fails. It argued for protection for those who escalate externally.

Why do builders have unique post-deployment monitoring obligations?

✓ Correct. Builders' design knowledge uniquely positions them to interpret monitoring data: they know what the system was designed to do and can identify when production behavior diverges from design intent.
❌ Builders' design knowledge uniquely positions them to interpret monitoring data. They know what was intended and can identify divergence from design intent that others might miss.

What distinguishes 'exit' from 'whistleblowing' as responses to organizational safety resistance?

✓ Correct. Exit: refuses further participation in the harm. Whistleblowing: publicly discloses the harm. Exit is personally safer; whistleblowing may be necessary when the harm continues regardless of individual exit.
❌ Exit: refuses further participation. Whistleblowing: publicly discloses harm. Exit is personally safer but only removes individual participation; whistleblowing may be necessary when the harm continues without public disclosure.

What is 'proactive disclosure' as an ongoing builder responsibility?

✓ Correct. Proactive disclosure: disclose known limitations and failure modes rather than suppressing them for commercial reasons. Users, deployers, and affected parties can't make informed decisions without this information.
❌ Proactive disclosure: disclose known limitations and failure modes rather than suppressing them for commercial reasons. Informed decision-making by users, deployers, and affected parties requires this.

Why does the professional responsibility of AI builders extend beyond employment?

✓ Correct. Technical knowledge creates ongoing responsibility: builders carry design knowledge relevant to safety assessments after employment ends. Knowledge that uniquely positions you to identify harm creates responsibility that doesn't expire with employment.
❌ Technical knowledge creates ongoing responsibility: builders carry design knowledge that remains relevant to safety even after employment. The positioning to identify harm creates responsibility that doesn't expire.
Lab 8

Synthesis: The Builder's Obligations

Synthesize the curriculum and develop your builder's ethics framework.

Synthesize the curriculum and develop your personal framework for building responsibly.

  1. The AI opens with the 'A Right to Warn' case and asks: you are building a consequential AI system and believe your organization is inadequately addressing a safety concern you've raised internally. Walk through your personal ethical decision process: internal escalation, external escalation, whistleblowing, exit.
  2. Drawing on the full curriculum — from problem definition through deployment to ongoing responsibilities — what are the three most important principles you take away for building AI responsibly?
  3. Address: this is the final lesson of the AESOP AI Academy curriculum. What will you do differently because of what you've learned?
This is the synthesis. The curriculum has been preparation for participation. What are you prepared to do?

Module 4 Test

8 questions covering all lessons. Free, untracked, retake anytime.

Proxy-goal divergence in AI specification means:

✓ Correct. Proxy-goal divergence: the operationalized metric diverges from the terminal goal. Amazon: 'similar to past hires' ≠ 'will succeed'. The AI worked perfectly on the specified proxy while failing the actual goal.
❌ Proxy-goal divergence: the operationalized metric diverges from the terminal goal under unanticipated conditions. Amazon's AI worked perfectly on the specified problem while failing the actual one.

Contradictions in system prompts create:

✓ Correct. Contradictions create exploitable ambiguity: adversarial users can invoke contradicting instructions to elicit behavior that violates intended design.
❌ Contradictions create exploitable ambiguity: adversarial users invoke the contradicting instruction to elicit unintended behavior that violates the intended design.

The GitHub Copilot security vulnerability finding shows:

✓ Correct. Evaluation gap: 'does it work?' doesn't test 'can it be exploited?'. Standard evaluation missed security vulnerabilities that adversarial testing would have found.
❌ The finding: standard evaluation (does it work?) doesn't test adversarial correctness. Code that passes standard evaluation can contain security vulnerabilities.

The automation paradox means that for high-stakes AI-assisted domains:

✓ Correct. Automation paradox: reliable automation atrophies human skill. Organizations must actively maintain human skill through deliberate practice, because the failure cases are the ones humans are least prepared for.
❌ Automation paradox: reliable automation atrophies human skill in the automated domain. Deliberate practice must maintain the skills needed when automation fails.

Behavior elicitation research demonstrated that:

✓ Correct. Standard safety evaluation and adversarial elicitation measure different things. Good standard performance doesn't predict adversarial elicitation behavior; both are necessary.
❌ Standard and adversarial evaluation measure different things. Standard safety performance doesn't predict adversarial elicitation behavior; comprehensive assessment requires both.

The Twitter image cropping case illustrates:

✓ Correct. Two failures: a biased algorithm and absent monitoring. The monitoring failure made the algorithm failure invisible. Both are deployer responsibilities.
❌ Two failures: a biased algorithm and absent monitoring infrastructure. The monitoring failure made the algorithm failure invisible for years. Both are core deployer responsibilities.

Participatory design is a technical requirement (not just an ethical obligation) because:

✓ Correct. Technically necessary: communities surface failure modes designers miss, trust enables adoption, and community values are required for appropriate optimization targets. Without participation, tools fail.
❌ Participatory design is technically necessary: communities surface failure modes designers miss, trust enables adoption, and community values are required to design optimization targets that actually work.

The professional responsibility of AI builders:

✓ Correct. Ongoing responsibility: technical knowledge creates ongoing positioning to identify harm. That positioning creates responsibility that doesn't expire with employment.
❌ Professional responsibility extends beyond employment: technical knowledge about system design creates an ongoing ability to identify harm, and that positioning creates responsibility that doesn't expire.