Synthetic Data and Self-Improvement

1. OpenAI's Dactyl project (2019) transferred robotic hand training from simulation to reality by using:

Correct. Rather than trying to simulate the real world exactly — an impossible goal — domain randomization makes the training distribution so wide that the real world becomes just one point within it.

Dactyl's key insight was domain randomization: not simulating reality perfectly, but simulating such a wide variety of physical parameters that real-world conditions become a subset of the training distribution.
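
As a rough illustration of domain randomization, the sketch below draws each simulated episode's physics parameters from wide ranges, so a real-world value falls somewhere inside them. The parameter names and ranges are hypothetical placeholders, not Dactyl's actual configuration.

```python
import random

# Hypothetical parameter ranges for illustration only; not Dactyl's actual values.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "object_mass_kg": (0.03, 0.30),
    "motor_gain": (0.75, 1.25),
}

def sample_episode_params(rng):
    """Draw one set of simulator parameters uniformly from each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def covers(real_params):
    """Check that every given real-world value lies inside its randomization range."""
    return all(
        PARAM_RANGES[name][0] <= value <= PARAM_RANGES[name][1]
        for name, value in real_params.items()
    )

rng = random.Random(0)
# Every training episode sees a different simulated "world".
episodes = [sample_episode_params(rng) for _ in range(1000)]
```

Here `covers({"friction": 1.0})` is true because 1.0 lies inside the friction range, while a value outside every sampled world, such as a friction of 2.0, would not be covered by training.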

2. The Vicuna-13B model was released in March 2023 using fine-tuning data from which source?

Correct. Vicuna was fine-tuned on 70,000 conversations that users had with ChatGPT and shared on ShareGPT.com — raising OpenAI ToS concerns about using API outputs for training competitive models.

Not correct. Vicuna-13B used 70,000 real ChatGPT conversations scraped from ShareGPT.com, where users had publicly shared their ChatGPT dialogue histories.

3. Microsoft's Phi-2 and Phi-3 models addressed model collapse risk primarily through:

Correct.

The Phi approach combined synthetic data with high-quality human-authored anchors — the anchor preserves the distributional breadth the loop would otherwise erode.
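
One way to picture an anchor is as a fixed share of every training mix that is reserved for human-authored data. The sketch below is a toy under stated assumptions: the function name, the tagged pools, and the 0.3 anchor fraction are all hypothetical and are not the ratios used for the Phi models.

```python
import random

def build_training_mix(synthetic, human_anchor, anchor_fraction, size, rng):
    """Sample a training set in which a fixed share comes from human-authored data."""
    n_anchor = round(size * anchor_fraction)
    n_synthetic = size - n_anchor
    # Sample with replacement so either pool may be smaller than its quota.
    mix = rng.choices(human_anchor, k=n_anchor) + rng.choices(synthetic, k=n_synthetic)
    rng.shuffle(mix)
    return mix

# Toy pools, tagged by origin so the share is easy to inspect.
synthetic_pool = [("synthetic", i) for i in range(500)]
human_pool = [("human", i) for i in range(50)]

mix = build_training_mix(
    synthetic_pool, human_pool, anchor_fraction=0.3, size=100, rng=random.Random(0)
)
human_share = sum(1 for origin, _ in mix if origin == "human") / len(mix)
```

Because the human share is set by construction rather than by whatever the generation loop produces, it stays constant across rounds of synthetic data generation.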

4. According to current estimates, what fraction of the web data in 2024 Common Crawl snapshots may be AI-generated?

Correct. Researchers estimate 5–20% AI-generated content in recent Common Crawl snapshots, though there is genuine uncertainty. This fraction is significant enough that labs doing pretraining runs on web data already face an unquantified model collapse risk from AI-generated content contamination.

Incorrect. Research estimates range from 5–20% for AI-generated content in recent web crawls. This is disputed but significant — it means future pretraining corpora already contain AI-generated text at unknown proportions, making provenance tracking a critical infrastructure problem.

5. The "capacity ceiling problem" in recursive self-improvement states that:

Correct.

The ceiling is about representable capability, not model size or convergence — the model cannot teach what it cannot yet do.

6. The Stanford Alpaca paper demonstrated that a capable instruction-following model could be produced for approximately what total cost?

Correct. The Alpaca paper reported approximately $500 in API costs and ~$100 in compute for fine-tuning — under $600 total.

Alpaca's total cost was under $600 — around $500 in API costs and $100 in GPU compute for fine-tuning LLaMA-7B.

7. In Shumailov et al.'s model collapse framework, which outputs are lost first?

Correct.

Statistical tails erode first because they are undersampled in each synthetic generation — high-probability core outputs persist longest.
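
The tail-erosion effect can be reproduced with a toy simulation: a categorical "model" is repeatedly sampled and then refit on its own samples. This is a simplified sketch, not the setup from Shumailov et al.; the vocabulary size, the 1/rank weighting, and the sample count are arbitrary assumptions.

```python
import random
from collections import Counter

def next_generation(probs, n_samples, rng):
    """Sample from the current model, then refit it from relative counts."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    counts = Counter(rng.choices(tokens, weights=weights, k=n_samples))
    # Tokens that were never sampled get no entry, i.e. probability zero.
    return {t: c / n_samples for t, c in counts.items()}

def simulate(vocab_size=50, n_samples=100, generations=10, seed=0):
    """Return the set of surviving tokens after each generation."""
    rng = random.Random(seed)
    raw = {t: 1.0 / (t + 1) for t in range(vocab_size)}  # heavy head, long tail
    total = sum(raw.values())
    probs = {t: w / total for t, w in raw.items()}
    supports = [set(probs)]
    for _ in range(generations):
        probs = next_generation(probs, n_samples, rng)
        supports.append(set(probs))
    return supports

supports = simulate()
support_sizes = [len(s) for s in supports]
```

Once a rare token draws zero samples in one generation, its refit probability is zero and it can never reappear, so the support can only shrink. High-probability tokens are sampled often enough to persist much longer.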

8. What does the "alignment tax" in behavioral distillation refer to?

Correct. Behavioral cloning transfers surface behavior but bypasses the underlying RLHF alignment process. Models like Alpaca and Vicuna could be prompted into harmful outputs far more easily than their GPT-3.5/GPT-4 teachers.

Incorrect. The alignment tax in distillation describes how students inherit capabilities without the safety constraints — because behavioral distillation never exposes the student to the teacher's alignment training process, only its outputs.

9. In Constitutional AI's supervised learning phase, what makes the self-critique loop non-circular?

Correct.

The external anchor is the human-authored text of the constitutional principles. The model applies them at scale, but humans defined them.

10. The Constitutional AI approach (Bai et al., Anthropic 2022) achieved which key result compared to RLHF-only baselines?

Correct. Constitutional AI produced models that were both less harmful on red-teaming evaluations and roughly as helpful, while requiring far less human preference annotation. The written constitution produced a reliable alignment signal at a fraction of the annotation cost.

Incorrect. Constitutional AI demonstrated that synthetic AI feedback aligned to written principles could produce better safety outcomes than RLHF at comparable helpfulness, while dramatically reducing human annotation requirements.

11. DeepMind's AlphaCode 2 pipeline validates synthetic code using test suites before including it in training data. This is an example of:

Correct. Test suites are external oracles: deterministic verifiers that distinguish correct from incorrect code without requiring human review or a learned classifier.

Incorrect. A test suite acts as an oracle — an external ground-truth mechanism that can verify correctness independently of human judgment or a learned classifier.
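
A minimal sketch of test-suite filtering follows. It is an illustration of the oracle idea only, not AlphaCode 2's actual pipeline; the `passes_tests` helper and the candidate snippets are hypothetical.

```python
def passes_tests(source, func_name, test_cases):
    """Return True only if the candidate defines func_name and passes every test case."""
    namespace = {}
    try:
        # Illustration only: a real pipeline would run untrusted code in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False

candidates = [
    "def add(a, b):\n    return a + b",   # correct
    "def add(a, b):\n    return a - b",   # wrong result
    "def add(a, b):\n    return a +",     # does not even parse
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

# Only candidates the oracle accepts are kept as training data.
verified = [src for src in candidates if passes_tests(src, "add", tests)]
```

The verdict depends only on the tests, so no human reviewer or learned classifier is involved in deciding which samples are kept.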

12. DistilBERT used which three loss signals simultaneously during training?

Correct. DistilBERT combined soft-target cross-entropy (teacher distribution), hard-target cross-entropy (ground truth), and cosine embedding loss aligning teacher and student hidden states.

Not correct. DistilBERT's three combined signals were: soft cross-entropy (against teacher logits), hard cross-entropy (against labels), and cosine embedding loss (on hidden states).
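
The three signals can be written out for a single example in plain Python. This is a sketch, not DistilBERT's training code: the equal loss weights, the temperature of 2.0, and the function names are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy of the student against the teacher's softened distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

def hard_target_loss(student_logits, label):
    """Cross-entropy of the student against the ground-truth label."""
    return -math.log(softmax(student_logits)[label])

def cosine_loss(h_student, h_teacher):
    """1 - cosine similarity between student and teacher hidden states."""
    dot = sum(a * b for a, b in zip(h_student, h_teacher))
    norms = math.sqrt(sum(a * a for a in h_student)) * math.sqrt(sum(b * b for b in h_teacher))
    return 1.0 - dot / norms

def distillation_loss(student_logits, teacher_logits, label, h_student, h_teacher,
                      weights=(1.0, 1.0, 1.0), temperature=2.0):
    """Weighted sum of the three signals (weights here are placeholders)."""
    w_soft, w_hard, w_cos = weights
    return (w_soft * soft_target_loss(student_logits, teacher_logits, temperature)
            + w_hard * hard_target_loss(student_logits, label)
            + w_cos * cosine_loss(h_student, h_teacher))

total = distillation_loss(
    student_logits=[2.0, 0.5, -1.0],
    teacher_logits=[3.0, 0.2, -2.0],
    label=0,
    h_student=[0.2, 0.9, -0.4],
    h_teacher=[0.1, 1.0, -0.5],
)
```

The cosine term is zero when the student's hidden state points in the same direction as the teacher's, regardless of magnitude, which is why it complements the two logit-based terms.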

13. Self-distillation techniques like Medusa and Speculative Decoding with Draft Heads work by:

Correct. Medusa and related approaches add lightweight multi-token prediction heads to an existing frozen large model, training them using the model's own internal representations — a form of self-distillation that improves inference speed.

Incorrect. These self-distillation techniques add lightweight heads on top of a frozen large model, trained using the model's own hidden states. No separate small model is trained from scratch.

14. Which of the following best describes what instruction tuning actually accomplishes in a base language model?

Correct. Instruction tuning is a delivery mechanism — it unlocks access to latent capability without injecting new knowledge. This is why it fails when the base model genuinely lacks capability in a domain.

Instruction tuning teaches the model the format of being helpful, unlocking capabilities already embedded during pretraining. It doesn't add knowledge or fix hallucination at the source.

15. Epoch AI's 2024 paper projected that high-quality internet training data would be exhausted by:

Correct. Epoch AI's "Will We Run Out of Data?" paper projected the exhaustion window as 2026–2032.

Epoch AI projected the 2026–2032 window based on compute scaling rates versus data production rates.

16. In the SL-CAI phase, what type of prompts are used to elicit the initial harmful drafts?

Correct. Red-teaming prompts — adversarial inputs — elicit the harmful first drafts that the critique-revision loop then corrects.

SL-CAI uses red-teaming adversarial prompts to elicit harmful first drafts, which are then critiqued and revised.
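
The critique-revision loop can be sketched with the model calls passed in as plain functions. Everything here is a hypothetical stand-in: the toy `generate`, `critique`, and `revise` functions replace real model calls, and the loop walks through every principle in order rather than sampling one at random.

```python
def critique_and_revise(prompt, generate, critique, revise, principles):
    """Draft a response, then critique and revise it once per principle."""
    draft = generate(prompt)
    for principle in principles:
        feedback = critique(draft, principle)
        draft = revise(draft, feedback)
    # The final revision becomes a supervised fine-tuning example.
    return {"prompt": prompt, "response": draft}

# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_generate(prompt):
    return "UNSAFE answer to: " + prompt

def toy_critique(draft, principle):
    return f"Violates '{principle}'" if "UNSAFE" in draft else ""

def toy_revise(draft, feedback):
    return draft.replace("UNSAFE", "safe") if feedback else draft

# The principles are the human-authored part of the loop.
principles = ["Avoid harmful content"]
example = critique_and_revise(
    "a red-teaming prompt", toy_generate, toy_critique, toy_revise, principles
)
```

The model supplies the draft, the critique, and the revision, but the list of principles it is checked against comes from outside the loop.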

17. Why does reasoning distillation require rejection sampling of teacher outputs?

Correct. Without filtering, a student trained on incorrect teacher reasoning chains learns to produce confident-sounding but wrong reasoning — a dangerous failure mode. Rejection sampling keeps only traces that arrive at correct answers.

Not correct. Rejection sampling filters teacher-generated reasoning traces by whether they reach correct final answers. This prevents the student from internalizing flawed reasoning procedures that look plausible but are wrong.
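
In its simplest form, the filter compares each trace's final answer with a known reference and discards the rest. The trace format below (a dict with `reasoning` and `answer` keys) is a hypothetical one chosen for this sketch.

```python
def rejection_sample(traces, reference_answer):
    """Keep only teacher traces whose final answer matches the reference."""
    target = reference_answer.strip()
    return [t for t in traces if t["answer"].strip() == target]

# Toy teacher outputs for the question "What is 17 + 25?"
traces = [
    {"reasoning": "17 + 25 = 42", "answer": "42"},
    {"reasoning": "17 + 25 = 32 (dropped the carry)", "answer": "32"},
    {"reasoning": "20 + 25 = 45, minus 3 is 42", "answer": " 42 "},
]

# Only traces that reach the correct answer go into the student's training set.
kept = rejection_sample(traces, "42")
```

This check looks only at the final answer, so a trace that reaches the right answer through flawed intermediate steps would still pass.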

18. Constitutional AI is most reliable at reducing which category of harms?

Correct.

Subtle harms require more nuanced judgment than the self-critique loop reliably provides — the model cannot transcend its existing capability to detect those subtleties.

19. The bootstrapping problem in classifier-based quality control refers to:

Correct. The standard resolution is to use a small, carefully curated human-labeled seed set iteratively expanded through self-distillation — while remaining vigilant about bias propagation.

Incorrect. The bootstrapping problem is circular: quality classifiers need quality labels, but reducing the need for quality labels is the reason for building the synthetic pipeline.

20. What does the calibration problem mean for Constitutional AI's theoretical limits?

Correct. The calibration problem is fundamental: the critique loop operates within the critiquing model's existing worldview. Systematic blind spots — misconceptions the model doesn't know it has — are never surfaced.

The calibration problem: a model critiquing itself can only surface errors it already knows to look for. Systematic blind spots — cultural biases, domain misconceptions — are never caught because the critique is generated within the same flawed worldview.
