In August 2017, a video went viral without any AI company's permission. A Black man named Chukwuemeka Afọlabi stood at a motion-activated soap dispenser in a hotel bathroom. He waved his hand. Nothing. He waved again. Nothing. Then a white colleague walked up, held their hand in the same spot, and the dispenser immediately released soap.
The video spread because the dispenser was not broken. It worked exactly as designed. The problem was what it had been built to detect: reflected infrared light, calibrated on hands with lighter skin tones. Darker skin absorbs more infrared. The sensor never "learned" that darker hands existed. So to the machine, they were invisible.
The company had not programmed racism into its sensor. But racism (or at least, a world that had mostly tested this technology on one group of people) had found its way in anyway. Through the data.
Every AI system learns from examples. You show it thousands, sometimes billions, of pictures, texts, measurements, or decisions, and it finds patterns. The collection of examples you use is called training data. It is the entire world the AI has ever seen.
Here is the problem: the real world is not a neutral, perfectly balanced place. Some groups have been photographed more. Some voices have been recorded more. Some languages appear far more often on the internet. When you scoop up a giant pile of "the world" and hand it to an AI, you're actually handing it a very specific slice of the world: one shaped by who had power, who had cameras, who had access to technology first.
The AI doesn't know this. It just sees what it sees. And it assumes that what it sees is everything.
In January 2020, a man named Robert Williams was arrested in Detroit, Michigan. Police officers came to his home, handcuffed him in front of his wife and daughters, and drove him to a detention center. He spent 30 hours in jail before investigators realized their lead had come entirely from a face recognition algorithm, and that the algorithm was wrong.
The system had matched Williams's driver's license photo to a blurry surveillance image from a watch store robbery. Williams had never been to the store. The AI had made a false identification. But investigators had trusted it enough to make an arrest.
Research published two years earlier shows why this kind of error was predictable. A 2018 study by Joy Buolamwini of MIT and Timnit Gebru, published under the name Gender Shades, tested commercial face analysis systems from Microsoft, IBM, and Face++ on more than 1,200 faces. The results were stark: error rates reached roughly 34% for darker-skinned women, compared with under 1% for lighter-skinned men. The training datasets had simply contained far more lighter-skinned faces. The machines had learned a world that didn't reflect everyone equally, and they performed exactly as well as that skewed world had equipped them to.
Robert Williams was eventually cleared, but the experience (the arrest, the night in jail, the terrifying scene in front of his daughters) was not something he got back.
Police departments in at least 24 U.S. cities were using face recognition AI as of 2020. Most of those departments had no formal policy about how to verify an AI match before making an arrest. The technology moved faster than the rules designed to govern it.
Programmers have a saying that predates AI by decades: garbage in, garbage out. It means a system can only be as good as the information you feed it. But AI makes this problem worse in a specific way: AI systems take small patterns in data and amplify them into confident predictions applied to millions of people.
Imagine a hiring AI trained on 10 years of résumés from a tech company where 85% of people hired were men. The AI doesn't know that this happened because of historical bias in who was recruited or who felt welcome applying. It just sees the pattern: men got hired. So it learns to rank male applicants higher. Amazon actually built and then scrapped exactly this kind of system, as Reuters reported in 2018, after discovering it was systematically downgrading résumés that included the word "women's" (as in "women's chess club") because the training data had taught it that women rarely got hired.
The bias wasn't in the algorithm itself. It was in the reflection of a biased world that the data captured. But the algorithm took that reflection and turned it into policy, at scale, applied to thousands of applicants.
When you read a headline like "AI hiring tool gives unfair results," most readers assume someone programmed discrimination in on purpose. You now know the more common truth: no one had to. The data carried the discrimination in silently, and the AI learned it without anyone noticing, until the harm was already done.
Here is the ethical question sitting at the center of all of this, and it doesn't have a clean answer:
If an AI reflects the world accurately, including the world's real inequalities, is that fair or unfair?
Think about it. A crime prediction AI might correctly learn that certain zip codes have historically had more arrests. But those arrests were partly the result of police being deployed to those neighborhoods more. So the AI is learning "where police found crime" not "where crime actually happens." The data is accurate. But accurate data from a biased system produces biased conclusions.
Some researchers argue you should "debias" training data: deliberately add more examples of underrepresented groups to balance things out. Others argue that artificially adjusting data introduces a different kind of distortion. Still others say the real problem isn't the data at all: it's using AI in high-stakes decisions like hiring and policing when we know these errors occur.
Who decides? How do you weigh accuracy against fairness? These are questions that companies, governments, courts, and researchers are actively arguing about right now, without agreement.
A city government wants to deploy an AI system to help decide which neighborhoods get priority for road repairs. The system was trained on 20 years of repair request data and resident complaint logs. The city says it's "objective" because it uses data instead of human judgment. You've been asked to audit it before launch.
Your lab partner has reviewed the technical specs. They're pushing back on your concerns. Make your case, and defend it against their challenges.
On May 14, 2013, a judge in La Crosse County, Wisconsin sentenced a man named Eric Loomis to six years in prison. The judge cited, in part, a score generated by a computer program called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). COMPAS had rated Loomis as high risk for reoffending.
Loomis appealed. He argued that he had a right to understand and challenge the score. The company that made COMPAS, Northpointe, refused to reveal how the algorithm worked, calling it a proprietary trade secret. The Wisconsin Supreme Court upheld the sentence in 2016, ruling that the score was "one factor among many." His case eventually reached the U.S. Supreme Court, which declined to hear it.
Meanwhile, journalists at ProPublica had been running their own investigation. In May 2016, they published a landmark analysis of COMPAS scores for over 7,000 people in Broward County, Florida. Their finding: Black defendants were nearly twice as likely as white defendants to be falsely flagged as high risk, meaning they were rated dangerous and then did not reoffend. White defendants were more likely to be falsely rated low risk, and then did go on to reoffend. The algorithm's errors were not random. They were systematically distributed along racial lines.
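The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ProPublica's actual method or data: the function name `error_rates_by_group`, the helper `make_records`, the groups "A" and "B", and every count below are invented for the example.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, flagged_high_risk, reoffended) tuples."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged_high_risk, reoffended in records:
        if flagged_high_risk:
            outcome = "tp" if reoffended else "fp"   # flagged: right or wrong?
        else:
            outcome = "fn" if reoffended else "tn"   # not flagged: right or wrong?
        counts[group][outcome] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            # Share of people who did NOT reoffend but were flagged high risk.
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            # Share of people who DID reoffend but were rated low risk.
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates

def make_records(group, fp, tn, fn, tp):
    """Build invented toy records with the given outcome counts."""
    return ([(group, True, False)] * fp + [(group, False, False)] * tn
            + [(group, False, True)] * fn + [(group, True, True)] * tp)

# Invented numbers, chosen only so the two groups have different error patterns.
records = (make_records("A", fp=4, tn=6, fn=3, tp=7)
           + make_records("B", fp=2, tn=8, fn=5, tp=5))
rates = error_rates_by_group(records)

for group in sorted(rates):
    print(group, rates[group])
```

In this toy data both groups get the same share of correct scores overall (13 of 20), yet group A is wrongly flagged twice as often as group B. That is why an audit has to look at each kind of error separately, group by group, rather than at a single accuracy number.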
COMPAS didn't say "this person might reoffend." It produced a numerical score, typically 1 to 10, that felt precise and authoritative. Judges, who are busy humans under enormous caseloads, received a number that looked like certainty.
This is one of the most dangerous properties of modern AI systems: they output confident-sounding results even when their actual accuracy is limited. A large language model (the kind that powers chatbots) doesn't say "I'm not sure." It generates the next most probable word, with the same smooth fluency regardless of whether the content is accurate or fabricated. A face recognition system doesn't flag its output as "uncertain." It returns a match with a percentage score that looks like proof.
The term for this specific failure mode is overconfidence; in AI research, it's sometimes called miscalibration. A well-calibrated system says "I'm 80% confident" on things it gets right 80% of the time. A miscalibrated system says "I'm 95% confident" on things it only gets right 60% of the time. The system's stated certainty doesn't track with its actual accuracy.
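A calibration check like the one just described can be sketched as follows. It groups predictions by the confidence the system stated and compares that with how often those predictions were actually right. The function name `calibration_table` and both toy datasets are assumptions made up for this illustration; they reuse the 80% and 95%-versus-60% figures from the paragraph above.

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: iterable of (stated_confidence, was_correct) pairs.

    Returns {stated_confidence: observed_accuracy} for each confidence level.
    """
    outcomes_by_confidence = defaultdict(list)
    for stated_confidence, was_correct in predictions:
        outcomes_by_confidence[stated_confidence].append(was_correct)
    return {
        confidence: sum(outcomes) / len(outcomes)
        for confidence, outcomes in sorted(outcomes_by_confidence.items())
    }

# Toy system 1: says "80% confident" ten times and is right eight times.
well_calibrated = calibration_table([(0.80, True)] * 8 + [(0.80, False)] * 2)

# Toy system 2: says "95% confident" twenty times and is right twelve times.
miscalibrated = calibration_table([(0.95, True)] * 12 + [(0.95, False)] * 8)

for name, table in [("well calibrated", well_calibrated),
                    ("miscalibrated", miscalibrated)]:
    for stated, observed in table.items():
        gap = stated - observed
        print(f"{name}: stated {stated:.2f}, observed {observed:.2f}, gap {gap:+.2f}")
```

The gap between stated and observed accuracy is the thing to watch. A score can look precise and still sit far from how often the system is actually right.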
In 2023, a New York lawyer named Steven Schwartz submitted a legal filing to a federal court that cited six specific court cases as precedents: Varghese v. China Southern Airlines, Shaboon v. EgyptAir, and four others. The opposing counsel checked the citations. None of them existed. Schwartz had used ChatGPT to help write the brief, and ChatGPT had fabricated six entirely fictional court decisions, complete with convincing case names, dates, and judicial rulings.
When Schwartz asked whether the cases were real, ChatGPT confirmed they were. When asked to double-check, it confirmed again and claimed the cases could be found in major legal research databases. The model had no way to know it was wrong. It was generating plausible-sounding legal language based on patterns in its training data, with complete fluency and zero uncertainty. Schwartz was fined and publicly reprimanded. He later said he had not realized the AI could "just make things up."
What makes hallucination especially treacherous is that the false outputs are indistinguishable in style from accurate ones. A made-up court case looks exactly like a real court case citation. A fabricated scientific study looks exactly like a real one. The machine's confidence is constant; only the accuracy varies.
Schwartz was one lawyer using one AI system once. As of 2024, AI writing tools are used by millions of students, journalists, doctors, and researchers. The same hallucination problem exists in all of them. The question of who is responsible for verifying AI-generated content (the user, the company, or no one) has not been answered.
Both the COMPAS story and the hallucinating lawyer share a structural problem: no one was clearly accountable when the AI was wrong.
When COMPAS produced a racially skewed score, Northpointe said the algorithm was proprietary. The judge said it was just one factor. The appeals court said the process was legal. Eric Loomis sat in prison. When ChatGPT fabricated court cases, OpenAI said users are responsible for verifying outputs. Steven Schwartz said he didn't know. The judge fined Schwartz, not OpenAI.
This is what researchers call the accountability gap: a space where AI-caused harm happens but where responsibility is endlessly passed from the AI company to the human user to the institution that deployed it, until no one is left holding it. It is not unique to AI: corporations have exploited similar gaps for decades. But AI makes the gap wider because the errors can happen at machine speed, across millions of decisions, before anyone notices.
Here is the institutional reality that matters at your age: the European Union adopted the EU AI Act in 2024, which phases in transparency and accuracy requirements for AI systems used in high-stakes decisions such as hiring, lending, criminal justice, and immigration. The United States, as of that same year, had no equivalent federal law. Different rules apply in different places, right now, to the same AI systems.
If an AI makes a decision that harms someone, and that decision was made by a system no one fully understands, trained by a company that keeps its methods secret, deployed by an institution that trusted the score, who is responsible? The honest answer right now is: we don't have agreement. That is a political and legal question as much as a technical one. And the answer being worked out today will shape how AI is used for decades.
A hospital wants to deploy an AI system that predicts which patients admitted to the emergency room are at high risk of sepsis (a dangerous infection that kills fast if untreated). The AI flags patients for priority treatment. Its overall accuracy is 89%. The company says that's better than the average doctor's unaided judgment.
Your lab partner thinks 89% is impressive and recommends approval. You're not so sure. Take a position and defend it.
In 2011, the Santa Cruz Police Department became one of the first police agencies in the United States to use predictive policing software: an AI called PredPol (later renamed Geolitica). The system analyzed historical crime data and produced maps of small geographic zones, sometimes just 500 feet by 500 feet, where it predicted crime was most likely to occur on a given day. Officers were directed to patrol those zones.
The logic seemed simple: if crime happened somewhere before, it might happen there again. Send officers, catch crime early. But researchers who studied PredPol deployments found something troubling happening over time. The zones flagged as high-risk were concentrated in neighborhoods that were predominantly Black and Latino. Officers sent there made more stops and arrests. Those arrests fed back into the historical crime data. The data then showed even more crime in those zones. The algorithm flagged them more aggressively. More officers were sent. More arrests were made.
The system had created a feedback loop. It wasn't measuring where crime was highest. It was measuring where police had gone, and then using that to decide where police should go next. The algorithm was chasing its own tail, and the tail was made of someone else's neighborhood.
In 2020, Santa Cruz became the first city in the United States to ban predictive policing, citing these exact concerns. The same year, Santa Cruz banned facial recognition by police as well.
A feedback loop happens when the output of a system becomes input to the same system. In music, it's that screech when a microphone gets too close to a speaker: sound goes out, gets picked up again, amplified again, louder and louder. In AI, feedback loops happen when an AI's predictions change real-world behavior, that behavior generates new data, and that data gets used to train or update the AI.
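The patrol story above can be turned into a toy simulation. This is a deliberately simplified sketch and not how PredPol or any real product works: the function `simulate_patrol_loop`, the two zones, the starting counts, and the 80% patrol share are all assumptions chosen for illustration. The key setup is that both zones have the same true crime rate, and the only difference is a slightly larger historical record for zone A.

```python
def simulate_patrol_loop(recorded, true_rate, patrols_per_round=100,
                         hot_share=0.8, rounds=10):
    """Send most patrols to whichever zone has the most recorded crime.

    Recorded crime grows with patrols sent, so the record reflects where
    police looked, not only where crime happened.
    Returns each zone's share of all recorded crime after every round.
    """
    recorded = dict(recorded)  # copy so the caller's data is not modified
    history = []
    for _ in range(rounds):
        hot_zone = max(recorded, key=recorded.get)       # the "prediction"
        other_zones = [zone for zone in recorded if zone != hot_zone]
        for zone in recorded:
            if zone == hot_zone:
                patrols = patrols_per_round * hot_share
            else:
                patrols = patrols_per_round * (1 - hot_share) / len(other_zones)
            # More patrols means more incidents observed and written down.
            recorded[zone] += patrols * true_rate[zone]
        total = sum(recorded.values())
        history.append({zone: recorded[zone] / total for zone in recorded})
    return history

# Identical true crime rates; zone A merely starts with one extra record.
history = simulate_patrol_loop(
    recorded={"A": 11, "B": 10},
    true_rate={"A": 0.1, "B": 0.1},
)

for round_number, shares in enumerate(history, start=1):
    print(round_number, {zone: round(share, 3) for zone, share in shares.items()})
```

Zone A starts with just over half of the recorded crime and its share climbs every round, even though nothing about the underlying crime rate differs between the zones. The data grows more lopsided because of where the system sent officers.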
The predictive policing example is a clean illustration. But feedback loops appear everywhere AI is deployed at scale:
Content recommendation: YouTube's recommendation algorithm suggests videos you might like. The videos you click get recorded. The algorithm learns to suggest more like them. Over time, it pushes toward more extreme content because extreme content generates more clicks and watch time. The algorithm optimizes for engagement, not accuracy or wellbeing. Researchers at the Centre for Countering Digital Hate documented in 2020 that YouTube's algorithm served conspiracy videos to new users within 70 minutes of them creating a fresh account.
Credit scoring: An AI decides a neighborhood is high credit risk. People in that neighborhood get worse loan terms or are denied loans. Without credit access, some fall into debt or default. The AI sees more defaults from that neighborhood. It decides the neighborhood is even higher risk. The loop tightens.
In both cases, the AI is technically accurate within its own frame. It really is finding patterns. The problem is that it is finding patterns partly created by its own past predictions.
There is a newer version of this problem that researchers began documenting in 2023. As AI-generated text, images, and code flood the internet, future AI systems will inevitably be trained on data that includes large amounts of AI-generated content. Researchers at the University of Oxford and other institutions published studies showing that when AI systems are trained on their own outputs, or on data heavily contaminated with AI outputs, their capabilities degrade systematically. The technical term is model collapse.
Think of it this way: every time you photocopy a photocopy, you lose a little clarity. Do it enough times and the image becomes unrecognizable. When an AI is trained on text written by an AI that was trained on text written by an AI, each generation amplifies errors and narrows variety. The model begins to lose the richness of original human expression and instead converges on a smaller and smaller set of patterns: the most average, most common, most synthetic version of language.
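The photocopy effect can be imitated with a very small simulation. This is a caricature, not the method used in the published studies: the "model" here is just a word-frequency table, and `simulate_collapse`, the vocabulary size, and the sample size are invented for the sketch. Each generation "trains" only on a finite sample produced by the previous generation, so any rare word that happens not to be sampled can never come back.

```python
import random
from collections import Counter

def simulate_collapse(vocab_size=50, sample_size=100, generations=30, seed=0):
    """Track how many distinct words survive when each generation learns
    only from text sampled from the previous generation."""
    rng = random.Random(seed)
    words = list(range(vocab_size))
    # Generation 0 stands in for human text: a few common words, many rare ones.
    weights = [1 / (rank + 1) for rank in range(vocab_size)]

    distinct_per_generation = [vocab_size]
    for _ in range(generations):
        # The current model "writes" a corpus...
        corpus = rng.choices(words, weights=weights, k=sample_size)
        # ...and the next model learns word frequencies from that corpus alone.
        counts = Counter(corpus)
        words = list(counts)
        weights = [counts[word] for word in words]
        distinct_per_generation.append(len(words))
    return distinct_per_generation

distinct_counts = simulate_collapse()
print(distinct_counts)
```

The number of distinct words can only stay the same or shrink from one generation to the next, because a word with zero count has zero chance of being generated again. Real language models are far more complex, but the narrowing of variety follows the same logic.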
This is not a hypothetical future risk. By 2024, researchers estimated that between 15% and 60% of content on major platforms was AI-generated or AI-assisted, depending on the type of content. The web that future AI systems will train on looks increasingly like a hall of mirrors.
When you use an AI to help write something β an essay, a social media post, an email β and that text goes online, it becomes potential training data for the next model. You are, in a small way, participating in shaping what AI learns next. This is not a reason not to use these tools. It's a reason to understand that the system is not static. It evolves based on what people do with it.
Feedback loops are not impossible to break. The key is independent measurement β checking the AI's outputs against sources of truth that are not themselves generated by the AI. For predictive policing, that would mean auditing arrest data against victimization surveys (which ask people directly about crimes, rather than relying on what police recorded). For credit scoring, it would mean regularly testing whether the model's predictions are self-fulfilling by looking at what happens when restrictions are lifted in flagged areas.
The uncomfortable reality is that breaking feedback loops requires time, money, and willingness to find problems. Companies and institutions that deploy AI systems profit from their outputs and have limited financial incentive to conduct audits that might reveal the system is making things worse. Third-party auditing, where independent researchers get access to AI systems and data, is widely recommended by AI researchers but resisted by most companies on grounds of privacy and intellectual property.
Here is the tension at the institutional level that matters: in 2024, the EU AI Act required companies deploying high-risk AI systems (including those used in policing, employment, and credit) to conduct regular conformity assessments. It is the strongest AI governance law in effect anywhere in the world. Researchers widely consider it insufficient. Companies lobbied hard to weaken it. The United States, China, and most other countries have no equivalent requirement. The feedback loops continue.
When a technology company says its AI is "continuously learning" and "improving over time," that sounds like a feature. You now know it can also be a bug. Continuous learning means continuous feedback. If the system's predictions are shaping the data it's learning from, "improving" might actually mean "becoming more certain of its own errors." The label means nothing without knowing whether the feedback loop is clean.
A social media platform uses AI to rank what content users see. You've been hired as an independent researcher to assess whether the system has created a feedback loop around health misinformation: false medical claims that keep getting recommended because they generate high engagement (outrage clicks, shares, comments).
Your contact at the company defends the system. They say the AI is just responding to what users want. You need to make the case that "what users click on" and "what users want" are not the same thing, especially when the AI itself is shaping what they click on.
In 2017, the UK government rolled out an automated system to assess visa applications from foreign nationals. The system was developed by Capita under contract with the Home Office. It used an algorithm to flag applications for closer scrutiny, or outright refusal, based on factors that immigration officers were never fully told.
One of those factors, journalists later revealed, was nationality. The algorithm sorted applicants by country of origin into three groups: green (lower scrutiny), amber (some scrutiny), and red (high scrutiny). Applicants from countries like Nigeria, Ghana, Pakistan, and Afghanistan were consistently funneled into the red stream. The Home Office called it a risk tool. Critics called it an automated nationality-discrimination machine.
In August 2020, after years of legal pressure from civil society groups including the Joint Council for the Welfare of Immigrants, the Home Office quietly scrapped the algorithm. No full technical audit of its decisions was ever made public. Thousands of people had had their visa applications affected by a system whose inner workings the government itself didn't fully understand, and refused to explain.
A black box is any system where you can observe the inputs and outputs but not the process that connects them. You put something in. Something comes out. What happened in the middle is hidden β sometimes because of secrecy, sometimes because the system itself is too complex to interpret.
Early AI systems (simple decision trees, rule-based expert systems) were what researchers call interpretable. You could open them up and read the rules. If a loan was denied, you could find the specific rule that triggered the denial and challenge it.
Modern deep learning systems (the neural networks behind facial recognition, language models, image generators, and most cutting-edge AI) are fundamentally different. They consist of millions or billions of numerical weights, adjusted through training across vast datasets. There is no single rule to point at. The decision emerges from the combined behavior of an enormous number of interconnected values that no human designed individually. Not the engineers. Not the researchers. Not the company's CEO. No one can look inside and say "this is why the AI decided this."
In November 2019, entrepreneur and programmer David Heinemeier Hansson (creator of the Ruby on Rails programming framework) posted on Twitter that Apple Card's credit limit algorithm had given him 20 times the credit limit it gave his wife, despite her having a higher credit score and their finances being completely shared. Apple co-founder Steve Wozniak reported a similar pattern with his own wife. The story went viral.
Apple's financial partner, Goldman Sachs, issued a statement saying their algorithm did not use gender as an input. This was probably technically true. But the Apple Card algorithm also used factors like spending history, types of purchases, and other behavioral signals, and those signals can reflect gender without the word "gender" ever appearing in the code. The AI had potentially learned that female spending patterns were associated with lower credit limits because those limits had historically been lower. Again: no one programmed discrimination. The historical reality of discrimination trained itself into the algorithm.
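How a signal can "reflect gender without the word gender appearing in the code" is easier to see in a toy example. Nothing here describes the real Apple Card system: the records, the `spending_profile` feature, and the functions `train_limit_model` and `average_prediction_by_gender` are all invented. The toy model is trained on past credit limits and never looks at the gender column.

```python
from collections import defaultdict
from statistics import mean

# Invented history: (gender, spending_profile, historical_credit_limit).
# In this toy world, past limits were lower for "profile_x", and most
# people with "profile_x" happen to be women.
past_decisions = [
    ("F", "profile_x", 2000), ("F", "profile_x", 2500),
    ("F", "profile_x", 1500), ("F", "profile_y", 8000),
    ("M", "profile_y", 9000), ("M", "profile_y", 10000),
    ("M", "profile_y", 9000), ("M", "profile_x", 2000),
]

def train_limit_model(records):
    """Learn the average past limit for each spending profile.
    Gender is deliberately ignored: it is never an input to the model."""
    limits_by_profile = defaultdict(list)
    for _gender, profile, limit in records:
        limits_by_profile[profile].append(limit)
    return {profile: mean(limits) for profile, limits in limits_by_profile.items()}

def average_prediction_by_gender(records, model):
    """Audit step: compare the model's predictions across genders."""
    predictions = defaultdict(list)
    for gender, profile, _limit in records:
        predictions[gender].append(model[profile])
    return {gender: mean(values) for gender, values in predictions.items()}

model = train_limit_model(past_decisions)
by_gender = average_prediction_by_gender(past_decisions, model)
print(model)
print(by_gender)
```

The model only ever sees the spending profile, yet its average predicted limit for women comes out far lower than for men, because the profile stands in for gender in the historical data. Removing a sensitive column does not remove the pattern if something correlated with it stays in.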
Here is the part that matters: from the outside, no one could see which combination of factors had produced any individual credit decision. The New York Department of Financial Services launched an investigation. When it concluded in 2021, it reported no violation of fair lending law, but it criticized how little applicants had been told about how their limits were set. Answering the question took a regulator with legal authority more than a year; an ordinary customer could not have read the algorithm clearly enough to know.
The Equal Credit Opportunity Act has prohibited gender-based credit discrimination in the United States since 1974. But a law written for human decisions faces a hard question when the discriminating entity is an algorithm whose reasoning can't be decoded. If you can't read the algorithm, how do you prove it discriminated? This is one of the central challenges AI poses to existing civil rights law.
Researchers are working hard on what they call explainable AI (XAI): techniques to produce human-readable justifications for AI decisions even when the underlying model is a black box. One common approach is LIME (Local Interpretable Model-agnostic Explanations), which tests how small changes to the input affect the output and uses that to generate an approximate explanation. Another is SHAP (SHapley Additive exPlanations), which estimates how much each input feature contributed to a given prediction.
These tools are valuable, but they have fundamental limits. They don't tell you the actual reason the model decided something; they tell you what factors appear correlated with the decision. That's different. It's like asking someone why they chose a restaurant and getting back "you looked at restaurants with good reviews and outdoor seating": that describes a pattern, not a decision process.
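The basic idea of probing a black box by nudging its inputs can be sketched as below. This is a much-simplified stand-in and not the LIME or SHAP libraries themselves: `black_box_score` is a hypothetical model invented for the example (a real black box would not be a readable formula), and `sensitivity` is an assumed helper that changes one input at a time by 10% and records how the output moves.

```python
def black_box_score(applicant):
    """Hypothetical opaque model. In practice you could only call it,
    not read it; a simple formula is used here so the sketch can run."""
    return (0.5 * applicant["income"] / 100_000
            + 0.3 * applicant["years_employed"] / 10
            - 0.4 * applicant["missed_payments"] / 5)

def sensitivity(model, applicant, step=0.10):
    """Nudge each input up by `step` (10% by default) and record how much
    the model's output changes for this one applicant."""
    baseline = model(applicant)
    effects = {}
    for feature, value in applicant.items():
        nudged = dict(applicant)            # copy, then change one feature
        nudged[feature] = value * (1 + step)
        effects[feature] = model(nudged) - baseline
    return effects

applicant = {"income": 50_000, "years_employed": 5, "missed_payments": 2}
effects = sensitivity(black_box_score, applicant)

# List features from largest to smallest effect on the score.
for feature, change in sorted(effects.items(), key=lambda item: -abs(item[1])):
    print(f"{feature}: {change:+.4f}")
```

The output ranks which inputs move the score most near this particular applicant. It says nothing about what happens inside the model, which is exactly the limit described above: an explanation of how the output responds, not of the decision process.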
Some AI researchers argue that the push for explainability is, in some cases, being used as a delay tactic: companies can claim they're "working on explainability" as a reason not to impose stronger regulation now. Others argue that for some applications (say, approving a mortgage or flagging someone for extra airport screening) you shouldn't deploy AI at all unless it can fully explain itself. The EU AI Act takes a middle position: it requires explanations for high-stakes automated decisions, but doesn't specify how thorough those explanations need to be.
The deepest tension is this: the most capable AI systems are often the least interpretable. The same properties that make modern neural networks so powerful (their ability to find complex patterns in vast data) are exactly what makes them impossible to fully explain. Capability and transparency may be in fundamental conflict. If that's true, then using the most powerful AI systems requires accepting that you can never fully understand why they do what they do.
That's not a technical problem waiting for a technical solution. That's a choice society is making right now about how much we're willing to trust systems we cannot fully see.
Should an AI system that cannot explain its decisions ever be allowed to make decisions that significantly affect someone's life? There are people who argue yes: the system can be accurate even if it's not interpretable, and accuracy should be what matters. There are others who argue no: the right to understand why a decision was made about you is a fundamental right, regardless of whether the decision turned out to be correct. Neither side has won this argument. It is being fought in courts, parliaments, and research labs right now. Knowing this puts you ahead of most adults who are affected by these systems daily without ever asking the question.
A state government is deciding whether to allow a new AI risk-scoring system in criminal sentencing. The AI company says the system is 92% accurate at predicting reoffending, more accurate than any human judge. They say the internal model is proprietary. They are willing to provide summary explanations ("the top 3 factors in this score were...") but not the full model weights or training data.
You are testifying that this is insufficient. Your lab partner represents the AI company. They argue that accuracy should be the standard, not explainability. Make your case, and respond to their counterarguments.