In 2012, the Knight Capital Group lost $440 million in 45 minutes when a legacy code path (one that senior engineers remembered deactivating years earlier) was silently reactivated during a deployment. The institutional context for that code, including why it existed and why it was disabled, existed only in the memories of people who had since left the firm. No knowledge base. No annotated decision record. No review artifact linking the original rationale to the production system.
Knight Capital secured emergency financing within days and agreed to be acquired by Getco before the end of the year. The cost of not writing things down was measured in corporate survival.
Code review generates enormous quantities of decisions: why a particular pattern was rejected, why an exception was granted, why a security trade-off was accepted. Almost none of them are recorded in retrievable form. They live in pull request comment threads that are never searched, in Slack messages that scroll away, and in the heads of engineers who move on.
Research presented at ESEC/FSE 2013 (Rigby & Bird, "Convergent Contemporary Software Peer Review Practices") found that the median code review comment thread in large open-source projects was consulted again fewer than three times after the PR closed. Teams default to re-litigating the same decisions because retrieval is harder than re-discussion.
The problem compounds with team turnover. A 2022 survey by DX (formerly DevEx) found that engineers at companies with fewer than 200 developers spent an average of 4.1 hours per week recreating context that existed somewhere in the organization but could not be efficiently found. For a 30-engineer team, that is more than 120 hours weekly, the equivalent of three full-time engineers, lost to knowledge decay.
Google's internal engineering effectiveness research (Forsgren et al., "DORA State of DevOps Report 2023") identifies "documentation quality" as one of five predictors of elite software delivery performance. Teams in the top quartile for documentation quality deploy 208× more frequently than the lowest quartile. The gap is not tooling; it is captured knowledge.
Before building a knowledge base, you need to distinguish what is worth capturing. Review knowledge falls into three categories with different shelf lives and retrieval patterns:
Michael Nygard introduced the Architecture Decision Record (ADR) format in 2011 in a blog post titled "Documenting Architecture Decisions." The format is deliberately minimal: each ADR captures the context, the decision, the status (proposed, accepted, deprecated, superseded), and the consequences. Nothing more.
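The four fields above can be sketched as a small data structure. This is only an illustration: the `number` and `title` fields, the `to_markdown` layout, and the example text are assumptions added for readability, not part of the format as described here.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The four statuses named in the text.
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    DEPRECATED = "deprecated"
    SUPERSEDED = "superseded"


@dataclass
class ADR:
    number: int        # hypothetical: sequential identifier for the file name
    title: str         # hypothetical: short label for the decision
    context: str
    decision: str
    status: Status
    consequences: str

    def to_markdown(self) -> str:
        """Render the record as a short Markdown document."""
        return "\n".join([
            f"# {self.number:04d}. {self.title}",
            "",
            "## Status",
            self.status.value,
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Consequences",
            self.consequences,
            "",
        ])


# Hypothetical example record.
example = ADR(
    number=1,
    title="Record architecture decisions",
    context="Reviewers keep re-debating choices that were settled in earlier PRs.",
    decision="We will record significant decisions as ADRs in the repository.",
    status=Status.ACCEPTED,
    consequences="Each significant review decision adds one short file to maintain.",
)
print(example.to_markdown())
```

Keeping the structure this small is the point: a record that takes five minutes to write is more likely to be written during review.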
ADRs became widely adopted after Thoughtworks placed them in the "Adopt" tier of their Technology Radar in 2016. GitHub's engineering blog documented their adoption of ADRs in 2020, specifically citing the need to preserve reviewer rationale across team growth from dozens to hundreds of engineers.
The critical insight is that ADRs are written at review time, not retroactively. When a significant design decision surfaces during a code review, the reviewer who raises it owns writing the ADR. The PR does not merge until the ADR is committed alongside the code it describes.
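One way to enforce the merge rule is a small check over the PR's changed files. The sketch below assumes decision records live as Markdown files under docs/decisions/ and that a reviewer signals "this PR needs an ADR" through some flag (for example a label); both the flag and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed location of decision records inside the repository.
DECISIONS_DIR = PurePosixPath("docs/decisions")


def includes_adr(changed_files: list[str]) -> bool:
    """True if any changed file is a Markdown file directly under DECISIONS_DIR."""
    for name in changed_files:
        path = PurePosixPath(name)
        if path.parent == DECISIONS_DIR and path.suffix == ".md":
            return True
    return False


def may_merge(changed_files: list[str], needs_adr: bool) -> bool:
    """Block the merge only when a reviewer flagged the PR as needing an ADR."""
    return (not needs_adr) or includes_adr(changed_files)
```

A CI job could call `may_merge` with the file list from the PR and fail the build when it returns False, so the ADR travels in the same commit history as the code it explains.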
The MADR (Markdown Architectural Decision Records) format, maintained at adr.github.io, has been starred over 4,000 times and is used as the standard ADR template by teams at Netflix, Zalando, and the UK Government Digital Service. All three organizations cite reviewer alignment, not documentation completeness, as the primary driver of adoption.
Your team has been code-reviewing a codebase for 18 months. You have 400 closed PRs, a Slack workspace with 3 years of history, and four engineers who joined in the last six months and frequently ask questions like "why does this work this way?" You have no formal ADR process.
Work with the AI to audit your team's current knowledge retention situation and design a lightweight capture system.
In 2018, Spotify's engineering blog documented what they called the "tribe knowledge problem." As the company scaled from 300 to 4,000 engineers, teams developed dozens of internal wikis, Confluence spaces, and Notion databases, none of which were maintained. Engineers reported that searching internal documentation was less reliable than asking on Slack, because written content was frequently outdated or contradicted by newer decisions.
Spotify's solution was structural, not technological. They introduced what they called "golden path" documentation: a small set of authoritative, actively maintained decision records that were reviewed quarterly. The key change was accountability: every active decision record had a named owner responsible for keeping it current. Stale records were automatically archived, not just marked as outdated.
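The archive-on-staleness rule can be sketched in a few lines. This is not Spotify's tooling; the `DecisionRecord` fields, the 90-day approximation of "quarterly," and the owner names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


@dataclass
class DecisionRecord:
    number: int
    owner: str            # the named person accountable for the record
    last_reviewed: date
    archived: bool = False


def archive_stale(records: list[DecisionRecord], today: date) -> list[int]:
    """Archive every active record whose last review is older than REVIEW_INTERVAL.

    Returns the numbers of the records archived by this call.
    """
    newly_archived = []
    for record in records:
        if not record.archived and today - record.last_reviewed > REVIEW_INTERVAL:
            record.archived = True
            newly_archived.append(record.number)
    return newly_archived


# Hypothetical data: record 1 was last reviewed 143 days before "today".
records = [
    DecisionRecord(1, "alice", date(2024, 1, 10)),
    DecisionRecord(2, "bob", date(2024, 5, 20)),
]
print(archive_stale(records, today=date(2024, 6, 1)))  # prints [1]
```

Because each record carries an owner, the same pass could notify that person before archiving, which keeps the accountability with a name rather than with the team at large.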
The default failure mode for engineering knowledge bases is not malice; it is the accumulation of small omissions. A page is created for a decision. Six months later the decision changes, but the PR author doesn't know the page exists. The page stays accurate-looking but wrong. Another engineer reads it, makes decisions based on it, and the error propagates.
Research by O'Reilly Media (2021, "What Engineers Know About Documentation") found that 68% of engineers reported finding documentation that actively misled them in the prior six months. Only 14% of engineers had a defined process for marking documentation as outdated when they discovered an error.
The problem is not a lack of writing; most engineering teams produce enormous quantities of text. The problem is ownership. Documents without owners decay. The moment a document becomes everyone's responsibility, it becomes no one's responsibility.
A decision log differs from a wiki in one critical way: every entry is tied to a specific PR, time, and decision-maker rather than written as general guidance. The structure preserves the original context and makes staleness visible rather than hiding it.
The UK Government Digital Service (GDS) published their decision log template on GitHub in 2016 as part of their "How to document software architecture" guide. The template is a directory of numbered Markdown files (0001.md, 0002.md) stored at /docs/decisions/ in every service repository. GDS teams report that the file-in-repo approach (as opposed to Confluence or Notion) is what made the practice durable across seven years of team turnover.
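With numbered files, the only bookkeeping is choosing the next number. A minimal sketch, assuming four-digit names exactly as in the example above (the helper name is hypothetical):

```python
import re

# Matches file names such as 0001.md or 0047.md and captures the number.
ENTRY_NAME = re.compile(r"^(\d{4})\.md$")


def next_entry_name(existing: list[str]) -> str:
    """Return the file name for the next decision, ignoring unrelated files."""
    numbers = []
    for name in existing:
        match = ENTRY_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return f"{next_number:04d}.md"


print(next_entry_name(["0001.md", "0002.md", "README.md"]))  # prints 0003.md
```

In practice the list of names would come from a directory listing of the decisions folder; it is passed in here so the function stays easy to test.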
A decision log entry without a PR link is almost useless. The PR provides the full context: the code that triggered the decision, the alternative approaches considered in comments, the timeline. The decision log entry is the abstract; the PR is the full paper.
Effective cross-referencing works in both directions. The decision log entry links to the PR. The PR description includes a standard line: "Decision log: see docs/decisions/0047.md." Many teams enforce this with a PR template field. GitHub's CODEOWNERS feature can require review of docs/decisions/ by a designated documentation steward whenever that directory is modified.
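Both directions of the link can be checked mechanically. The sketch below parses the standard line quoted above out of a PR description and then confirms that each referenced entry mentions the PR. The back-link convention ("#" followed by the PR number somewhere in the entry) and the function names are assumptions, not an established standard.

```python
import re

# The reference line format quoted in the text.
DECISION_REF = re.compile(r"Decision log:\s*see\s+(docs/decisions/\d{4}\.md)")


def decision_refs(pr_description: str) -> list[str]:
    """Return every decision log path referenced in a PR description."""
    return DECISION_REF.findall(pr_description)


def missing_backlinks(pr_number: int, pr_description: str,
                      entry_texts: dict[str, str]) -> list[str]:
    """Return referenced entries that do not link back to the PR.

    entry_texts maps an entry path to its file contents. An entry counts as
    linked when it contains "#<pr_number>" (a hypothetical convention).
    """
    backlink = re.compile(rf"#{pr_number}\b")
    return [
        path for path in decision_refs(pr_description)
        if not backlink.search(entry_texts.get(path, ""))
    ]
```

An empty result from `missing_backlinks` means the PR and the log agree; anything else is a list of entries to fix before merging.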
Netflix's engineering culture documentation (published 2023 on the Netflix Tech Blog) describes their use of "decision contexts": structured PR descriptions that require authors to link any decision that touches existing ADRs. The engineering platform team reports that this single requirement reduced repeated architectural debates by approximately 40% in the services that adopted it.
Etsy's engineering documentation (2019 Engineering Effectiveness Survey, published internally and partially at Velocity Conference 2019) found that the most durable documentation practice was assigning a named "documentation steward" per service, not per document. One person was responsible for all decision records within a service boundary and reviewed them monthly. Etsy found this reduced stale documentation incidents by 61% compared to shared-ownership models.
During a code review last week, your team had a significant debate about whether to use optimistic locking or pessimistic locking for a high-contention database table in your e-commerce checkout service. After 45 minutes of discussion, the tech lead made a call: optimistic locking, with a retry limit of three attempts, because the product team forecasted low contention 95% of the time.
No one wrote it down. You've been asked to create the decision log entry retroactively.
In 2020, the Backstage project, originally built by Spotify and open-sourced in March of that year, addressed retrieval failure directly in its design rationale. The team documented that Spotify's internal tools had accumulated over 2,000 service-level decisions that were effectively unsearchable because every team used different tag vocabularies. Searching for "auth" returned different results than "authentication" or "oauth", and there was no canonical taxonomy.
Backstage's "TechDocs" system imposed a controlled vocabulary of top-level categories that all service documentation had to use. Within 18 months of adoption, Spotify's engineering surveys showed that engineers could find relevant decisions in under two minutes, down from an average of 23 minutes using the old Confluence structure. The taxonomy change, not the tooling change, drove the improvement.
Most teams that implement a decision log start by adding "tags" as a free-form text field. Within three months, the tags degrade. One engineer writes "security." Another writes "auth." A third writes "authentication-decisions." A fourth writes "sec-review." They all mean the same thing, but a search for any one tag misses the others.
This is not a hypothetical. Atlassian's internal study of Confluence usage patterns (2019, partially published in their "State of Teams" report) found that the average Confluence space developed a 4:1 tag synonym ratio within six months of creation, meaning four different tags existed for every distinct concept. The result was that engineers abandoned tag-based search in favor of full-text search, which in turn returned too many results to parse.
The fix is a controlled vocabulary: a predefined, finite list of approved tags, with clear definitions, that every decision log entry must use. New tags require a proposal and team approval; they are not added unilaterally.
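A controlled vocabulary is easy to enforce in code. The sketch below rejects any tag outside an approved set and, where a known synonym exists, points the author to the canonical tag. The specific tag list and synonym table are illustrative assumptions; a real team would define its own.

```python
# Illustrative approved vocabulary (an assumption, not a recommended list).
APPROVED_TAGS = {"security", "exception-granted", "service-wide", "pattern-choice"}

# Free-form variants mapped to their canonical tag.
KNOWN_SYNONYMS = {
    "auth": "security",
    "authentication-decisions": "security",
    "sec-review": "security",
}


def validate_tags(tags: list[str]) -> list[str]:
    """Return one error message per tag that is not in the approved vocabulary."""
    errors = []
    for tag in tags:
        if tag in APPROVED_TAGS:
            continue
        canonical = KNOWN_SYNONYMS.get(tag)
        if canonical:
            errors.append(f"'{tag}' is not approved; use '{canonical}'")
        else:
            errors.append(f"'{tag}' is not approved; propose it to the team first")
    return errors


print(validate_tags(["security", "sec-review", "caching"]))
```

Run as a CI step on every new entry, a check like this stops synonym drift at the point where it starts, instead of cleaning it up months later.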
Effective taxonomies for code review knowledge bases typically use two axes: a domain axis (what part of the system the decision affects) and a decision-type axis (what kind of decision it is). Combining both axes in every entry enables precise retrieval.
A well-tagged entry might read: tags: security · exception-granted · service-wide. A new engineer wondering "has anyone ever granted a security exception for this service?" can retrieve every relevant entry without knowing what keywords were used when the original decision was written.
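Retrieval across axes then reduces to a subset test on tags. A minimal sketch, using hypothetical entries and a hypothetical `find` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    number: int
    summary: str
    tags: frozenset[str]


def find(entries: list[Entry], *required: str) -> list[Entry]:
    """Return entries that carry every one of the required tags."""
    wanted = set(required)
    return [entry for entry in entries if wanted <= entry.tags]


# Hypothetical log contents.
log = [
    Entry(12, "Allow legacy token format for partner API",
          frozenset({"security", "exception-granted", "service-wide"})),
    Entry(19, "Use optimistic locking on the orders table",
          frozenset({"data-model", "pattern-choice", "service-wide"})),
    Entry(23, "Rotate signing keys every 30 days",
          frozenset({"security", "pattern-choice", "service-wide"})),
]

# "Has anyone ever granted a security exception for this service?"
for entry in find(log, "security", "exception-granted"):
    print(entry.number, entry.summary)
```

The query names concepts, not keywords, so it finds entry 12 even though the word "exception" never appears in its summary.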
For teams storing decision logs as Markdown files in a repository, a simple index file (DECISIONS_INDEX.md) at the root of the decisions directory enables fast retrieval. The index is a table with columns for entry number, date, primary domain tag, decision-type tag, one-sentence summary, and PR link. It is updated automatically as part of the CI check that validates new decision log entries.
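Generating the index is a small rendering step. The sketch below builds the table described above from structured rows; the `IndexRow` type, the column headings, and the example URLs are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndexRow:
    number: int
    date: str            # ISO date string, e.g. "2024-03-14"
    domain: str          # primary domain tag
    decision_type: str   # decision-type tag
    summary: str         # one-sentence summary
    pr_url: str


def render_index(rows: list[IndexRow]) -> str:
    """Render rows as a Markdown table, sorted by entry number."""
    lines = [
        "| Entry | Date | Domain | Type | Summary | PR |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for row in sorted(rows, key=lambda r: r.number):
        # Escape pipes so a summary cannot break the table layout.
        summary = row.summary.replace("|", "\\|")
        lines.append(
            f"| {row.number:04d} | {row.date} | {row.domain} | "
            f"{row.decision_type} | {summary} | [PR]({row.pr_url}) |"
        )
    return "\n".join(lines) + "\n"


# Hypothetical rows; example.com stands in for the real PR host.
rows = [
    IndexRow(47, "2024-03-14", "security", "exception-granted",
             "Allow legacy token format for partner API",
             "https://example.com/pull/312"),
    IndexRow(12, "2023-11-02", "data-model", "pattern-choice",
             "Use optimistic locking on the orders table",
             "https://example.com/pull/198"),
]
print(render_index(rows))
```

Regenerating the whole file on every change, rather than appending to it, means the index can never drift from the entries it summarizes.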
Teams using GitHub can leverage GitHub's built-in code search across repository files, which searches Markdown content. The critical practice is consistent field naming: using "Status:" as a field label in every entry means a search for "Status: Active security" returns all active security decisions. Inconsistent field naming makes even full-text search unreliable.
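Field labels only help search if every entry spells them the same way, which a validation step can check. In this sketch "Status" comes from the text; the "Tags" and "PR" labels and the sample entry are assumptions.

```python
# Labels every entry is expected to use, spelled exactly like this.
FIELD_LABELS = ("Status", "Tags", "PR")


def parse_fields(text: str) -> dict[str, str]:
    """Collect 'Label: value' lines whose label is one of FIELD_LABELS."""
    fields = {}
    for line in text.splitlines():
        label, separator, value = line.partition(":")
        if separator and label in FIELD_LABELS:
            fields[label] = value.strip()
    return fields


def missing_labels(text: str) -> list[str]:
    """Return the expected labels that the entry does not contain."""
    found = parse_fields(text)
    return [label for label in FIELD_LABELS if label not in found]


# Hypothetical entry text.
entry = """Status: Active
Tags: security, exception-granted
PR: https://example.com/pull/312

We allow the legacy token format for one partner.
"""
print(parse_fields(entry)["Status"])          # prints Active
print(missing_labels("status: active\n"))     # prints ['Status', 'Tags', 'PR']
```

The second call shows the failure mode the paragraph describes: a lowercase "status:" looks fine to a human reader but is invisible to any search keyed on "Status:".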
Amazon's internal engineering wiki practices (described in "Working Backwards," Bryar & Carr, 2021) rely on a similar principle: structured templates with fixed field labels ensure that even free-text content is findable. Amazon's "Correction of Error" (CoE) documents, which serve a similar function to decision logs in operational contexts, use identical field labels across thousands of documents specifically to enable cross-team search.
The DORA 2023 report identifies documentation findability, defined as locating relevant guidance in under 30 seconds, as a key differentiator for elite engineering teams. Teams that cannot meet this threshold for review decisions are classified as "documentation bottlenecked," a condition correlated with 2.4× higher review cycle time and 1.8× higher defect escape rate.
A well-tagged knowledge base becomes the most valuable onboarding artifact a team can produce. Instead of a static "new engineer guide," new team members can search for service-wide · pattern-choice and retrieve every foundational decision the team has made about how this service works. The reasoning is included. The alternatives considered are noted. The review date tells them how current each decision is.
Stripe's engineering blog (2021, "How we think about onboarding engineers") describes a "decision trail" onboarding approach where new engineers are given a curated list of 10 to 15 decision log entries to read in their first week, not documentation pages. In Stripe's internal survey, engineers who onboarded using the decision trail approach reported being productive 31% faster than those who used traditional wiki-based onboarding.
You're the designated documentation steward for a six-service fintech platform handling payments, user accounts, fraud detection, notifications, reporting, and an internal admin API. Your team has 12 engineers and reviews roughly 60 PRs per week.
You need to design a tag taxonomy (domain tags, decision-type tags, and scope tags) that will work for the next two years without needing constant expansion.
In 2022, Zalando, the European e-commerce platform, published a retrospective on their engineering decision documentation practice on their engineering blog. Between 2016 and 2019, they had built a centralized ADR repository containing over 800 architecture decisions across 300+ engineering teams. By 2021, the system was effectively unused. The overhead of routing decisions through a central review process had created a bottleneck: engineers waited an average of 11 days for an ADR to be approved before merging code. Teams stopped creating ADRs.
Zalando's 2022 solution was federated ownership: each domain (roughly 10 to 20 teams) maintained its own ADR repository with its own review process. Cross-domain decisions required only the affected domain stewards to approve, not a central board. Average approval time dropped to 1.4 days. ADR creation rate increased 340% within six months of the change.
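The approval rule in a federated model is simple enough to state as a function: a decision needs sign-off from the stewards of the domains it affects and from no one else. The sketch below is an illustration of that rule only; the domain names and steward assignments are hypothetical and this is not Zalando's process.

```python
def required_approvers(affected_domains: set[str],
                       stewards: dict[str, str]) -> set[str]:
    """Return the stewards who must approve a decision.

    stewards maps a domain name to the person responsible for its ADRs.
    Raises KeyError if an affected domain has no registered steward.
    """
    unknown = affected_domains - set(stewards)
    if unknown:
        raise KeyError(f"no steward registered for: {sorted(unknown)}")
    return {stewards[domain] for domain in affected_domains}


# Hypothetical domains and stewards.
stewards = {
    "core-platform": "dana",
    "data-analytics": "eli",
    "customer-experience": "farah",
}

# A decision touching two domains needs two approvals, not a central board.
print(sorted(required_approvers({"core-platform", "data-analytics"}, stewards)))
```

A single-domain decision yields a single approver, which is where the shorter approval times in a federated model come from.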
The instinct at scale is to centralize: one knowledge base, one taxonomy, one review process. This instinct is wrong. Centralization creates approval queues that interrupt the flow of review work. It creates taxonomy committees that move slower than engineering teams. It creates a single point of staleness: when the central knowledge base falls behind, all teams are equally lost.
The research supports federation. A 2021 study by Forsgren, Humble, and Kim (published in the "Accelerate" follow-up research) found that teams with locally owned documentation outperformed centrally documented teams on deployment frequency (1.9× faster), change failure rate (41% lower), and mean time to recovery (2.1× faster). The pattern mirrors microservices architecture: local ownership, agreed-upon interfaces, not shared mutable state.
Federated knowledge bases require agreed interfaces for cross-team referencing. When Team A's service depends on Team B's decision (for example, Team B decides to deprecate an API contract), Team A's decision log must be able to reference Team B's entry. The entry format must be stable enough to bookmark across repository boundaries.
The canonical solution is a stable URL scheme for decision entries: a permalink format that includes the organization, the domain, and the entry number. GitHub Pages, used by the UK GDS and several public-sector engineering teams, provides this by default when decisions are stored as numbered Markdown files in a repository with Pages enabled. A decision at org/domain/decisions/0042 is permanently referenceable from any team's record.
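A permalink scheme is only stable if every team formats and parses it the same way. A minimal sketch of the org/domain/decisions/number shape described above; the allowed characters and the names `DecisionRef`, `format_ref`, and `parse_ref` are assumptions.

```python
import re
from typing import NamedTuple


class DecisionRef(NamedTuple):
    org: str
    domain: str
    number: int


# Assumption: org and domain are lowercase slugs; numbers are four digits.
PERMALINK = re.compile(
    r"^(?P<org>[a-z0-9-]+)/(?P<domain>[a-z0-9-]+)/decisions/(?P<number>\d{4})$"
)


def format_ref(ref: DecisionRef) -> str:
    """Build the permalink path for a decision."""
    return f"{ref.org}/{ref.domain}/decisions/{ref.number:04d}"


def parse_ref(text: str) -> DecisionRef:
    """Parse a permalink path, raising ValueError if it is malformed."""
    match = PERMALINK.match(text)
    if match is None:
        raise ValueError(f"not a decision permalink: {text!r}")
    return DecisionRef(match["org"], match["domain"], int(match["number"]))


ref = parse_ref("org/domain/decisions/0042")
print(ref.number)          # prints 42
print(format_ref(ref))     # prints org/domain/decisions/0042
```

Because parsing and formatting round-trip, a link checker can validate every cross-team reference in a repository without knowing anything about the other team's tooling.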
A knowledge base that only grows is a knowledge base that becomes impossible to navigate. Effective scaling requires explicit sunset policies: rules for when decision records move from Active to Archived, and what happens to dependent decisions when a foundational one is superseded.
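The second half of a sunset policy, what happens to dependents, is a graph walk. The sketch below archives a superseded record and returns every Active record that depends on it, directly or indirectly, so those can be queued for re-review. The `Record` type and the `depends_on` field are assumptions about how dependencies might be stored.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    number: int
    status: str = "Active"                      # "Active" or "Archived"
    depends_on: list[int] = field(default_factory=list)


def supersede(records: dict[int, Record], number: int) -> list[int]:
    """Archive one record and return the Active records that rely on it.

    Dependents are found transitively: if 3 depends on 2 and 2 depends
    on 1, superseding 1 flags both 2 and 3 for re-review.
    """
    records[number].status = "Archived"
    flagged: set[int] = set()
    frontier = [number]
    while frontier:
        current = frontier.pop()
        for record in records.values():
            if (current in record.depends_on
                    and record.status == "Active"
                    and record.number not in flagged):
                flagged.add(record.number)
                frontier.append(record.number)
    return sorted(flagged)


# Hypothetical dependency chain: 3 -> 2 -> 1, with 4 independent.
records = {
    1: Record(1),
    2: Record(2, depends_on=[1]),
    3: Record(3, depends_on=[2]),
    4: Record(4),
}
print(supersede(records, 1))  # prints [2, 3]
```

The function deliberately flags dependents instead of archiving them: whether a dependent decision still holds is a judgment call for its owner, not something a script should decide.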
Financial services organizations operating under SOX, PCI-DSS, or FCA regulations have a non-optional reason to maintain decision logs: audit requirements. The Financial Conduct Authority's SYSC 13 rules require documented evidence of material technology risk decisions for regulated firms. Several UK fintech firms (Monzo and Starling Bank, per their engineering blog posts) use their ADR repositories as part of their regulatory evidence trail, explicitly citing the archive policy as a compliance feature.
A knowledge base that requires engineers to leave their review workflow to consult is a knowledge base that will not be consulted. The final scaling challenge is integration: surfacing relevant decisions at the moment a review comment is written, not as a separate lookup step.
GitHub's code review interface supports this via PR templates with embedded decision log links. A template field reading "Related decisions (check docs/decisions/ if touching auth, payments, or data-model)" surfaces the knowledge base at exactly the moment an engineer is forming a review opinion. The ask is not "go find documentation"; it is "confirm you've checked the relevant decisions."
Linear, the project management tool used by many engineering teams, introduced "document links" in their issue view in 2022 specifically to solve this integration problem. Linear's engineering team documented (in their changelog) that teams which linked decision records to Linear issues saw 28% fewer repeated discussions on the same architectural topics in the following quarter.
The final integration point is onboarding. Stripe's "decision trail" approach (discussed in Lesson 3) scales naturally: new engineers receive a curated reading list of 10 to 15 domain-specific decisions, filtered by the services they will initially work on. The knowledge base becomes the primary onboarding document, not an appendix to it.
Your startup has grown from 8 engineers to 45 in 18 months. You have three product domains (Core Platform, Data & Analytics, and Customer Experience) with 3 to 5 engineering teams each. You have a working decision log in a single repository (180 entries), but teams are starting to complain that cross-team decisions take too long to approve and that the taxonomy is getting noisy.
You're being asked to present a scaling recommendation to the engineering leadership team next week. It must address governance, taxonomy, cross-team referencing, and sunset policy.