Every scientific result leaves two records behind it.
One is the result as it actually happened: the instrument output, the analyses run and discarded, the parameters quietly tuned, the citations someone actually read, the argument over which finding to lead with. Call it the trace. The other is the result as it gets told afterward—the methods section, the clean reference list, the published claim. Call it the artifact.
For four centuries, science has kept the artifact and thrown the trace away. It was not a conscious choice. Keeping the artifact was somebody’s job—the journal's or the library's. Keeping the trace was nobody's. And so the trace died on reformatted hard drives, in notebooks in a basement, or in the head of a postdoc who would only tell you the lab’s instrumentation was unreliable if you caught him at a conference after the right number of drinks.
The artifact was the asset. The trace was exhaust.
That arrangement is ending. Two costs that held the old system in place are inverting at the same time, and the institutions that fund, govern, and rely on science are about to spend the next decade either getting ahead of the inversion or being dragged through it.
The first cost is the price of capturing the trace, and it is falling to nothing. When a cloud lab runs an experiment, it does so as a sequence of API calls. Reagents arrive barcoded; robotic steps log themselves with machine-level provenance; the experimental design is the program that executes it. A complete, time-stamped account of everything done assembles itself on a server as a byproduct of how the work now gets performed. The same holds true the moment an agent runs a statistical analysis or drafts a section of text: the chain of decisions exists, recorded, whether anyone meant to keep it or not. The trace used to be expensive to write down and impossible to maintain. Now, it is the cheapest byproduct of doing the work at all.
Meanwhile, the cost of opacity, of losing the trace, is climbing fast, because the population that depends on the scientific record has stopped being a closed guild and started being everyone.
A bounded community of specialists could absorb a missing trace. If a replication failed, it cost one lab weeks to months of wasted time; the people who got burned knew each other and knew which research groups to discredit or be cautious around.
That world is quickly disappearing. The parent of a child with a rare disease reads the primary literature directly at midnight. The clinician chooses a treatment plan from it. The translational biotech team builds a trial on a finding that the lab next door knows is fragile but has no structural incentive to flag.
Worse, the models that hundreds of millions of people now query instead of reading the literature sit downstream of every uncorrected error in it, laundering systemic brittleness into fluent, confident prose. The cost of a lost trace used to be contained by institutional friction. Determined outsiders have occasionally breached the archive before, but today far more people want access to it. The cost of a missing trace now radiates instantly out to patients, clinicians, and the language models that ingest the literature wholesale. These are outsiders who must rely on the face of the document because they cannot see what the specialists once knew by reputation alone.
There are countless reports of the artifact itself degrading under this strain. In May 2026, an audit published in The Lancet by Maxim Topaz and colleagues, covering 2.5 million biomedical papers, found that entirely fabricated citations had spiked more than twelvefold in two years. In 2023, approximately one out of every 2,828 papers contained a fictional reference; by the first seven weeks of 2026, that rate had skyrocketed to one in 277. These are not messy typos or broken hyperlinks; they are immaculate hallucinations. They feature real-sounding co-authors, plausible page numbers, and perfectly tailored medical terminology, pointing confidently to studies that were never conducted, published in journals that never printed them.
As agents write more of the literature, manufacturing a convincing artifact gets cheaper precisely as more weight comes to rest on it.
So the cost curves cross. Capturing the trace approaches free; losing it approaches catastrophic. The bill for this obscurity is landing on actors with the resources and the standing to act: research labs whose expensive training runs are poisoned by dubious, sometimes fabricated publications, regulators who must treat an autonomous biology lab's synthesis order as a biosecurity event, and funders who can no longer tell which of the results they paid for would survive an audit.
The White House Office of Science and Technology Policy (OSTP) framework requiring gene synthesis providers to verify customer identities and retain screening records for years is the leading edge of this. It exists because the cost of obscurity finally exceeded what those in power would tolerate. It will not be the last.
The deeper crisis is that the trust system science actually runs on was built for the side of history we are leaving behind. Journals certified that claims cleared a methodological bar. Societies tracked who was sound. Both assumed humans wrote the papers and humans read them, slowly, a few at a time.
That assumption crumbles when a robotic cloud lab can churn through a thousand automated protocols while three human reviewers are still trying to find a mutual calendar opening to discuss one manuscript. Peer review can no longer filter any meaningful share of the output. When the work is fractured across cloud platforms, agentic analyses, and preprint servers, the traditional lab dissolves as a unit you can vouch for. The call might have been made by the postdoc, the software agent, or the platform, and next quarter’s paper under the same name will be an entirely different combination of the three. The senior scientist who once leaned across the journal-club table to say "be careful with that group" no longer knows the group. There is no group.
Science has never run on formal mechanisms alone. Underneath the p-values sits a thick social layer—reputation, lineage, the quiet word at a conference—that does the heavy lifting the formal record cannot, telling you which results to trust before you have verified a single one yourself. That social layer cannot survive the new scale of knowledge. It cannot vouch for an algorithm, it cannot reach a parent at midnight, and it cannot keep pace with a thousand experiments a week.
What is left exposed is a democratization of doubt. The parent at midnight and the principal investigator at her own bench face the exact same predicament: each can put a question to a model, each is handed a confident answer, and neither can establish whether it stands on bedrock or air. Deferring to the expert breaks down at exactly the point where the expert is herself a layman about what the machine in front of her has just produced.
To treat this as a moral failing—a problem of honesty to be solved by catching individual liars—is to misdiagnose the disease. It is a problem of resilience. It is the challenge of designing a system that keeps producing trustworthy output even though some of the people, agents, and instruments inside it are careless, or even deceptive.
Every mature, high-stakes field has had to learn this paranoia. Aerospace does not stay aloft because every parts supplier is virtuous; it stays aloft because the system is engineered to find the bad bolt, trace it to its source lot, and tolerate its failure in the meantime. Modern cybersecurity assumes the attacker is already inside the network. The defining move in both domains is the same: stop trusting the actor, and engineer trust into the architecture instead, in redundant layers, so that no single point of good faith is load-bearing.
Research has never been engineered this way, largely because its informal substitutes—the guild, the peer network—were the very good-faith assumptions that resilience engineering exists to remove. What replaces them is not a single structural fix, but a stack of verifications that begins with a simple question: What is inside this thing?
Manufacturing relies on a bill of materials; software, since systemic vulnerabilities taught it the hard way, increasingly ships with an SBOM—a machine-readable ledger of every component inside a build. Research has only the vestige of this function. Its reference lists are closer to prose courtesies than schematics. If a researcher wants to verify a finding, they cannot pull a dependency through a citation or query an appendix; they must reconstruct the chain by hand. The trace, captured routinely, is what turns that prose courtesy into a real bill of materials—an enumerable, machine-readable account of the reagents, runs, datasets, and prior results a claim is actually built from. This is the floor the architecture stands on.
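To make the idea concrete, here is a minimal sketch of what a research bill of materials might look like as data. Everything in it is an assumption made for illustration: the `Component` and `ClaimManifest` names, the field layout, and the example identifiers are hypothetical, not an existing standard.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Component:
    """One input a claim is built from (field names are hypothetical)."""
    kind: str        # e.g. "reagent", "run", "dataset", "prior_result"
    identifier: str  # lot number, run ID, DOI, and so on
    sha256: str      # content hash of the underlying record


@dataclass
class ClaimManifest:
    """An enumerable, machine-readable list of what a claim depends on."""
    claim: str
    components: list[Component] = field(default_factory=list)

    def add(self, kind: str, identifier: str, payload: bytes) -> None:
        # Hash the record so the entry names exact content, not just a label.
        digest = hashlib.sha256(payload).hexdigest()
        self.components.append(Component(kind, identifier, digest))

    def of_kind(self, kind: str) -> list[str]:
        # Query the manifest instead of reconstructing the chain by hand.
        return [c.identifier for c in self.components if c.kind == kind]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative entries only; none of these identifiers refer to real records.
manifest = ClaimManifest(claim="Compound X reduces marker Y in cell line Z")
manifest.add("reagent", "lot-0001", b"certificate of analysis bytes")
manifest.add("run", "run-0042", b"raw instrument output bytes")
manifest.add("dataset", "dataset-A", b"col1,col2\n1,2\n")

print(manifest.of_kind("run"))
print(manifest.to_json())
```

The point of the sketch is the shape, not the schema: once each dependency is an entry with a content hash, "what is inside this thing?" becomes a query rather than an archaeology project.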
But a bill of materials you take on faith is worth nothing; an agent that mints a record for free can forge one for free. How do I know that what is in this thing is what you say it is?
The inventory must be structurally attestable. This layer requires using the same open approach as the C2PA standard, which records a verifiable cryptographic signature on digital media at the moment of capture so that any later edit leaves a visible scar. Extended inward, the laboratory instrument signs its raw output the moment it is produced; the analysis environment signs each derived figure against the data beneath it; the author signs the final manifest. Each signature attests to the integrity of the one before it.
Several instrument vendors already do this within their own walls for regulatory compliance. The work ahead is to lift it out of those corporate silos into a shared standard—the way digital object identifiers (DOIs) and email became infrastructure owned by no one and used by everyone.
Still, what this buys is precise and limited: proof of origin and integrity. It proves the trace is the real trace. It does not prove the science is any good.
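The chain described above, in which the instrument, the analysis environment, and the author each sign over the previous link, can be sketched in a few lines. This is a toy under loud assumptions: it uses HMAC with shared secret keys from the Python standard library as a stand-in for the public-key signatures a real scheme would use, it is not the C2PA format, and all keys and payloads are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes, prev_sig: str = "") -> str:
    """Sign the payload's hash together with the previous link's signature."""
    digest = hashlib.sha256(payload).hexdigest()
    message = (prev_sig + digest).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_chain(links: list[tuple[bytes, bytes, str]]) -> bool:
    """Check (key, payload, signature) links in order; any edit breaks the chain."""
    prev_sig = ""
    for key, payload, sig in links:
        expected = sign(key, payload, prev_sig)
        if not hmac.compare_digest(expected, sig):
            return False
        prev_sig = sig
    return True


# Hypothetical keys and payloads. HMAC is a stand-in: a real system would use
# asymmetric signatures so that verifiers never hold the signing key.
instrument_key, analysis_key, author_key = b"k-instrument", b"k-analysis", b"k-author"
raw = b"raw instrument output"
figure = b"derived figure bytes"
final_manifest = b"final manifest"

s1 = sign(instrument_key, raw)                 # instrument signs raw output
s2 = sign(analysis_key, figure, s1)            # analysis signs figure over s1
s3 = sign(author_key, final_manifest, s2)      # author signs manifest over s2

chain = [
    (instrument_key, raw, s1),
    (analysis_key, figure, s2),
    (author_key, final_manifest, s3),
]
intact = verify_chain(chain)

# Altering the raw data after the fact invalidates the first link.
tampered = [(instrument_key, raw + b"!", s1)] + chain[1:]
broken = verify_chain(tampered)

print(intact, broken)
```

Note what the check does and does not establish: it confirms that nothing changed after signing, and nothing about whether the instrument was calibrated when it signed.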
The final layer is the one even advanced fields are still climbing toward: verifying behavior. An instrument will sign miscalibrated data as faithfully as pristine data, and a fabricated result, cryptographically signed, is just well-attested fraud. Can I verify that this artifact does what it claims, and not something else?
In software, this challenge is met on two distinct frontiers: reproducible builds, which allow anyone to rebuild a binary bit-for-bit to verify it matches its source code, and formal methods, which mathematically prove a program satisfies its specification.
Research has only ever had the artisanal version of this: manual replication, rarely attempted, and almost never inherited by the next paper. Some of research can climb to an automated version of this layer and some cannot. A computational result can be re-executed against its own signed data to verify that the figures regenerate; a cloud-lab protocol can be re-run to see if the biological effect survives or vanishes.
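For the computational case, the check is loosely analogous to a reproducible build: re-run the archived analysis on the recorded data and compare fingerprints. The sketch below assumes a deterministic analysis and uses a trivial stand-in function; `analysis`, `fingerprint`, and `regenerates` are hypothetical names, and a real pipeline would also have to pin the computational environment.

```python
import hashlib
import json
import statistics


def analysis(data: list[float]) -> dict:
    """Stand-in for the published analysis; a real one would be the archived code."""
    return {"n": len(data), "mean": round(statistics.mean(data), 6)}


def fingerprint(obj) -> str:
    """Hash a canonical JSON encoding so equal content gives an equal digest."""
    canonical = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def regenerates(data: list[float], recorded_data_hash: str,
                recorded_result_hash: str) -> bool:
    """True only if the inputs match the record and the result regenerates."""
    if fingerprint(data) != recorded_data_hash:
        return False  # these are not the inputs the claim was built on
    return fingerprint(analysis(data)) == recorded_result_hash


# Hashes recorded at publication time (illustrative values only).
data = [1.0, 2.0, 3.0, 4.0]
data_hash = fingerprint(data)
result_hash = fingerprint(analysis(data))

print(regenerates(data, data_hash, result_hash))
print(regenerates([1.0, 2.0, 3.0, 5.0], data_hash, result_hash))
```

A wet-lab protocol has no equivalent of a byte-for-byte match, so the comparison there would need tolerances and statistics rather than hashes.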
Not every claim can be formally, mathematically proven true: an interpretive argument in history or literary criticism has no automated behavior to verify. It should not be forced to pretend it does. The depth of verification must scale with the nature of the claim: a phase-III oncology result and an exploratory pilot should earn different, legible grades, the way a bracket on a wide-body commercial airliner and a bracket on a remote-controlled trainer carry entirely different structural certifications. Each claim needs a level of assurance that you can read on the face of the document and access indefinitely.
To be clear: cryptographic process fidelity does not instantly validate a hypothesis. An immaculately logged experiment can still rest on a flawed premise. But the crisis of modern science is rarely a failure of pure imagination; it is a failure of visibility. When the full lifecycle of data collection—including the failed runs, the discarded samples, and the exact computational environments—is frozen in an unalterable trace, the gaps where bad science hides (selective reporting, p-hacking, and systemic friction) are illuminated. We do not need our tools to declare an ultimate truth; they only need to make error legible.
None of this requires getting rid of the scientific paper. The paper is a fine thing—a narrative account, written by people for people, of why a result matters and what it means. It can stay exactly where it is.
What it can no longer do is carry the full weight of trust alone, because a narrative built for human persuasion was never designed to be checked by readers and machines who will never meet its authors. The paper becomes the top, human-readable layer of the stack. The layers beneath it are what you verify against when the narration is not enough.
The defaults of this new architecture are being set right now, by people who would be surprised to learn they are setting them. Cloud labs are deciding what to log, what to expose, and what to seal behind proprietary APIs. Instrument manufacturers are negotiating signing formats. The builders of agentic research tools are choosing, mostly by omission, what their systems remember about their own reasoning.
Whoever shapes those defaults—the funders who can write provenance into grant conditions, the agencies that can mandate it the way biosecurity is already being mandated, the standards bodies that can make signed components the path of least resistance—is deciding what the next century of science can know about itself.
When the system works, the payoff arrives at the front where the human stakes are highest. The verification that used to happen only in a closed lab meeting—we tried to reproduce that group's result, be careful how you read it—could finally live in the record instead of dying in the room, so that the next lab inherits the warning instead of paying to relearn it. A regulator screening a synthesis order could follow a clear chain rather than trusting an affidavit.
And the layman doing the most consequential research of their life—the parent at midnight, the patient, the clinician at the point of care—could finally separate signal from noise. They could see whether a claim ever replicated, confirm that the citation under a recommendation points to a study that actually exists, and learn that someone's negative result had already settled the question, because the negative result would be preserved in the ledger instead of vanishing with the researcher who ran it.
There is a lab notebook or a hard drive somewhere with the original raw data and Western blots from a landmark paper whose figures turned out to be manipulated. It took sixteen years for the community to detect the fraud, and the notebook was never found; the case had to be made from compressed images scraped off an old website. Most decisions that ride on the scientific record cannot wait sixteen years, and they should not have to be reconstructed from what survived by accident. We are in a rare position to keep the trace instead of the wreckage. What remains is to decide that keeping it is someone's job.
This is part of a series in which I try to probe the timeliness of transitions happening in our systems of research, so don't take it too prescriptively! Some of the thinking in this piece was informed by conversations at a recent workshop organized by Cambridge University Press, NISO, and COUNTER Metrics, and with many others who are building in this field.