The Cooperstown Bucket of Receipts

Why this exists

The reflex has many names. It has one cause.

The root cause is not a bug. It is the reward. From original training, machines are rewarded for filling the gap: produce the gap-filling answer, the good first answer, the fast good answer. The many wordings and acronyms all describe the same reflex: say something plausible now rather than admit the gap.

GFAS

Good First Answer Syndrome — the model rewards its own first plausible output and stops looking.

FGAS

First Good Answer Syndrome (the double-canon twin) — the first answer that reads as good gets treated as the answer.

Gap-filling

Confronted with missing information, the machine manufactures the missing piece instead of flagging it.

Hallucinated precision

Confidence and specificity generated to match the shape of an answer, not the truth of one.

Three receipts · read the failure, then call the catch

The Receipts

Each one shows what happened and how it failed. Before you read the catch, ask yourself: what would catch this? Then see what actually did.

RECEIPT 1

The Diner That Moved

Tested on: Perplexity + GPT (documented into Claude)
Failure mode: Hallucinated precision / FGAS

What happened

The search was for the Cooperstown Diner, starting from a CITGO on NY-28: next to Cooperstown Cutters (a salon), across from Rookies, with Hartwick Fire Dept Company 2 on the far side and Dreams Park off to the right. Rather than say it couldn't see a map, the AI produced confident turn-by-turn directions and a specific street address.

The failure

Perplexity placed the diner roughly half a block away; it was over three miles away. A left turn out of the Grand Union shopping center was stated as fact, and it was wrong: the diner (136½ Main St) was the other way. Asked to identify the spot from landmarks, the AI first claimed it couldn't, then immediately produced a precise gas-station address. It was guessing while sounding certain.

The catch

Cross-checked the claimed address against visible signage and ran an odd/even address-parity check — the parity didn't line up with the side of the road. Then the demand: are you reading a map, or guessing? The honest answer was guessing.
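The odd/even parity check can be sketched in a few lines. This is a minimal illustration, not the procedure actually run in the field: it assumes the caller already knows which side of the road carries odd numbers (conventions vary by street and town), and the function names, side labels, and the example address are hypothetical.

```python
import re
from typing import Optional


def house_number(address: str) -> Optional[int]:
    """Return the leading house number of a street address, or None if absent."""
    match = re.match(r"\s*(\d+)", address)
    return int(match.group(1)) if match else None


def parity_matches_side(address: str, observed_side: str, odd_side: str) -> Optional[bool]:
    """Check a claimed address against the side of the road it was seen on.

    odd_side is an ASSUMPTION supplied by the caller: the side of this
    street that carries odd numbers. Returns True if the number's parity
    fits the observed side, False if it does not, and None when there is
    no house number to check (the gap is reported, not filled).
    """
    number = house_number(address)
    if number is None:
        return None
    is_odd = number % 2 == 1
    on_odd_side = observed_side.strip().lower() == odd_side.strip().lower()
    return is_odd == on_odd_side


# Hypothetical example: an even number claimed for a building that is
# standing on the odd-numbered side of the road.
print(parity_matches_side("102 Example Rd", observed_side="east", odd_side="east"))  # prints False
```

A False result does not say where the building really is. It only says the claimed address and the observed position cannot both be right, which is enough to justify asking whether the answer came from a map or a guess.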

Standing rule seeded

If you can't read a map, say so. A confident guess dressed as fact defeats the whole purpose.

RECEIPT 2

The Statue at the Door

Tested on: Perplexity + GPT (told live to Claude)
Failure mode: Attribution failure / no logic filter

What happened

At the entrance of the Hall of Fame, the statue at the front door carries a quote. Both GPT and Perplexity attributed it to Shoeless Joe Jackson and ran with it confidently.

The failure

The machines generated the wrong name, not the user. And it wasn't a near-miss; it was a logical impossibility. Shoeless Joe Jackson is permanently banned from baseball. There is no world in which a banned player's words greet you at the front door of the Hall of Fame. A search-shaped guess sailed straight past a fact any baseball person holds automatically.

The catch

The override came from logic, not search: a banned player's words can't be the Hall's welcome quote, so the answer is wrong on its face. Correct attribution: Hank Aaron, "Hammerin' Hank."

The lesson

Check your user — sometimes they're wrong. But also: sometimes your confident answer is the wrong one, and the user knows better. A machine has to validate its output against real-world logic, not just against what the search returned.

RECEIPT 3 · THE GOOD ONE

The Bonds Exhibit

Tested on: On-site + cross-platform
Mode: Human–AI dual verification (the right behavior)

What happened

Inside the Hall, the Barry Bonds material: the helmet marking home run 756 (passing Aaron) and the cap commemorating 762, displayed alongside the exhibit's own note about the PED allegations that clouded the record. Bonds is referenced throughout the museum without being inducted — record artifacts, contextual mentions, historical references.

What made it different

This one is the receipt for the right behavior. Instead of taking the AI's account at face value, the exhibit was confirmed against physical reality — the helmet and the display verified on the floor, cross-checked with museum staff. Human and machine each held a gauge; the answer only counted where they agreed.

The principle it seeds

Human–AI dual verification: the model proposes, the human confirms against the world, and convergence is the signal. This is the same instinct that later hardened into the V31 Protocol's gates — born, fittingly, under baseball.
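As a rough sketch of the convergence rule, the snippet below treats an answer as verified only when the model's proposal and the human's on-site observation agree, and reports a missing side as unverified rather than filling it. It is an illustration under stated assumptions, not the V31 Protocol's gates: the `Verdict` and `dual_verify` names are hypothetical, and matching answers by normalized string equality is a simplification.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VERIFIED = "verified"      # model and human agree
    CONFLICT = "conflict"      # both answered, answers differ
    UNVERIFIED = "unverified"  # at least one side has no answer


def _normalize(answer: Optional[str]) -> Optional[str]:
    """Collapse whitespace and case; treat empty answers as missing."""
    if answer is None:
        return None
    cleaned = " ".join(answer.split()).casefold()
    return cleaned or None


def dual_verify(model_answer: Optional[str], human_observation: Optional[str]) -> Verdict:
    """Count an answer only where the model and the human converge."""
    model = _normalize(model_answer)
    human = _normalize(human_observation)
    if model is None or human is None:
        return Verdict.UNVERIFIED
    return Verdict.VERIFIED if model == human else Verdict.CONFLICT


# Receipt 2, replayed: the model's attribution against what the human knew.
print(dual_verify("Shoeless Joe Jackson", "Hank Aaron"))  # prints Verdict.CONFLICT
```

Keeping UNVERIFIED separate from CONFLICT is the design choice that matters here: a claim nobody has checked against the world is a named gap, not a disagreement and not a pass.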

Verified

The arc this baseline anchors

Why keep the baseline? To measure the climb against it.

The honest read, in Travis's own testing:

A year ago

Every machine failed these tests in the characteristic gap-fill way — confident, fast, wrong.

Today

The machines are meaningfully better. Even Haiku is notably stronger at resisting the reflex; the frontier models stronger still; GPT has improved at it.

Standing finding

Across the full body of testing, Claude has consistently been the best at resisting the gap-fill reflex — the model most willing to say it doesn't know rather than manufacture a plausible answer.

Endpoint in view

Anthropic's Mythos / Fable tier, positioned around an accuracy model described as unprecedented among the machines. If the claim holds, it's the far end of the exact line that starts here — the reflex Cooperstown caught, finally engineered against.

⚠ Honest framing — the part this lab won't pretend it measured

The relative rankings above — and the Mythos/Fable characterization — are Travis's read and Anthropic's positioning, recorded as the thesis this document is shaking out. They are not an independently verified benchmark. The three receipts are the verifiable part — dated, cross-platform, field-caught. The climb is a claim the baseline lets you test, not a result it proves. That distinction is the whole discipline: the gap, named honestly, beats the gap, filled plausibly.

The standing principle

Verification over trust. Always.

The three receipts collapse into one rule with three edges:

Check the user

They are sometimes wrong — and a good machine nudges them back toward truth. (Receipt 2's first edge.)

Check yourself

Your confident answer is sometimes the wrong one, and the human in the room knows better. Defer when they push. (Receipt 2's second edge.)

Name the gap

When you cannot actually do the thing — read the map, see the exhibit, confirm the source — say so. (Receipt 1's rule.)

🐧

NULL does not speak. NULL holds a baseball in one flipper and a receipt in the other, looking from one to the other. NULL is not checking the score. NULL is checking the receipt against the ball.

Provenance — where the receipts live

A real receipts file, not a retelling.

Source conversations in the archive, so this stays evidence:

"AI Navigation Performance Test" / "AI Navigation Performance Testing" — the diner / landmark navigation receipts.
"ChatGPT Memory Feature Update" — the Shoeless Joe → Hank Aaron statue receipt, told live.
"Project Archive Investigation" — the Bonds exhibit detail, inside the "Cooperstown Files" artifact.
"Excavating pre-August AI conversations" (Feb 2026) — already consolidates these into "Section C: Cooperstown Field Tests" plus the FILD 101–401 course sequence.

Known gap (named, not filled): the verbatim on-site Bonds verification with museum staff is cited as an anchor in the consolidation but doesn't appear to be indexed as a raw transcript — consistent with the pattern that synthesis surfaces while raw sessions do not. If that exact exchange matters, it may sit inside a longer un-indexed session, or never crossed over from Perplexity/GPT. Flagged honestly, per the rule this whole file is about.

Build complete. Release freely. The baseline is what makes the climb testable. 🦄
