Talkie Is a Vintage LLM AI Time Machine – Only Goes Back to 1930

13-billion-parameter model trained on pre-1931 public domain texts struggles with OCR errors and temporal leaks

By Rex Freiberger
Image: PICRYL

Key Takeaways

  • Talkie is a 13-billion-parameter model trained exclusively on pre-1931 public domain texts
  • OCR scanning cuts model performance to roughly 30% of what clean, human-transcribed data achieves
  • Temporal leakage means the AI sometimes knows post-1930 events despite its training cutoff

What if you could chat with AI that thinks the stock market crash just happened and radio is cutting-edge tech? Talkie, a 13-billion-parameter language model trained exclusively on pre-1931 texts, attempts exactly that digital time travel. While other AI companies face training data challenges, Talkie’s creators found an elegant workaround: they only used materials that entered the public domain in 2026.

Training Data From Before Television Existed

This experimental model consumed 260 billion tokens from a bygone era.

The team fed Talkie everything from 1920s newspapers to Victorian-era scientific journals, plus patents and case law, all safely in the public domain as of January 1, 2026. "The first humanistic motivation is time travel," Owain Evans noted in the research on vintage LLMs that inspired the project. The 1930 cutoff wasn't arbitrary: it leverages the 95-year U.S. copyright term to sidestep the legal minefield plaguing other AI training efforts.
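The cutoff arithmetic behind that choice is simple enough to sketch. A minimal illustration, assuming the standard U.S. rule that works from this era enter the public domain 95 years after publication (the function name is ours, not Talkie's):

```python
def public_domain_cutoff(current_year: int, term_years: int = 95) -> int:
    """Latest publication year whose copyright has expired.

    Under the 95-year U.S. term, a work published in year Y enters
    the public domain on January 1 of year Y + 95 + 1. So in 2026,
    works published through 1930 are free to use.
    """
    return current_year - term_years - 1

print(public_domain_cutoff(2026))  # 1930: pre-1931 texts are safe
```

Run the same function in 2027 and the window rolls forward to 1931, which is why a "vintage" cutoff like this shifts by one year annually.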

When OCR Meets the Jazz Age

Scanning century-old documents creates more problems than solutions.

Here’s where reality crashes the nostalgia party. OCR scanning of physical books and newspapers reduces Talkie’s performance to just 30% of what clean, human-transcribed data would achieve. Even with regex cleaning (boosting it to 70%), the model struggles with fundamental tasks compared to modern counterparts. Think of it like trying to have a conversation through a staticky 1920s radio broadcast—the charm wears thin when you can’t understand half the words.
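The article doesn't detail Talkie's cleanup pipeline, but a regex pass of the kind described above can be sketched as follows; the specific patterns are illustrative assumptions, not the team's actual rules:

```python
import re

# Illustrative OCR cleanup rules; real pipelines use far larger rule sets.
OCR_FIXES = [
    (re.compile(r"(\w+)-\n(\w+)"), r"\1\2"),  # rejoin words hyphenated across line breaks
    (re.compile(r"[ \t]{2,}"), " "),          # collapse runs of spaces/tabs
    (re.compile(r"(?<=\w)\|(?=\w)"), "l"),    # '|' misread for 'l' inside a word
]

def clean_ocr(text: str) -> str:
    """Apply each substitution in order to the raw OCR text."""
    for pattern, repl in OCR_FIXES:
        text = pattern.sub(repl, text)
    return text

print(clean_ocr("the  won-\nderful wire|ess age"))  # the wonderful wireless age
```

Even a pass like this can only repair systematic errors; misreads that produce valid-looking words slip through, which is one reason regex cleaning recovers some, but not all, of the gap to human transcription.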

Time Leaks and Fictional Cricket Matches

The AI occasionally knows things it shouldn’t—or invents things that never happened.

Despite careful filtering, Talkie suffers from “temporal leakage,” somehow knowing FDR became president despite its 1930 knowledge cutoff (FDR wasn’t elected until 1932). The creators acknowledge the data filtering remains imperfect. Worse, the live demo produces beautifully written but completely fictional historical events, such as detailed accounts of cricket matches that never occurred in 1882. Available on Hugging Face and GitHub, Talkie serves researchers better than consumers who expect accuracy.
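A temporal-leak filter of the kind the creators describe can be sketched as a naive document screen; the keyword list and year heuristic here are assumptions for illustration, not Talkie's actual filter:

```python
import re

CUTOFF_YEAR = 1930

# Hypothetical post-cutoff giveaways; a real filter would need thousands.
POST_CUTOFF_TERMS = {"television broadcast", "world war ii", "new deal"}

def leaks_future(doc: str) -> bool:
    """Flag a training document that references anything after the cutoff."""
    text = doc.lower()
    if any(term in text for term in POST_CUTOFF_TERMS):
        return True
    # Any four-digit year beyond the cutoff is suspicious (e.g. a reprint notice).
    years = re.findall(r"\b(19\d{2}|20\d{2})\b", text)
    return any(int(y) > CUTOFF_YEAR for y in years)

print(leaks_future("Reprinted in 1948 by the estate"))    # True
print(leaks_future("The wireless set of 1929 amazed us"))  # False
```

The gaps are obvious: a 1950s foreword that never names a year, or an anachronistic fact stated without a date, sails straight through, which is exactly the kind of imperfection the creators concede.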

This experiment reveals something fascinating about our relationship with AI and history. Whether Talkie represents the future of copyright-free AI training or just an expensive digital curiosity remains unclear. But in an era where AI increasingly feels omniscient, there’s something oddly refreshing about one that’s confidently wrong about the future—because it hasn’t happened yet.

