Trying to track down that deleted tweet or verify what a website actually said last month? The Internet Archive’s Wayback Machine has served as your digital time machine for three decades, preserving over a trillion web pages. But this critical tool, the only public archive of its scale, now faces systematic blocking from major media outlets that fear their content might train AI competitors.
Twenty-Three Major News Sites Have Gone Dark
The New York Times, USA Today’s 200+ outlets, and Reddit have all cut off the preservation tool that journalists rely on.
According to an Originality AI analysis, 23 major news sites currently block ia_archiverbot, the Wayback Machine’s web crawler. USA Today Co. operates over 200 media outlets, making its blocking decision particularly devastating. The Guardian takes a subtler approach: it allows crawling but filters archived content from public access. When you search for historical Guardian articles, you’ll hit digital dead ends.
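For readers who want to see what this kind of block can look like in practice, here is a minimal sketch in Python. It assumes the block is expressed through a site's robots.txt file, which the analysis cited above does not spell out. The sample rules, the news.example URL, and the is_blocked helper are all hypothetical and written only for illustration. The only name taken from this article is the crawler name, ia_archiverbot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for illustration only.
# It is not copied from any real publisher's site.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article_url = "https://news.example/2024/some-article"  # placeholder URL
    print(is_blocked(SAMPLE_ROBOTS_TXT, "ia_archiverbot", article_url))  # True
    print(is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot", article_url))    # False
```

A check like this only shows what a site asks crawlers to do. It would not detect an arrangement like the Guardian's, where crawling is allowed but archived copies are filtered from public view.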
Publishers Cite AI Training Fears
The New York Times claims archived content violates copyright law, though details remain murky.
Publishers justify blocking with two arguments: preventing AI companies from training on their archived content, and general anti-scraping measures. The Times stated that archived content is being used “to directly compete with us,” but declined to specify whether this represents documented violations or hypothetical concerns. USA Today Co. frames its blocking as routine bot prevention, though the impact falls hardest on preservation efforts.
The Irony Cuts Deep for Accountability Reporting
USA Today used the Wayback Machine to investigate ICE detention policies while simultaneously blocking the tool from preserving its own work.
Mark Graham, the Wayback Machine’s director, highlighted the contradiction: “They’re able to pull together their story research because the Wayback Machine exists. At the same time, they’re blocking access.” In 2016, the Archive exposed The New York Times quietly revising a Bernie Sanders article—exactly the kind of accountability work that becomes impossible when outlets control their own historical records.
Over 100 journalists, including Rachel Maddow, have signed a coalition letter supporting the Archive’s mission.
Your Access to Digital History Hangs in the Balance
If blocking spreads, early digital records could vanish into a corporate-controlled information void.
Graham warns that “the general locking-down of more and more of the public web is impacting society’s ability to understand what’s going on in our world.” No comparable public alternative exists to the Wayback Machine. If major outlets continue restricting access, you’ll lose the ability to verify claims, track editorial changes, or research historical context, leaving large organizations as the sole keepers of their own published records.
The Internet Archive remains “in conversation” with blocked outlets, but resolution looks uncertain as AI copyright battles intensify across the industry.