
Rewriting History: How AI is Reconstructing a Century of News and What It Means for You

"Delve into the groundbreaking project that's reviving a century of historical news using AI, offering fresh perspectives on the past and valuable insights for the future."


In an era dominated by instant information and fleeting headlines, the past often fades into obscurity. However, understanding history is crucial for shaping our present and future. Local newspapers, once the primary source of news, hold a wealth of information about the events, people, and ideas that have shaped our society.

The Newswire project is revolutionizing how we access and interact with historical news. By applying advanced AI techniques to millions of digitized newspaper images, this initiative is creating a comprehensive database of newswire articles from 1878 to 1977.

This isn't just about preserving old news; it's about unlocking new insights into our shared history. The Newswire dataset promises to be a valuable resource for researchers, language models, and anyone seeking a deeper understanding of the forces that have shaped our world.

Unearthing the Past: The Newswire Project's Ambitious Goals


The Newswire project aims to reconstruct an archive of newswire content. Historians have long recognized the pivotal role of newswires, such as the Associated Press, in shaping national identity and shared understanding. Until now, however, no comprehensive archive of this content has existed.

Researchers are employing a customized deep learning pipeline to process hundreds of terabytes of raw image scans from thousands of local newspapers. This ambitious undertaking is resulting in a dataset containing 2.7 million unique, public domain U.S. newswire articles, spanning a century of history.

Key features of the Newswire dataset:

Georeferenced locations: Articles are linked to specific locations, providing a geographical context for the news.
Tagged topics: Customized neural topic classification identifies the key themes and subjects covered in each article.
Named entity recognition: People, organizations, and locations are identified and classified, providing valuable data for social and historical analysis.
Entity disambiguation: Individuals are linked to their corresponding Wikipedia pages, providing a wealth of biographical information.

The Newswire dataset not only offers a vast collection of historical texts but also provides rich structured data that enhances its research potential. Library of Congress metadata is included, providing information about the newspapers that published the articles on their front pages.
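
To make the idea of "rich structured data" concrete, here is a minimal sketch of how a researcher might model and query records like these. Everything in it is an assumption for illustration: the class and field names (`Article`, `Entity`, `wikipedia_title`, and so on), the entity labels, and the toy records are hypothetical and do not reflect the actual Newswire schema or content.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    mention: str                           # text span as printed in the article
    kind: str                              # hypothetical labels: "PER", "ORG", "LOC"
    wikipedia_title: Optional[str] = None  # set only if disambiguation found a match


@dataclass
class Article:
    text: str
    year: int
    topic: str      # output of topic classification (label names are assumed)
    dateline: str   # place name the article is georeferenced to
    lat: float
    lon: float
    entities: list[Entity] = field(default_factory=list)


def articles_about(articles, topic, start_year, end_year):
    """Return articles with the given topic tag within an inclusive year range."""
    return [a for a in articles
            if a.topic == topic and start_year <= a.year <= end_year]


def most_mentioned_people(articles, n=3):
    """Count disambiguated people, so spelling variants collapse to one identity."""
    counts = Counter(
        e.wikipedia_title
        for a in articles
        for e in a.entities
        if e.kind == "PER" and e.wikipedia_title is not None
    )
    return counts.most_common(n)


# Toy records, invented purely for illustration (not real Newswire data).
SAMPLE = [
    Article("...", 1923, "politics", "WASHINGTON", 38.9, -77.0,
            [Entity("Miss Example", "PER", "Jane Example"), Entity("Senate", "ORG")]),
    Article("...", 1931, "politics", "CHICAGO", 41.9, -87.6,
            [Entity("Jane Example", "PER", "Jane Example"), Entity("J. Doe", "PER")]),
    Article("...", 1960, "sports", "BOSTON", 42.4, -71.1,
            [Entity("John Placeholder", "PER", "John Placeholder")]),
]

if __name__ == "__main__":
    politics = articles_about(SAMPLE, "politics", 1900, 1950)
    print(len(politics), "politics articles")
    print(most_mentioned_people(SAMPLE, n=1))
```

Counting by the disambiguated identity rather than the raw mention is the point of the design: "Miss Example" and "Jane Example" are tallied as the same person, which plain keyword search over the article text would miss.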

A New Chapter in Historical Research

The Newswire project is more than just a technological achievement; it's a gateway to a deeper understanding of our past. By making this wealth of historical news accessible and searchable, the project empowers researchers, educators, and the general public to explore the stories that have shaped our world.
