New digital mining tool opens door to historical data
York researcher Colin Coates, director of the Robarts Centre for Canadian Studies, is part of an international project team mining thousands of pages of text and images to trace patterns and environmental consequences of early industrialization in the 19th century.
For the past two years, Coates and Jim Clifford (PhD ’11) have worked with a team of text mining and computer visualization experts from the University of Edinburgh and the University of St. Andrews in Scotland, as well as York University, to develop a geographic database. They launched the database this week, showcasing early results.
“The key goal from our perspective was to create a tool for research that would be available to a range of historians interested in the economic and environmental history of the 19th-century British world. With this digital history initiative, historians are able to explore new historical questions and make broad comparisons using vast amounts of data in ways that previously were not possible for individual researchers,” said Coates.
The process involved programming computers to read more than 10 million pages of 19th-century documents, looking for mentions of commodities and their locations. This allowed the researchers to explore the process of globalization that was well underway in that era.
“We used computers to read the historical documents and then extracted a very large database that allows us to explore every location mentioned in the same sentence as the commodities we were interested in, like coal, rubber, wheat, cinchona or cotton,” said Clifford, a York alumnus, now a professor at the University of Saskatchewan. “Combining text mining with engaging visualizations makes this database unique and enables us to look at history from a different perspective than previous studies.”
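The idea Clifford describes, pairing each commodity with every location named in the same sentence, can be illustrated with a toy sketch. This is not the project's actual pipeline: the lookup lists `COMMODITIES` and `PLACES`, the sample text, and the naive sentence splitter are all assumptions made up for illustration, and a real system would need proper place-name recognition rather than a fixed word list.

```python
import re
from collections import Counter

# Hypothetical, tiny lookup lists for illustration only.
COMMODITIES = {"coal", "rubber", "wheat", "cinchona", "cotton"}
PLACES = {"Liverpool", "Bombay", "Montreal", "Newcastle", "Ceylon"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(text):
    """Count (commodity, place) pairs that appear in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[A-Za-z]+", sentence))
        lowered = {w.lower() for w in words}
        found_commodities = COMMODITIES & lowered  # case-insensitive match
        found_places = PLACES & words              # exact, capitalized match
        for commodity in found_commodities:
            for place in found_places:
                counts[(commodity, place)] += 1
    return counts


# Made-up sample text, not taken from the project's sources.
SAMPLE = (
    "Cotton from Bombay arrived at Liverpool in great quantity. "
    "The price of wheat rose in Montreal. "
    "Coal was scarce that winter."
)

pairs = cooccurrences(SAMPLE)
for (commodity, place), n in sorted(pairs.items()):
    print(commodity, place, n)
```

In this sketch the third sentence mentions coal but no place, so it contributes no pair. Counting at the sentence level is a deliberately simple rule: it keeps only locations that sit close to the commodity in the text.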
The project, called Trading Consequences, charts the commercial growth of the British Empire. It details the economic and environmental impact of extracting and shipping hundreds of different commodities. The database includes well-known commodities like sugar, coffee and tea, as well as some largely forgotten raw materials like gutta-percha, which was used to insulate telegraph cables before the invention of plastics. Anyone interested in this topic can explore the results through a series of purpose-built, web-based visualizations.
The two-year project forms part of the second round of the Digging into Data Challenge and is one of eight international projects that include Canadian researchers. Funding is provided by Jisc, the Economic & Social Research Council and the Arts & Humanities Research Council in the United Kingdom, and by the Social Sciences & Humanities Research Council of Canada.
To learn more, visit the Trading Consequences project website.