A project in preservation: archiving digital documents, articles and images


An important aspect of YFile’s 20th anniversary is preserving its archived content for future generations at York University. What does the process involve for a digital publication with such a lengthy history? What should University community members consider when archiving their own digital records?

To find out, YFile turned to the archivists in the Clara Thomas Archives & Special Collections Department of York University Libraries (YUL). They are contributing their expertise to the YFile archiving project, which is of extraordinary proportions: the publication holds more than 26,000 electronic posts, and three accompanying binders contain hundreds of images backed up on compact discs (CDs). Sitting down to answer our questions are Michael Moir, University archivist; Jennifer Grant, archivist; and Nick Ruest, digital assets librarian at YUL.

Q. Why is it important to archive electronic newsletters, images and blogs such as YFile?

A. Unlike the minutes of Senate or other official University records that the archives is mandated to preserve for the long term, content like YFile documents a broad range of activities and news relating to all members of the York community – students, staff and faculty – with an eye to highlighting the people who make the University what it is, in all its multi-faceted complexity, packaged in an easily consumable, multi-media format. So, it’s important to preserve records that capture not only the content of information communicated from the University to its communities, but also the ever-changing format of that communication. For example, in the past, this type of information may have been disseminated through analog methods like newsletters or newspapers (published weekly, monthly or quarterly), reports, departmental memoranda, etc., which were easy to store and collect for the longer term. The relative ease and frequency with which information can now be shared means that there is not only much more content out there to keep track of, but it also requires concerted effort to capture and preserve for the long term, in large part because electronic media is much more fragile and ephemeral than similar information products that once were paper based.

The Clara Thomas Archives & Special Collections Department is located in the Scott Library at the Keele Campus

In the archives, we can no longer expect this type of information to just come to us as it once would have (when a community member would have boxed up their full run of departmental newsletters and reports, for example, and transferred them to us). Instead, we all (content creators, archivists and digital preservation managers) have an active part to play in ensuring that digital content being created now, which documents the activities and functions of the University in the present moment, can be easily accessed in the future, so that the historical record accurately reflects how the University operated in the early 21st century.

Q. What is the process for the experts in the Clara Thomas Archives & Special Collections to handle the YFile materials?

A. As with all archival donations and transfers that come to the Clara Thomas Archives & Special Collections, our focus is a dual one. Our first priority is to ensure that these materials can be safely and easily accessed. In this case, this means making copies of digital files without altering their contents, documenting what is on each photograph CD, capturing file titles and all the available technical metadata embedded on the CD, and creating checksums (used to verify the integrity and uniqueness of individual files). Once we’ve done this, the process begins of figuring out what that content actually is and how to describe it so it can be discoverable to researchers and users (and us) via our descriptive database. During this process, we will want to match images to their corresponding issues of YFile and weed out duplicates.
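To make the idea of a checksum-based inventory concrete, here is a minimal sketch in Python of how the contents of one copied disc could be listed along with a checksum for each file. It is an illustration only, not the archives' actual tooling: the function names, the choice of SHA-256 and the manifest fields are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """List every file under `root` with its size and checksum.

    `root` is assumed to be a folder holding an unaltered copy of one CD.
    The field names below are hypothetical, chosen for this example.
    """
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append(
            {
                "relative_path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return rows
```

Running the same function again later and comparing the checksums is one way to confirm that a copy has not changed since it was made.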

Q. How long do you expect the project to archive YFile to take?

A. There is now a bi-monthly web crawl of the YFile site that takes several days to complete. This is an ongoing process. Managing the content that has been transferred via CD is a bit more complicated. As with many archival projects, the timeline will depend entirely on the availability of archives staff to do this work. In some ways, the first step, which is to copy content from CD and transfer it to a server, will be the most time consuming. It involves working with one CD at a time on a computer terminal that still has a CD drive, creating a manifest for each CD that documents its contents and records a checksum for each digital file (which will allow us to compare files to identify duplication), and then transferring these copies to a server for temporary storage. Once these digital files are backed up and more easily accessible, the appraisal work begins to determine what needs to be kept. Next comes the work of creating descriptions of content that can be added to our archival database, followed by the transfer of the objects and related metadata to digital preservation platforms. There is no way to say for sure at this point how long this might take, but six months would be an optimistic guess, given our many competing priorities.
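The comparison of checksums to identify duplication can be sketched in a few lines of Python. This is a hypothetical illustration, not the archives' workflow: it assumes each manifest row is a dictionary with a `relative_path` and a `sha256` field, and the disc and file names in the example are invented.

```python
from collections import defaultdict


def find_duplicates(manifest_rows):
    """Group file paths by checksum and keep only groups with more than one file.

    Files that share a checksum have, for practical purposes, identical
    contents, even if their names or source discs differ.
    """
    by_checksum = defaultdict(list)
    for row in manifest_rows:
        by_checksum[row["sha256"]].append(row["relative_path"])
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }


# Invented example rows; real checksums would be 64 hexadecimal characters.
rows = [
    {"relative_path": "disc-01/photo-a.jpg", "sha256": "aaa111"},
    {"relative_path": "disc-02/copy-of-photo-a.jpg", "sha256": "aaa111"},
    {"relative_path": "disc-02/photo-b.jpg", "sha256": "bbb222"},
]
duplicates = find_duplicates(rows)
```

A matching checksum only flags candidates. Deciding which copy to keep would still be an appraisal decision for archives staff.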

Q. Aside from the YFile website, where can University community members access the archived YFile materials?

A. York University Libraries has an instance of OpenWayback available here. For those unfamiliar with OpenWayback, it is the open source version of the Internet Archive’s Wayback Machine. Previous captures of the YFile website can be found here.

Q. Looking back, what should YFile have done with respect to archiving of its newsletters and content? What can University community members learn from the YFile experience?

A. Hindsight is always 20-20, and we all must often do the best we can with the resources available. The first step is always recognizing that something is worth saving and preserving. Once you take that important step, the next necessary step is finding allies to help you do this work. In this case, reaching out to York University Libraries will help you gain those allies and get access to our resources and knowledge of best practices.

Q. In the last 20 years, the way we store and share information has evolved significantly. What are some methods of archiving that remain important today, and what are some of the methods that we have left behind?

A. This is a large question! I think the principles of archival practice have not changed, but the methodology and timelines for this work have. Archives still are responsible for ensuring the authenticity, preservation and accessibility of records of enduring value, and our methods for doing this work continue to change with the evolution of record formats and the progression from analog to digital. The bigger issue, however, is that people are producing these records at a rate not comparable to any moment in the past. We also don’t necessarily think about the digital records we create in the same way. Email replaced the letter, but one could argue that direct messaging and text messaging have replaced email – yet we don’t think of preserving our text messages in the way that we think of letters or even email. So, I think that information professionals have a much larger role to play in intervening early in the life of the record, in the lives of potential donors to archives (such as records creators in a university context), to ensure that we identify and then capture and preserve important information objects for future transfer to the archives. The consequences of not doing this will mean that important information either does not survive, or, if it does, it may be lost or hard to discover amidst gigabytes and terabytes of unorganized digital files. Successful archival practices will depend on archivists and records creators working together at an earlier stage in the lifecycle of the record to ensure the survival of important information, which is definitely a change from past practices where the archives would wait for the analog records to become inactive and then our work would start when those records entered our physical custody.

Q. What are the risks associated with not archiving electronic materials properly?

A. There is no such thing as a complete archive of anything, so even under the best circumstances, we never save everything we should or could, or have perfect record-keeping practices. However, one of the biggest challenges in preserving digital records is accounting for the real specter of technical obsolescence. In a basic sense, not preserving your digital records properly means not ensuring that digital files of value continue to be authentic and accessible when the media that house them, the software that opens them or the hard drive that they live on becomes obsolete or unworkable. This is already a challenge for archives: digital records are transferred to us that are either difficult or impossible to access because of these issues. One of the main things that digital records creators can do is be aware of the need to safely migrate their digital records forward to new hardware or storage media while the records are still in active use. If you don’t have the tools to do this yourself, figure out who can help you. We have yet to really understand what impact the transition from an analog to a digital world has had on documentary heritage and the historical record in general.

Q. Looking forward, what should University community members consider for their electronic documents, images and newsletters? How should they contact you and what would you need from them?

A. The York University Common Records Schedule applies to both analog and digital records, so the first thing that York University staff and faculty should do is figure out what their obligations are for corporate recordkeeping by using this as a guide. Another piece of advice is to be proactive with your digital recordkeeping, whether the records are to be kept for the long term or for the short term. Think about what information will help future users of this content understand and use these records. One of the easiest things you can do is be consistent about file naming conventions and file formats that allow discoverability and access to digital content. Don’t wait until you’re in a recordkeeping crisis to reach out to us – collaborative problem-solving and information sharing are essential to the proper management and stewardship of digital records for long-term preservation. That said, if you don’t know what something is and why it’s worth keeping, don’t assume we will either – we rely on records creators to accurately identify and name digital records to make our work possible.
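As one illustration of consistent file naming, the short Python sketch below checks file names against a made-up convention: a date, an underscore, a lowercase hyphenated description and an extension, as in `2022-03-15_convocation-ceremony.jpg`. The convention and the function name are assumptions for this example, not a York University standard. Any convention would serve, as long as it is applied consistently.

```python
import re
from datetime import date

# Hypothetical convention: YYYY-MM-DD_short-description.ext, all lowercase.
NAME_PATTERN = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})_([a-z0-9]+(?:-[a-z0-9]+)*)\.([a-z0-9]+)$"
)


def follows_convention(filename: str) -> bool:
    """Return True if `filename` matches the example convention.

    The leading date must also be a real calendar date.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True
```

A name such as `IMG_0042.JPG` would fail this check, since it says nothing about when the image was made or what it shows.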

Read more of YFile’s special anniversary content at go.yorku.ca/yfile20.