Emulation progress through collaboration
From the start of SUP’s digital publishing initiative, and even more explicitly in this second grant phase, the longevity of the work we produce and publish has been a high priority. The ephemerality of web-based content is infamous, and with scholarly communication’s entry into the medium, it has become increasingly important to establish a means of persisting the kinds of digital content we and other university presses now publish. Fortunately for producers and disseminators of scholarly communication, many fields beyond publishing share this urgency about their own artifacts, from environmental data stores to news sites to political and historical records. In recent years, preservation entities have given noticeably more attention to digital and web-based content, as evidenced by the ubiquity of web archives and the accessibility of personal archiving tools. Even more promising are the collaborations that have been forged to assess the needs of content producers and to advance the offerings of preservation services in response to those needs.
One such collaboration that SUP’s digital program contributed to in 2020, and reciprocally benefited from, was the work conducted through the Mellon-funded project “Enhancing Services to Preserve New Forms of Scholarship.” SUP’s role in that project was to supply use cases that helped the preservation service participants, Portico and CLOCKSS, outline and test potential solutions for long-term access to publications whose digital formats present significant challenges to typical preservation processes. While the larger results of that project will be published separately by the project leads, we want to share here some of the ways the experience has shaped our ongoing preservation efforts at SUP.
Of the three preservation and archiving pathways we currently use or are testing (web archiving, emulation, and deposit in a digital repository), the work CLOCKSS and Portico applied to our use cases advanced two: web archiving and emulation, the two that afford the highest-fidelity preserved versions of our dynamic publications.
The web archiving testing went as expected. Thanks to the hard work of Thib Guicherd-Callin and Fei Li of LOCKSS at Stanford Libraries, we were able to zero in on a persistent problem surrounding embedded maps. While it might be possible, with a great deal of manual work, to limit the frames and layers we wanted to capture from each map, it is not feasible to capture every permutation a reader might expect to be able to call up in the live publication. We were happy to provide use cases that let the preservationists dig into those barriers, as well as those presented by still-evolving web archiving technologies. Our partnership with Webrecorder had already shown that this approach is not as cut-and-dried as we once assumed, and it took considerable work at SUP to get to where we are now: a nearly complete collection of web archives of our publications.
Our emulation efforts, on the other hand, got a significant assist from the work of Karen Hanson at Portico during our sprint. In fact, she was able to dismantle some of the more technical barriers I had encountered in my own testing over the previous year. Thanks to her work and the collegial spirit of the project, we now have a self-contained disk image of one of our publications, Filming Revolution, that we can load into the Emulation-as-a-Service Infrastructure (EaaSI) framework, yet another grant-funded pilot SUP has been involved with over the past two years. Filming Revolution, though still very much alive in its hosted form online, had posed challenges for emulation because it calls in external media assets from Vimeo. While we hold the raw files at the Press and have accessioned them into the Stanford Digital Repository for long-term preservation, the full-fidelity publication, the “book” that lives on the web complete with its embedded video, relies on Vimeo for its full functionality. Relinking these files to source from a local directory instead was a task Karen executed in her experimentation with the work. She was also able to identify database rows in the backend that should be anonymized for responsible archiving, and the package that now lives on a virtual machine is more secure and self-contained thanks to her time and talents.
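The relinking step can be pictured with a minimal sketch. This is not the project’s actual tooling (that work was Karen’s), and the embed pattern, media directory, and file naming here are assumptions for illustration only: the general idea is to rewrite Vimeo player URLs in the publication’s pages so they point at copies of the video files in a local directory inside the disk image.

```python
import re

# Hypothetical pattern for a Vimeo player embed URL; the numeric video
# ID is captured so it can key the local filename. Real markup may vary.
VIMEO_EMBED = re.compile(r'https?://player\.vimeo\.com/video/(\d+)[^"\']*')

def relink_to_local(html: str, media_dir: str = "media") -> str:
    """Replace Vimeo player URLs with paths into a local media directory,
    assuming files are stored as <media_dir>/<video-id>.mp4."""
    return VIMEO_EMBED.sub(lambda m: f"{media_dir}/{m.group(1)}.mp4", html)

page = '<iframe src="https://player.vimeo.com/video/123456?autoplay=0"></iframe>'
print(relink_to_local(page))
# <iframe src="media/123456.mp4"></iframe>
```

In practice the embedding markup would also need to change (a local file belongs in a `<video>` element rather than a Vimeo `<iframe>`), but the core move is the same: sever the dependency on an external service so the emulated package is fully self-contained.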
While the features that required editing for the sake of the archived copy are strong points of the live project (they allow users to engage interactively with the content), the work benefited from adjustments that ensure a more durable archive for future researchers. And although the emulated version will itself require maintenance, along with support of the emulator running it, it serves as yet another alternative format for future researchers, one that may outlast the formats that currently make up the Web.
With a much more solid disk image, thanks to Karen, I was able to spin it up successfully in Stanford’s test instance of EaaSI. Previous attempts had been partly successful, but because the earlier disk image was incomplete, it was never fully viable. Emulation itself is still a pilot project at Stanford, too, so this is by no means a conclusion to our testing of emulation as an effective or efficient solution for preserving these works. It has, however, allowed us to complete one full test: building a virtual machine, populating it with the project files, and loading the full package into an emulation framework. The process involved many specialized skill sets and intense collaboration, proving that meeting the challenges of emulation requires pooled resources and a mutual understanding between preservationists and publishers of each other’s needs and limitations.