Preserving the interactive scholarly works we’re publishing continues to be a high priority within the scope of our initiative. Over the past year and a half, the possible solutions have become clearer, though implementing the different methods is still one of our bigger challenges. So far we’ve identified and begun pursuing three specific strategies for ensuring the projects we’re publishing remain part of the scholarly record for as long as technologically possible.
Digital Repository and Documentation
As a university press we have the luxury of access to our institutional library’s resources. And since part of any university library’s mission is to preserve and provide access to scholarly material, it makes sense to leverage this resource as much as possible for the work we’re producing. The Stanford Digital Repository (SDR) allows us to store electronic material, like the files comprising a web project, in a safe and persistent environment. And with dedicated experts in the fields of metadata and discoverability, we can not only store but also describe and document each project as an archived collection, ensuring the content and code are findable and viewable by researchers for years to come. It’s perhaps the most durable long-term approach, but it has its drawbacks.
So far we have tested this approach with only our first publication, Enchanting the Desert. The process is nearly finished after more than a year (check back soon for a full report), and having completed it once, we know what to expect and how to improve the workflow for our next archive.
Chief among the limitations of this resource is its current inability to render a high-fidelity version of the project. In other words, a reader cannot experience the interactive interface but must instead use the library’s discovery mechanism to sort through disaggregated pieces of content that are not displayed in their original context or fidelity. This means we need to supply para-text, like technical and descriptive documentation and screencasts, to ensure a researcher has a clear idea of what the project actually looked like before diving into its discrete, un-contextualized parts. The SDR is a safe and enduring place to keep the project’s components, but it cannot yet present the original experience of reading and interacting with the work as a cohesive whole. It also relies on a lot of specialized skills within the library, from metadata to rights-and-permissions to discovery-platform specialists.
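To make the idea of describing a project’s files a little more concrete, here is a minimal sketch of a file manifest with checksums, the kind of technical documentation that could accompany a deposit. It is an illustration only and not the SDR’s actual deposit workflow: the function name `build_manifest`, the field names, and the `my-web-project` folder are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(project_dir):
    """Describe every file under project_dir with its path, size, and checksum.

    Hypothetical helper: the manifest layout is an assumption made for this
    sketch, not a format required by any particular repository.
    """
    root = Path(project_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                # Relative path, so the manifest stays valid if the folder moves.
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                # A SHA-256 digest lets a future researcher confirm the file is unchanged.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return {"file_count": len(entries), "files": entries}


if __name__ == "__main__":
    # "my-web-project" is a placeholder folder name.
    print(json.dumps(build_manifest("my-web-project"), indent=2))
```

A manifest like this does not replace descriptive metadata or screencasts, but it gives anyone working through the disaggregated files a simple way to check that nothing is missing or altered.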
Web Archiving
Web archiving seems so far to be the most direct and efficient approach to providing readers with a replica of a project’s content and fidelity. To the average reader, there may even appear to be no difference between a web-archived and a live website. It essentially delivers the experience of reading and interacting with the project, but it does not provide access to any of the underlying code or software for those interested in unpacking the guts of a project.
As discussed in several previous posts, we are already actively web archiving all of our projects. I manually archived our first publication using Rhizome’s Webrecorder tool. The web archive file (WARC) is stored in the Stanford Digital Repository, but because of current limitations in the system’s delivery platform, it can’t be viewed directly through the repository’s interface. Instead, interested readers and researchers must either access it through Webrecorder’s live hosted environment or download the WARC from the repository and open it in the downloadable Webrecorder Player to view it locally offline.
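For readers curious about what is actually inside a WARC, here is a minimal sketch that lists the URLs captured in one. It relies only on the basic record layout defined by the WARC format (a `WARC/` version line, named header fields, a blank line, then a content block of `Content-Length` bytes). It assumes an uncompressed file and skips validation; the function names are hypothetical, and this is not how Webrecorder itself reads archives.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes belong to this record.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def list_captured_urls(stream):
    """Return the target URI of every 'response' record, in file order."""
    return [
        headers["WARC-Target-URI"]
        for headers, _ in iter_warc_records(stream)
        if headers.get("WARC-Type") == "response" and "WARC-Target-URI" in headers
    ]


if __name__ == "__main__":
    # A tiny hand-written record standing in for a real archive.
    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.org/\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello\r\n\r\n"
    )
    print(list_captured_urls(io.BytesIO(sample)))
```

Archives are often distributed gzip-compressed (`.warc.gz`); in that case, opening the file with Python’s `gzip.open(path, "rb")` and passing the result to `list_captured_urls` should work the same way. For anything beyond a quick look, a dedicated WARC library is the better choice.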
We’ve also worked with Webrecorder to capture our second publication, which was built using the Scalar platform. This process could ostensibly apply to other Scalar projects in the pipeline, so our work with Webrecorder, another Mellon-funded resource, has proven to be incredibly useful.
Yet the challenge comes in maintaining the web archive itself. The live web archives of both publications are hosted and maintained by Webrecorder while the WARC stored in SDR is not currently readable within that environment. For now, because of the need to entrust the maintenance of the tool and thus our working web archives to an outside organization, this approach is best viewed as a valuable but likely short-term preservation strategy. Because we’re not in direct control of it, we can’t be completely sure the web archive will far outlast the live publication itself without sustained interaction and collaboration.
Emulation
A strategy requiring a robust infrastructure, emulation is perhaps the most complex yet potentially the most thorough approach to preserving not just a project’s content and fidelity but also all its dependent software and underlying code. With emulation, the project content and all the software needed to deploy it (browser, server, possibly even the operating system) are stacked up, hosted remotely, and then delivered to readers online. The user is then able to interact directly with the project through a window within their own web browser. This is useful when projects are built with frameworks and code libraries that will become un-renderable on future machines or when projects are best viewed in specific browsers no longer available to the average user. Gaming enthusiasts have been working on emulation for years, setting up systems for fans to play old Atari or NES games online, for example, but the same technology could and should also apply to the scholarly record. However, the infrastructure needed to run and serve emulated environments remains categorically beyond the scope and expertise of a publisher, so we’re looking to our institutional library and other established programs for help with this one.
We haven’t yet tested this method on any of our projects, but that could soon change.
The Mellon-funded Emulation As a Service Infrastructure, headquartered at Yale and powered by OpenSLX, has invited Stanford Libraries to become one of several host nodes for setting up and testing emulation as a potential solution for the preservation and delivery of digital collections and institutional holdings. Two of the project leads, Jessica Meyerson and Klaus Rechert, attended the preservation workshop we held in May and set in motion the collaboration, which SUP is now helping to support through the provision of use cases. It will be important for our own program to continue to support this project in any way we can, and we’re hoping the endeavor will eventually let us provide a high-fidelity user experience of projects that may not survive inevitable web and browser updates in coming years. While emulation requires a lot of resources, it makes sense to work with our own library as well as fellow Mellon awardees who are already looking at this issue.
More to Consider
Further possibilities involve containerization as a means for porting projects through peer review, migration, and preservation environments. The Mellon-funded Digits program is investigating just how to do this, and we’re looking forward to watching their progress. Containers already play a big role in web-archived projects, and whether they can also be used to facilitate an emulated version of a project is something we’re eager to explore further. Ultimately, a real solution will likely require a combination of strategies working in tandem.
While each approach offers its unique advantages, none is a complete solution on its own. Identifying these potential solutions is only the first step. We’re focused on exploring each option and measuring its viability for the specific needs of the work we’re publishing. In the time it’s taken us to zero in on these three strategies, we’ve already seen improvements in the technologies and infrastructures of each one. It’s encouraging to think that as we get closer to implementing a particular approach, it’s already becoming more reliable. And no doubt other potential solutions are waiting on the horizon. I’ll be at a meeting in a couple of weeks that will see representation from organizations currently serving the preservation needs of books and journals, and it will be interesting to hear how they’re thinking about the growing market for digital publications. It’s certainly becoming a format that neither publishers nor preservationists can ignore, and hopefully documenting the work we’re doing now will help others as they join in the pursuit.
Jasmine Mulliken is Production and Preservation Manager, Digital Projects, at Stanford University Press. She coordinates the production and preservation workflow of born-digital projects, including recommending platforms and coding standards to authors, consulting with authors on projects’ technical attributes, and evaluating best practices for archiving and preservation.