Completing the Archives, or How We’re Extending the Life of Web-Based Digital Scholarship
We are excited to announce a milestone in our archiving efforts, which have been in careful development since soon after Stanford University Press’s Mellon-funded digital publishing initiative began.
Two of SUP’s seven digital web-based publications have now been fully archived, and the public-facing archive packages have been integrated into each publication’s landing page. Visitors to the landing page of these live web publications can now click into an “Archive” page where they can read about and select from the various archive versions of the project. These versions include a web archive, a digital repository collection, project documentation, and (when available) an emulation. While I’ve described all these archiving solutions in other posts as we’ve explored, tested, and developed them, it’s worth summarizing the final formats here as they’re represented in the packages we’re now making public.

The web archive version of each publication most closely matches the live version in that readers are able to navigate independently through the publication, interacting with all its features–whether they be maps, 3D environments, videos, pop-ups, or just plain hyperlinks–in the same way they would in the live version. If you’ve ever viewed content on the Wayback Machine, you’ve viewed a web archive. The difference between the Wayback Machine and these web archives, though, is that ours have been custom created by carefully crawling (and manually patching when necessary) every aspect of the project to ensure there are no dead ends or holes where media should be. The archive is then served through a nearly invisible playback system on a simple HTML page. Thanks for these web archives go to Webrecorder, who, as a sub-awardee of our current Mellon grant, was able to meet the specific challenges of our complex and highly dynamic web publications. There are only a handful of web archiving tools and solutions out there, and so far, the Webrecorder suite has been the only one that can address the unique needs of these innovative publications. The addition of full-text search for the web archives has also set Webrecorder apart as a tool especially amenable to scholarly content. In addition to the ever-astounding talent and problem-solving skills of Ilya Kreymer of Webrecorder, thanks are due to Anna Perricci, Nicholas Taylor, Thib Guicherd-Callin, and the wider web archiving community who have shared advice and welcomed my perspective as I’ve attempted to learn their trade and sought to introduce it to scholarly communication in more meaningful ways.
The Stanford Digital Repository collection for each project contains its code and media assets, the file package that serves the web archive, a disk image for emulation when available, and the documentation (described below). Each file and asset has its own PURL (persistent URL) in the Stanford Digital Repository, and the record for each item provides metadata unique to its publication context. Since each project is unique, each project’s collection varies in terms of its contents, formats, and structure. To aid in the discovery of the complete collection, a separate page lists and links to the contents and provides info on access and organization. While this archive doesn’t provide the full-fidelity, in-context experience of the original publication, it does provide preserved copies of the code and media, making it an ideal archive for a researcher interested in delving into the technical side of the project or even reconstructing it once a live or web-archived version is no longer available. Certain components of the repository collections will remain dark while the live web publications remain functional. As time and evolving technologies render the live web project obsolete, more of the content in the collections will become publicly available. Aside from our own team, thanks go to the Stanford Libraries’ Digital Library Systems and Services (DLSS), especially Hannah Frost, Geoff Willard, Arcadia Falcone, Andrew Berger, and Ben Albritton for their support, documentation, troubleshooting, and guidance as I’ve learned and continue to re-learn the SDR systems.

Documentation provides an overview of the project’s technical specifications and requirements as well as descriptions of the project’s structure and dependencies, information that can aid future researchers in piecing the contents of the repository collection together for potential redeployment. At the heart of the documentation is a narrative description, written by the project’s author, of the publication’s features, navigation, and overall user experience. It includes not just a written description but also a screencast video in which the author takes readers on a tour of the project, simulating the experience of engaging with it. Both documentation and screencast represent a future-proof archive of the project’s look, feel, and functionality, and the commentary provided by the author for these artifacts provides a summary of the content and argument at the center of the works. Such documentation addresses the ethereal and performative nature of web content and may very well outlast not just the live publication but also the other archive versions, which are themselves dependent upon the technologies that serve them. The documentation requirement and structure were heavily inspired by Dene Grigar’s DHSI course “Documenting Born Digital Creative and Scholarly Works for Access and Preservation,” and thanks go to her for her continued work in the field of documenting born-digital literature.
So far, an emulation has only been developed and tested for one of the projects. The disk image of this project, which can be loaded into the Emulation As a Service framework, is stored in the SDR collection but is currently dark until the EAAS system is out of pilot stages and more readily usable by the public. Efforts continue with that project, and I’ll be posting more about them as they develop. The goal is to eventually be able to link directly to an emulator on the web that’s pre-loaded with the disk image and documentation on how to use it. For now, we will be providing info on the archive page for projects whose emulated versions have been prepared, and we will add links to the emulations as they become available and necessary as stand-ins for other high-fidelity versions. Thanks for progress on the emulation front go to the Software Preservation Network and the EAASI team for support and development of the emulation framework; Michael Olson for his leadership in the EAASI pilot project at Stanford Libraries; Jonathan Martin for his guidance on creating the disk image in his DHSI course “Creating LAMP Infrastructure for Digital Humanities Projects”; and Portico’s Karen Hanson for her much-welcome revision and further development of the complete disk image that we are now able to spin up in the emulation environment.
Archive packages can currently be viewed for When Melodies Gather and Filming Revolution by clicking the “Archive” link on the cover page. Archives for the other five publications are in progress and will be released as they are completed. These packages all represent years of work researching solutions, developing workflows, building professional relationships, and collaborating with and contributing to the dynamic initiatives that are emerging at an exciting rate. There is still much work to be done, but after four and a half years of learning and doing these various forms of archiving for web content, I’m enormously encouraged by the energy and talent of those I’ve had the privilege of working with. We’re looking forward to sharing the release of the future archives and to continuing the efforts to extend the life of the important and innovative work our authors and other digital scholars are creating.