DigiPres 2022

Last week I attended the DLF and DigiPres conferences in Baltimore. Below is a reformatting of my presentation for the latter, “The Story of a Digital Scholarly Publication, As Told by Its Preservation Format.” The original slides, with full presentation text (but without moving video) are available at https://osf.io/zerxs/.

I want to start at the beginning by presenting these publications as stories. And I want to focus on how each archival version of one publication ends up telling a different version of the original story presented by the publication. So to do that, I’d like to first introduce you to one of SUP’s eleven digital publications.

Screenshot of SUP web page showing cover thumbnails along with title, author, and publication date of each of our current 10 projects.
SUP’s digital publications through 2023

Like all eleven publications that will have come out of this program by the end of the year, Filming Revolution was published using all the same rigorous processes as we, and other university presses, use for publishing monographs.

Screencast of browsing Filming Revolution

But Filming Revolution isn’t a book. It’s a web-based meta-documentary about documentary and independent filmmaking in Egypt since the 2011 revolution, bringing together the collective wisdom and creative strategies of media-makers in Egypt before, during, and after the revolution.

The project compiles original interviews with 30 professional and amateur filmmakers and places them within a network of events, themes, and people. A reader chooses their own path through text, media, and visual network. While it contains all the heft and rigor of a book, it lives only online as a website composed of a SQL database, code, text, and over 400 video clips. There is no print version of the publication, and it’s impossible to “get a copy” offline without doing some serious work. As a website, albeit one that, like a traditional monograph, contains a cohesive scholarly argument, it resists all the typical means of persisting its argument in the scholarly record.

Thanks to a grant from the Mellon Foundation, SUP has been afforded the opportunity to build a model program that takes a digital project with no print corollary through peer review, editorial development, copyediting, marketing, cataloging, and ISBN, DOI, and copyright registration. We also then host the final product on our own dedicated server and provide access to it free of charge. A key piece of the grant has been to establish a preservation plan that would ensure continued access to the projects even after they become no longer supportable on the live web. So it’s been my mission to ensure that what we’re publishing and putting out into the scholarly landscape is durable and persistent.

To achieve this, we’ve written guidelines for authors of digital projects that steer them toward responsible and sustainable choices during the development of their projects. But we’ve also anticipated that what seems sustainable today will be outdated tomorrow and that we can’t foresee the best practices that will emerge amid a changing landscape of web standards and technologies. (Or even that there will be a “web” in 100 years.)

Screenshot of slide showing three preservation pathways with logos or icons: Web Archiving, Stanford Digital Repository, and Documentation
Three preservation pathways: web archiving, Stanford Digital Repository, and documentation

So we’ve identified and applied three primary archiving formats to allow readers continued access to the publications: web archiving, digital repository accessioning, and documentation. Our work in this area has become especially critical because the grant funding period ends at the end of next year. These archives need to be in place so that the publications persist in their archived forms when their live web iterations begin to degrade.

But when a publication transitions from its live iteration, with all its affordances of reader-directed navigation and multimodal immediacy, to its archived version, how does it change?

What altered story does it tell? And who is privileged in its reincarnation? To explore those questions, let’s follow Filming Revolution through its three archived versions.

First, the web archive. Because each of our publications was to live completely on the web, the assumption when we wrote the first phase of the grant was that Stanford Library would simply web archive the projects and that would be that. One of the first discoveries when I joined the project was that web archiving was not a one-size-fits-all solution.

Side-by-side comparison of the live version of SUP’s first project, Enchanting the Desert, and its non-functional initial web archive

In fact, Stanford Library’s existing system at the time, though it appeared to crawl our first publication successfully, was not able to replay the resulting archive. We realized we needed to dig deeper into the different systems and tools available for this task, and have since discovered that Webrecorder, with its Browsertrix-based crawler and ReplayWeb.page tool, was better able to both capture and replay the kinds of JavaScript-heavy content comprising our publications. In fact, the second round of Mellon funding brought in Webrecorder as a sub-awardee so we could benefit from its continued development.

But what story does a web archive tell? And who is it for?

Screencast of browsing the web archive of Filming Revolution

As a high-fidelity copy of the project, Filming Revolution’s web archive looks, feels, and responds almost identically to the “real” or live web publication. Most of the original project’s target readers are students or researchers of Middle East studies or media and film studies. For this reader base, then, the web archive is indistinguishable from the original experience of navigating, interacting with, and experiencing the project. All the features, from the reader-directed multilinear navigational pathways, to the embedded video, to the tutorial pop-ups and credits, are retained in this high-fidelity archive version. In many ways, then, the audience the live project aims to engage is the same audience whose needs or interests are met with this archival representation. But the web archive does exclude at least one subsection of the original publication’s potential readership: those who are interested in uncovering the methods and building blocks by which the project is built.

Screenshot of Filming Revolution open in browser with developer tools panel expanded on the right to show HTML and styles
Live project with dev tools open in Chrome

Whereas in the live version such a reader might open up the browser’s developer tools to reveal the code structures, a reader who did the same with the web archive would be met simply with an indication that something, in fact the entire project, has been embedded in a very simple HTML page.

Screenshot of Filming Revolution web archive open in browser with developer tools panel expanded on the right to show HTML and styles
Web archive with dev tools open in Chrome

The transformation from complex website, with attending database and scripts, to a WARC, or in this case WACZ file, means that much of the project’s meta-story is shaved away. The story the project contains is still there, but the story that contains the project is critically obscured if not entirely lost.
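To make that flattening concrete, here is a minimal Python sketch of what opening such a package can look like. It assumes, based on my reading of the WACZ format, that a WACZ is a ZIP file holding a `datapackage.json` manifest alongside `archive/`, `indexes/`, and `pages/` directories. The sample package is built in memory with placeholder contents, and the function names are hypothetical; none of this is the actual Filming Revolution archive or SUP’s tooling.

```python
import io
import json
import zipfile
from collections import defaultdict


def build_sample_wacz() -> bytes:
    """Build a tiny stand-in package in memory (placeholder contents only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Assumed layout; real packages hold real WARC data and indexes.
        package.writestr("datapackage.json", json.dumps({"note": "placeholder manifest"}))
        package.writestr("archive/data.warc.gz", b"placeholder for captured HTTP traffic")
        package.writestr("indexes/index.cdx", b"placeholder for the replay index")
        package.writestr("pages/pages.jsonl", b"placeholder for the list of captured pages")
    return buffer.getvalue()


def summarize_package(data: bytes) -> dict:
    """Group the package's entries by their top-level directory."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            top, _, rest = name.partition("/")
            # Entries without a directory component are filed under "(root)".
            groups[top if rest else "(root)"].append(name)
    return dict(groups)


summary = summarize_package(build_sample_wacz())
for group, names in summary.items():
    print(group, names)
```

What the listing shows, under these assumptions, is captured traffic and the indexes needed to replay it. The server-side database and the authoring platform do not appear as separate, inspectable pieces.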

As an alternative, then, we’ve also committed to depositing all the pieces that make up the live version of a publication in the Stanford Digital Repository. Many of you here are more than familiar with digital repositories, so I won’t spend too much time breaking down the intricacies involved with that process. But I do want to dig a little into the story a repository collection tells when we’re considering it as a format for the preservation of a digital scholarly publication like Filming Revolution. Whereas the high-fidelity web archive may deliver a similar if not indistinguishable experience to that of interacting with live web content, the repository collection seemingly decontextualizes a project’s intended presentation, privileging instead the parts that make up the whole.

The collection for Filming Revolution, for example, contains, among other things, the complete SQL database (for better or for worse), the code powering the front-end interactions, the custom platform used to author the essays and load the media, and all 400+ video clips contained in the project. It’s all there, but it’s nearly impossible to make any sense out of the disaggregated parts. One limitation of the SDR, for example, is that a collection doesn’t display what is assigned to or contained in that collection. Instead, we’ve had to create a kind of table of contents that sits outside the repository within the archive section of the live project’s cover matter. This form of archive really exists for the kind of reader the web archive leaves out.

A digital humanist a hundred years from now, for example, interested in rebuilding a long-dead project or exploring the coding conventions and standards of early twenty-first century digital scholars, would find great value in this archival version of the publication. But a less technical reader, the kind of audience the author had in mind when generating her prose and interface, would likely be overwhelmed by the breadth and format of the content available here and would be completely unaware of the elegantly complex presentation that had originally tied all those pieces together into a cohesive argument.

To bridge the gap between web archive and SDR collection, and also to fill out and bolster the repository collection, we’ve also employed documentation. In the case of Filming Revolution (and the rest of our projects) we’ve built on Dene Grigar and Stuart Moulthrop’s “traversals” model to allow future readers or researchers a glimpse into the experience of the publication in the event that the high-fidelity web archive itself becomes obsolete. What this means is that for all of our publications we ask the authors at the end of the production process to create a screencast wherein they walk an audience through the project.

Screen capture showing navigating to and scrolling through Filming Revolution’s documentation page.

In addition to the screencast, they also compose a written narrative of the user experience. We extend the documentation to also include sections on the project’s technical specifications, requirements, and structures, its entire cast of contributors, and instructions for deployment (if appropriate). Thus the full documentation not only stands on its own to describe and demonstrate the project, it also serves as a top-level document, or instruction manual for the repository collection, providing would-be re-engineers of the project a sample of what they’re aiming to rebuild and details about how to go about it.

Such an archive version invites perhaps the largest set of potential future readers, accommodating in the case of Filming Revolution both the web archive’s audience of Middle East or media historians and the repository collection’s digital humanists with an interest in tools, code, and formats. But even while it invites this audience, it removes from them the agency they would have experienced interacting with the project themselves. Instead of mapping their own course through the various clips, stories, and connections, they are led down the path the author chooses and given the technical information we at the Press had the capacity and foreknowledge to ask for at the time of publication. The documentation thus privileges authorial intent, a perspective that the original publications are trying to challenge.

Filming Revolution and the other projects we’ve published over the past six years represent a shift in publication format that carries with it deeper questions about the longevity of a scholarly work and its contribution to, and place in, the scholarly record. In many ways, the question of how to archive such publications hinges on what we define as the publication. And in seeking to capture these different versions of the stories and arguments we’re publishing, we’re actually acknowledging that a story really depends on who wants to read it.
