Considering Archivability

Laptop with code editor, CC0 license from Negative Space

Developing a scholarly digital project is a complex process. In addition to the research and writing typical of any scholarly project, authors must choose a platform or framework that suits both the argument and the content. Depending on their digital literacies, they may need to learn these systems and recruit collaborators with the skills to fill their knowledge gaps or to handle the more technical aspects of the format. Once development is underway, they need to coordinate the work and ensure the team is implementing strategies for the project’s longevity, even as they write sophisticated scripts and styles that make the project clean, innovative, and robust. It can be difficult for everyone involved to stay on the same page, especially when the work is divided among a large team, each member contributing specialized knowledge. But it’s crucial that everyone contributing to a project has a clear idea of its goals, and one of those goals must be the longevity of the final product.

To help our authors continue developing their work after they’ve agreed to publish with us, we recently made publicly available a thorough set of guidelines covering a range of recommendations for the technical aspects of their projects. To complement and alternate with our FAQ series, I’m launching a technical guidelines series that will highlight each of the topics covered in our guidelines package. While these guidelines were written specifically for SUP’s digital publishing program, the ideas apply to any web-based project, and I hope they will be helpful both to individual scholars pursuing digital modes of sharing their work and to fellow publishers incorporating digital scholarly works into their programs.

The first section in our technical guidelines package serves as a perfect starting place for the series as it encompasses many of the philosophical concepts that drive the other sections. It’s rather generally titled “Archivability,” and it essentially describes a spectrum onto which all web-based content falls. Rather than providing strict directions to authors and developers about what they have to do, this document explains what makes a project archivable or not and what kinds of precautions and supplementary material authors should be considering in the face of the persistent challenges of preserving web content.

Screenshot of the Archivability guidelines’ first page

The document divides web-based projects into three categories, which grew out of a meeting our team had with members of Stanford University Library’s Digital Library Systems and Services division (DLSS). These categories (archive-ready, archive-amenable, and archive-resistant) expand on categories suggested at that meeting for determining how amenable projects might be to web archiving. But in reality, none of the projects that fit our program are immediately or 100% web-archivable. Web archiving, as it’s defined and practiced by entities like the Internet Archive or even Webrecorder, is still effective only when a site doesn’t employ complex JavaScript. However, we believe authors can adopt current sustainable development practices that will make their project’s preservation, or if necessary its future resuscitation, easier. What we’re aiming for with our archivability recommendations is something broader than a web archive of the kind you can find on the Wayback Machine. We hope to recommend development strategies that will extend a project’s natural life online and ensure we have enough information and material to someday restore more complex projects whose components are less sustainable right now.

I won’t repeat the document’s full contents here, but I hope interested readers will call it up and see for themselves what we’ve put together. Though its lack of explicit directives may frustrate some, the Archivability document illustrates the impossibility of nailing down a perfect, failsafe approach to creating stable and immortal web publications. The truth is that no one has enough control over the myriad building blocks of web environments to create a web project that will live forever. Ironically, the website that has persisted the longest is the first one ever built. Why? Because it uses only HTML and a basic file-and-folder system. It isn’t interactive beyond the hyperlinks that enable navigation between its pages, and it certainly doesn’t harness the power of JavaScript, databases, or APIs, all common if not necessary features of the complex projects scholars are building today. For all the complexity of today’s digital scholarship, we’re all way behind when it comes to preservation. And while these guidelines don’t resolve that imbalance, we hope they offer digital scholars some ideas worth considering in the early stages of project development and serve as a call to action for those in the field of digital preservation.
