In the year 2017, over three decades since the personal computer became virtually commonplace in homes, it’s probably safe to assume many of us have come across the stray CD-ROM or USB thumb drive or maybe even floppy disk and eagerly, or perhaps with trepidation, inserted it into a working computer or external drive to rediscover its contents. Depending on how old the storage device is, it might contain anything from docx files to swf to wpd files. And it’s also likely that some of those files have names that shed very little light on their contents. (Just which “essay” exactly would I have last edited on January 29, 1999?) Chances are I could pretty easily open up a Word file to find out, but unless I have a Flash player installed or access to an old computer with WordPerfect on it, the other two files are as good as blank.
File types wax and wane in popularity, and proprietary file types especially are sure to be replaced by newer versions (docx replaces doc) or the next hot proprietor’s format as markets spur tech companies to reinvent their wheels if they want to keep making money. Thanks to web sharing and ever-growing open-access and standard-formats initiatives, many content authors have been using formats that are compatible with many programs and readable by a web browser for quite a while.
But authors of web content still need to consider what file formats are most sustainable and most appropriate for their purposes. And beyond that, they need to consider what they’re naming their files in the event that those files ever become unreadable by standard machines. The whole file name matters. What’s before the dot should indicate the contents in a logical way, and what’s after the dot should be a format that is interoperable and preservable. That’s why we’ve drafted a set of recommendations for File Names and Formats, and these guidelines serve as the topic of this week’s post in our continuing Technical Guidelines series. While html, css, and js are the building blocks of web content, any modern web project will also include media files for images, sound, video, and more. We’ve done our best to anticipate the most preservable and most interoperable formats so our authors’ work will stay accessible in one form or another, whether it’s the live hosted version or archived components.
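For authors who like to check such things programmatically, the “before the dot, after the dot” idea can be sketched in a few lines of Python. This is only an illustration: the function name `check_file_name`, the list of extensions (taken from formats mentioned in this post), and the list of vague names are all assumptions made for the example, not the guidelines’ actual preferred or accepted lists.

```python
from pathlib import Path

# Hypothetical example list, built only from formats named in this post.
# It is NOT the repository's actual list of preferred or accepted formats.
EXAMPLE_EXTENSIONS = {"html", "css", "js", "tiff", "jpg", "mp4"}

# Hypothetical examples of names that say nothing about a file's contents.
VAGUE_NAMES = {"essay", "untitled", "document", "file", "new"}


def check_file_name(name):
    """Return a list of warnings for a file name; an empty list means none."""
    path = Path(name)
    before_dot = path.stem.lower()
    after_dot = path.suffix.lower().lstrip(".")
    warnings = []

    # After the dot: is the format on the example list?
    if not after_dot:
        warnings.append("no extension")
    elif after_dot not in EXAMPLE_EXTENSIONS:
        warnings.append(f"extension '{after_dot}' is not on the example list")

    # Before the dot: does the name hint at the contents?
    if before_dot in VAGUE_NAMES:
        warnings.append(f"name '{before_dot}' does not describe the contents")

    return warnings


if __name__ == "__main__":
    for candidate in ["essay.wpd", "interview-clip-1999.mp4"]:
        print(candidate, check_file_name(candidate))
```

A check like this can only flag obvious problems. Whether a name is truly descriptive, or a format truly preservable, still calls for the author’s judgment and the full guidelines.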
Many people’s preferences for file format depend on what software they have available to display, play, or edit that file’s contents. For instance, a Mac user working with iMovie will likely generate an mp4 file for a video, whereas a Windows user will by default save their video creation as wlmp. Of course, most experienced content creators in any medium are familiar with exporting their work into certain formats, and we hope to point them to the ones that will last the longest or, at the very least, will be best supported in our digital repository.
As we point out in the guidelines, which you can read in full here, oftentimes file-type decisions are dependent upon the desired quality of the media. Images that need to be deeply zoomed, for example, will require a format that supports granular detail, whereas an image that is representative of an existing photograph can only be as high-quality or detailed as its original format allowed. Whether an author provides a tiff or a jpg may or may not be something that requires much consideration beyond the file’s available quality. But in other cases, like video, the format determines whether or not it can currently be used in an HTML5 video player or, later, whether an archived version of the video will be supported by our own digital repository’s web player. In cases like these, we’ve outlined the preferred and accepted file types that fit the Stanford Digital Repository’s preferences. Other institutions, like the Library of Congress, have their own recommendations. As usual, it’s important to plan for the long term when you create a digital project and spend some time researching what has been and what will continue to be supported as technologies evolve and change.
Jasmine Mulliken is Production and Preservation Manager, Digital Projects, at Stanford University Press. She coordinates the production and preservation workflow of born-digital projects, including recommending platforms and coding standards to authors, consulting with authors on projects’ technical attributes, and evaluating best practices for archiving and preservation.