We’re doing a little reimagining of collections.
With new collections such as the Van Riper Family Correspondence collection coming online, and others waiting in the wings, we have been thinking about how we organize and represent those collections in our digital collections platform. Our underlying repository software – Fedora Commons 3.x – is organized around the idea of “digital objects”. An item such as a photograph or a digitized book is a “digital object” in the context of our digital collections platform. And would you believe, so are collections!
Each collection is its own digital object, with its own metadata, thumbnail, collection art, etc. As such, we can display information about a collection in much the same way, and with the same focused attention, as we do for other objects in our digital collections, such as a photograph or a book. So with some recent changes, we have moved away from directing users straight to the items within a collection, and are instead directing them first to a page about the collection itself. As a technical aside, that page is actually the collection object itself, rendered with the same view we use to display other digital objects in the repository.
It is a minor change that most users probably won’t notice at first. But it reflects the work we’ve been doing on the digital collections, migrating and bringing new collections onboard, and a growing desire to celebrate and contextualize each one. It’s also an insight into the interesting world of content modeling with Fedora Commons, where items and collections coexist as digital objects in a complex, harmonious balance of RDF statements that model relationships between objects (e.g. “this item is part of that collection”). This ability to change how we represent collections speaks to the power of content modeling for digital object repositories – it takes work, it takes head scratching, but it results in a system where we can share and display what we hope will be of most use and interest to our users!
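For the curious, here is a rough sketch of what a statement like “this item is part of that collection” can look like once it is written down as data. It models a few RDF-style triples as plain Python tuples and looks up the members of one collection. The `isMemberOfCollection` predicate comes from the Fedora 3.x relationship ontology; the object identifiers (`demo:item1`, `demo:collectionA`, and so on) and the `members_of` helper are hypothetical, and this is an illustration rather than the code our platform runs.

```python
# Predicate from the Fedora 3.x relationship ontology (used in RELS-EXT).
IS_MEMBER_OF_COLLECTION = (
    "info:fedora/fedora-system:def/relations-external#isMemberOfCollection"
)

# Hypothetical object identifiers, for illustration only.
TRIPLES = [
    ("info:fedora/demo:item1", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:collectionA"),
    ("info:fedora/demo:item2", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:collectionA"),
    ("info:fedora/demo:item3", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:collectionB"),
]


def members_of(collection_uri, triples):
    """Return the subjects that declare membership in the given collection."""
    return [
        subject
        for subject, predicate, obj in triples
        if predicate == IS_MEMBER_OF_COLLECTION and obj == collection_uri
    ]


print(members_of("info:fedora/demo:collectionA", TRIPLES))
# prints ['info:fedora/demo:item1', 'info:fedora/demo:item2']
```

Note that the collection itself is just another identifier in the third column, which is why it can be displayed with the same view as any other object.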
As alluded to in a previous post, the Library’s Digital Collections platform recently underwent a relatively major change – adopting the “BagIt” specification, developed by the Library of Congress and the California Digital Library and published through the Internet Engineering Task Force (IETF), for ingest and export of our digital objects. The impetus for this change came from meeting with a potential future content producer for the platform and thinking about how to model, ingest, provide access to, and preserve their content over time. With only an ad-hoc ingest process in place, we knew more formalized ingest procedures would be needed down the road, and this was a perfect opportunity to lay the groundwork for those. Addressing ingest workflows required stepping back and identifying all points in the system this would affect. Precisely because the effects were far reaching – spanning from changing how assets are organized for ingest, through storage and management in the repository, all the way to their display on the front-end – adopting a widely used standard such as BagIt has had a very positive and normalizing impact on the Digital Collections platform as a whole.
The BagIt standard is “a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content”. BagIt is, at its simplest, a set of rules, checksums, and naming conventions for grouping and packaging files. It is not a file format like Zip (.zip) archives or tarballs (.tar), both of which result in a single file and, depending on the tool, varying levels of compression; when you “bagify” a directory, you are still left with the original directory! What it does provide are manifests and checksums for all files that are part of the “bag”, resulting in a package that can be created, moved, disassembled, reassembled, and checked for file integrity and consistency throughout multiple transformations and storage locations. For content producers that may not have access to all nooks and crannies of the back-end for a repository, it is a good way for them to ensure they can retrieve exactly what they deposit.
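To make the “rules, checksums, and naming conventions” a little more concrete, here is a minimal sketch of bagging a directory using only the Python standard library. It moves the files into a `data/` payload directory, writes the `bagit.txt` declaration and a `manifest-sha256.txt` file, and then re-checks the checksums. The function names (`make_bag`, `validate_bag`) and the sample directory are our own illustrative choices, not part of the specification and not our production ingest code; a real workflow would more likely lean on an existing BagIt library.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(directory):
    """Turn a directory into a minimal bag, in place.

    Assumes the directory does not already contain an entry named "data".
    """
    bag = Path(directory)
    originals = list(bag.iterdir())
    payload = bag / "data"
    payload.mkdir()
    for item in originals:
        shutil.move(str(item), str(payload / item.name))
    # Bag declaration; 0.97 is the long-lived draft version, RFC 8493 defines 1.0.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = [
        f"{file_sha256(path)}  {path.relative_to(bag).as_posix()}\n"
        for path in sorted(p for p in payload.rglob("*") if p.is_file())
    ]
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def validate_bag(directory):
    """Return manifest paths that are missing or whose checksum has changed."""
    bag = Path(directory)
    problems = []
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, relative = line.split(None, 1)
        target = bag / relative
        if not target.is_file() or file_sha256(target) != expected:
            problems.append(relative)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "letter_001"  # hypothetical object directory
        work.mkdir()
        (work / "page1.txt").write_text("Dear family,", encoding="utf-8")
        make_bag(work)
        print(sorted(p.name for p in work.iterdir()))
        # prints ['bagit.txt', 'data', 'manifest-sha256.txt']
        print(validate_bag(work))  # prints []
        (work / "data" / "page1.txt").write_text("tampered", encoding="utf-8")
        print(validate_bag(work))  # prints ['data/page1.txt']
```

Notice that the directory is still a directory afterwards, just with a few extra text files alongside the payload. This sketch only checks files listed in the manifest; a complete validation would also flag payload files that the manifest does not mention.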
Adopting the BagIt packaging standard for our ingest process has allowed us to move away from our original, XSLT-based ingest workflow, a workflow that varied widely from collection to collection, resulting in objects that were fairly similar, but not identical, in structure. This was irritating and hindered access at best, and was detrimental to preservation efforts at worst. By putting more time and effort into the creation of well-formed bags, which become our ingest SIPs, and then ingesting all BagIt archives through the same Python-based ingest process, we are left with objects in the repository that have been made from the same mold and are therefore much easier to manage. It has wrested complex object structure out of XSLT stylesheets and into human-readable JSON files that can be created by hand or programmatically for larger collections.
Overview of BagIt object ingest, storage, and export for Digital Collections platform
The BagIt standard is fairly widely used, by institutions including the Library of Congress, Chronopolis, and the Stanford Digital Repository, among others, and is being popularized by other repository platforms such as Archivematica, which uses BagIt as the packaging format for AIPs in the system. Because we use Fedora Commons as our repository storage system, we are using BagIt objects only for ingest and export. Originally, BagIt packages were useful because we could show, outside of the repository itself, that objects exported were “identical” (at least as far as individual file checksums are concerned) to the objects ingested. But throughout the process of incorporating them into our ingest / export workflow, they have proved to be very handy packages for neatly wrapping up objects that may contain 1, 100, 1,000, or more component pieces.
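As a small illustration of that “identical as far as checksums are concerned” check, the sketch below compares the payload manifest of an ingested bag with that of an exported one and reports any files that are missing, added, or changed. The helper names and the sample manifest text (including the shortened placeholder checksums) are made up for the example; it is a sketch of the idea, not our export tooling.

```python
def read_manifest(text):
    """Parse manifest text ("<checksum> <path>" per line) into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            checksum, path = line.split(None, 1)
            entries[path] = checksum.lower()
    return entries


def compare_manifests(ingested_text, exported_text):
    """Report payload differences between two manifests."""
    ingested = read_manifest(ingested_text)
    exported = read_manifest(exported_text)
    shared = set(ingested) & set(exported)
    return {
        "missing": sorted(set(ingested) - set(exported)),
        "added": sorted(set(exported) - set(ingested)),
        "changed": sorted(p for p in shared if ingested[p] != exported[p]),
    }


# Placeholder checksums, shortened for readability; real ones are full digests.
INGESTED = "aaa111  data/page1.tif\nbbb222  data/page2.tif\nccc333  data/mods.xml\n"
EXPORTED = "aaa111  data/page1.tif\nbbb999  data/page2.tif\nddd444  data/notes.txt\n"

print(compare_manifests(INGESTED, EXPORTED))
# prints {'missing': ['data/mods.xml'], 'added': ['data/notes.txt'], 'changed': ['data/page2.tif']}
```

An export that matches its ingest would come back with three empty lists, no matter how many component pieces the object contains.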
We’re currently working on the digital collections system in order to improve our footing going forward. During this upgrade, collections and individual digital objects might be unavailable at times. Have no fear, though; our work should be finished within the next day or two. Look to this blog for any updates to this schedule.
Digital Collections have increasingly been at the forefront of the WSU Library System’s priorities, and this year we’re introducing a new face for our collections. Our existing system is great at organizing and serving up images, but not so good at handling other kinds of digital items — texts, videos, audio, or combinations of digital files. So we’ve spent the past year laying the groundwork for a new digital collection system, using Fedora Commons as the digital object repository and a custom mix of PHP and Python to build the front end. If that’s confusing, don’t worry — the results are clean, useable, and highly interesting.