In “My So-Called Second Life” I talked about my decision to NOT focus on Second Life for my dissertation. This of course left the question of just what I am doing for my dissertation unanswered.
The other thing that happened last fall was that I started working on a new component of the IMLS Digital Collections and Content Project. This gave me an opportunity to get back to some of the problems that had brought me to graduate school in the first place – looking at metadata for cultural heritage collections. The Collection/Item Metadata Relationships (CIMR) research group has been working to develop formal specifications of the kinds of relationships that hold between the description of a collection and the items that are members of that collection. (I’m working on getting those various papers into our IR; happy Open Access Day!) What we hope to accomplish by the end of the project is an expression of these relationships in knowledge representation languages such as OWL and RDFS – hopefully making these kinds of relationships available for computer processing.
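To make “available for computer processing” a little more concrete, here is a minimal sketch in Python. It is not the CIMR group’s specification and it does not use OWL or RDFS; it only illustrates one kind of collection/item relationship, where a value stated once on a collection can be inferred for each member item. Every identifier in it (the `ex:` names, the `ex:isGatheredInto` predicate, and the choice of `ex:rightsHolder` as a property that carries down to items) is a hypothetical assumption made for illustration.

```python
# Hypothetical sketch: statements are (subject, predicate, object) tuples.
# All "ex:" identifiers are illustrative assumptions, not a real vocabulary.

IS_GATHERED_INTO = "ex:isGatheredInto"

# Collection-level properties that we assume, for this example only,
# also hold for every item gathered into the collection.
PROPAGATING = {"ex:rightsHolder"}


def propagate(triples):
    """Return the input triples plus item-level triples inferred from collections."""
    triples = set(triples)
    inferred = set()
    for item, pred, coll in triples:
        if pred != IS_GATHERED_INTO:
            continue
        # Copy each propagating collection-level statement down to the item.
        for subj, p, value in triples:
            if subj == coll and p in PROPAGATING:
                inferred.add((item, p, value))
    return triples | inferred


data = {
    ("ex:photo1", IS_GATHERED_INTO, "ex:collectionA"),
    ("ex:photo2", IS_GATHERED_INTO, "ex:collectionA"),
    ("ex:collectionA", "ex:rightsHolder", "ex:SomeMuseum"),
    ("ex:collectionA", "ex:itemCount", "2"),
}

result = propagate(data)
```

In this sketch both photos pick up the rights holder from the collection, while `ex:itemCount` stays on the collection because it describes the collection as a whole rather than any single item. Deciding which properties behave which way is exactly the kind of question a formal specification has to answer.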
It has been nice to get back to the familiar problems. They’re the kinds of problems that I’d started working on back in Colorado, the ones that led me here and drew me towards the ontology classes I took through my Masters and PhD coursework. They also follow on from some of the early pilot research I did for the IMLS DCC. So I’ve decided to build my dissertation on this solid foundation of experience and learning, pleasantly surrounded by a supportive group of people working on related issues.
As we’ve worked on CIMR research, it has been fairly surprising how little work has been done on characterizing just what “collections” are. It’s one of those terms that we use all the time and that usually carries different meanings in different contexts. We’ve largely been avoiding it in our work as a problem that’s “too interesting” and beyond the scope of the project. We’ve been focusing our attention on the Dublin Core Collection Application Profile and its characterization of collections as something that items are simply “gathered into.”
However, DC-CAP is largely based on the pre-existing item-level description format of Dublin Core. (I’d say the same about how CDWALite and VRACore handle “collections” as well.) Archivists will point towards Encoded Archival Description, but of course this is premised on the archival community’s understanding of what shape collections take. Experiments here with EAD also demonstrate that it doesn’t decompose nicely when flattened. (I think the expression of whole-part relationships in EAD is weak when aggregating across EAD documents – it makes perfect sense when you’re trying to wrangle the paper in at first, but whew, headaches later.) And there are lots of unstructured representations of museum collections on websites and in catalogues. These are a good start towards describing collections as collections, but they also leave me unsatisfied that they’ve given us the tools to fully describe collections in useful ways in online aggregated environments.
Perhaps aligning the ways that we describe collections wasn’t quite so important in the past. But since the digitization of library, archive, and museum (LAM) collections has supposedly broken down the barriers to re-integrating these resources, it seems like a good time to revisit the issue. Most of the efforts so far have been around finding common ways of expressing and sharing item-level metadata irrespective of the characteristics of the collection as a whole. Perhaps we’ve lost something important by only providing the item out of its context. Maybe more robust ways of sharing that context can help make those items more useful and meaningful to end users (and as Hur-Li Lee suggests, maybe we need to incorporate some of their needs in the model). Maybe it can also tell us how we can better transform the metadata about those items when it is re-situated in a new context. Maybe providing this context means that a user might be able to navigate among items that already have important relationships with each other. The preservation of context is already a problem that is receiving increased attention in Semantic Web circles; perhaps a better understanding of our collections can help us take advantage of these new platforms.
I’ve decided that the best way to address some of these questions is by adopting some of the approaches we’re using in CIMR. I will be proposing to develop a collections ontology drawn from across the cultural heritage community. The rest of the fall will be a process of specifying exactly what this means in practical terms (well, practical in an academic sense) – the scope of the research, datasets, methodologies, etc.
Wish me luck as I’m off and running. (Yeah, more running, less crawling…)