Inherent Vice
inherent vice: n. ~ The tendency of material to deteriorate due to the essential instability of the components or interaction among components.
SAA Glossary of Archival and Records Terminology

Archive for October, 2008

What is a “cultural heritage” collection, really?

Tuesday, October 21st, 2008

You see it all the time. It’s peppered throughout the RLG Silos of the LAMs report…all over the IMLS materials…on other digitization project websites…and I’m certainly using it all the time. But rarely do you see a definition to go along with it. In fact, I’ve tried a few times in the past to pin down just what “cultural heritage” covers. It always seemed a little dangerous to use the term without having a good definition to point to. Apparently I’m not the only person who’s been unsuccessful at trying to nail the cultural heritage jelly to the wall.

To me “cultural heritage” often feels like a euphemism for something else that we really want to say – like “LAMs”. “Libraries, archives and museums” gets boringly repetitive after you’ve used it a few times in your report or grant application. Besides, in this age of supposed “convergence,” a better collective noun to refer to these things might be “cultural heritage institutions.”

UNESCO Cultural Heritage

Whenever faced with a challenge like this, I find it helpful to run to government bodies because they are often obligated to be specific about what they are talking about. This is particularly true whenever international treaties and laws are involved – and even more so when money is involved. So one way to view a definition of “cultural heritage” is through its legal definitions. UNESCO has a description of cultural heritage on its website, although the page I’m linking to seems out of date. This page reminds us that “cultural heritage” is not easily defined because it’s been a moving target with an evolving set of definitions over time. The Getty Research Institute has a nice list of cultural heritage policy documents, many of which describe the legal definitions of “cultural heritage” at different points in history.

A more recent version of the UNESCO Culture pages (at least according to the timestamp in the footer) reflects a trend towards dividing “cultural heritage” into a few categories:

  • Intangible Heritage includes folk customs, folklore and orally transmitted traditions that may not have a physical instantiation.
  • Movable Heritage and museums includes the kinds of things we normally think about as part of “collections” – archaeological artifacts, paintings, architectural elements, decorative arts, etc.
  • World Heritage appears to be things that are not “movable” such as monumental architecture and sites which are bound to their location.

“Cultural Heritage” in the U.S.

Closer to home, I first stopped in at IMLS to see whether they have any definitions. True to their mandate, they don’t define cultural heritage institutions, but instead talk in terms of libraries and museums as “stewards of cultural heritage.” The Museum Services Act goes on to specify that cultural, historical, natural and scientific heritage is what museums are responsible for. (The libraries have focused on more abstract definitions of services and the outcomes that they hope will come from funding them.)

Rachel Frick at IMLS pointed me towards the definition used by North Carolina Exploring Cultural Heritage Online (NC-ECHO):

Any cultural institution (library, archive, museum, historic site, or organization), which maintains a permanent, non-living collection of unique materials held for research and/or exhibit purposes and open for the use of the public will be surveyed. Denominational/associational collections will be surveyed, but individual church collections will not. Art museums will be surveyed but galleries will not. Zoos, arboreta, and parks will not be surveyed, unless as a part of their mission, they hold collections described above.

The Canadians are also generally very good at this, especially in the heritage sector. From the Canadian Heritage Information Network, I ended up at the legislation empowering the Department of Canadian Heritage. This unfortunately wasn’t much help, since it covers a very broad swath of “cultural” things – from battlefields to performing arts to libraries, archives and museums.

“Cultural Heritage” and Ontology Development

So what? The definition of “cultural heritage” is squishy. Squishy is good, right? That may be true, but at this stage in preparing my dissertation proposal I need to nail some of these things down. I think this is particularly important if my overall goal is to develop a domain ontology for “cultural heritage” collections. I’ll be asking for trouble if I do so without first clearly identifying what my “domain” is. Identifying the domain will also inform me about what kinds of data sources I will use for developing a new framework.

The CIDOC Conceptual Reference Model (CRM) also starts out by defining its intended scope by stating:

The term cultural heritage collections is intended to cover all types of material collected and displayed by museums and related institutions, as defined by ICOM. This includes collections, sites and monuments relating to natural history, ethnography, archaeology, historic monuments, as well as collections of fine and applied arts. The exchange of relevant information with libraries and archives, and the harmonisation of the CIDOC CRM with their models, fall within the CIDOC CRM’s intended scope.

This is of course a good start if you are just talking about museums. What I hope to accomplish here is more of what appears in the last lines of this statement – harmonization among different institutions that are “stewards of cultural heritage.” This seems like an approach that might better achieve “convergence” than simply trying to map between the library domain, the archives domain, and the museum domain.

In the end, the lack of any explicit common definitions makes me believe that I’ll need to specify one of my own that provides clear boundaries of what is in and what is out of scope for this project.

Developing a Cultural Heritage Collections Ontology

Tuesday, October 14th, 2008

In “My So-Called Second Life” I talked about my decision to NOT focus on Second Life for my dissertation. This of course left the question of just what I am doing for my dissertation unanswered.

The other thing that happened last fall was that I started working on a new component of the IMLS Digital Collections and Content Project. This gave me an opportunity to get back to some of the problems that had brought me to graduate school in the first place – looking at metadata for cultural heritage collections. The Collection/Item Metadata Relationships (CIMR) research group has been working to develop formal specifications of the kinds of relationships that hold between the description of a collection and the items that are members of that collection. (We’re working on getting those various papers into our IR; happy open access day!) What we hope to accomplish by the end of the project is an expression of these relationships using knowledge representation languages like OWL and RDFS – hopefully making these kinds of relationships available for computer processing.
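As a rough illustration of what “available for computer processing” could look like, here is a minimal Python sketch that stores collection and item descriptions as subject-predicate-object triples and applies a single inference rule. The property names (`isGatheredInto`, `heldBy`), the identifiers, and the propagation rule itself are all hypothetical and invented for this example; they are not the CIMR specifications, and a real version would be expressed in OWL or RDFS rather than plain Python.

```python
# Hypothetical triples: (subject, predicate, object). All names are illustrative.
TRIPLES = {
    ("item:photo-001", "isGatheredInto", "coll:mining-photos"),
    ("item:photo-002", "isGatheredInto", "coll:mining-photos"),
    ("coll:mining-photos", "heldBy", "org:example-historical-society"),
}


def propagate(triples, membership, prop):
    """Assumed rule: a collection's value for `prop` also holds for each member item.

    Returns only the newly inferred triples, not the ones already stated.
    """
    members = [(s, o) for (s, p, o) in triples if p == membership]
    values = [(s, o) for (s, p, o) in triples if p == prop]
    inferred = {
        (item, prop, value)
        for (item, collection) in members
        for (subject, value) in values
        if subject == collection
    }
    return inferred - set(triples)


INFERRED = propagate(TRIPLES, "isGatheredInto", "heldBy")
```

Whether a given collection-level property really does carry down to the items is exactly the kind of question a formal specification would have to settle; the sketch simply assumes that it does.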

It has been nice to get back to the familiar problems. They’re the kinds of problems that I’d started working on back in Colorado; that led me here and drew me towards the ontology classes I took through my Masters and PhD coursework. They also follow on some of the early pilot research I did for the IMLS DCC. So I’ve also decided to build my dissertation on this solid foundation of experience and learning, pleasantly surrounded by a supportive group of people working on related issues.

As we’ve worked on CIMR research it has been fairly surprising how little work has been done on characterizing just what “collections” are. It’s one of those terms that we use all the time that usually carries different meanings when used in different contexts. We’ve largely been avoiding it in our work as a problem that’s “too interesting” and beyond the scope of the project. We’ve been focusing our attention on the Dublin Core Collection Application Profile and its characterization of collections as something that items are simply “gathered into.”

However, DC-CAP is largely based on the pre-existing item-level description format of Dublin Core. (I’d say the same about how CDWA Lite and VRA Core handle “collections” as well.) Archivists will point towards Encoded Archival Description, but of course this is premised on the archival community’s understanding of what shape collections take. Experiments here with EAD also demonstrate that it doesn’t decompose nicely when flattened. (I think the expression of whole-part relationships in EAD is weak when aggregating across EAD documents – makes perfect sense when you’re trying to wrangle the paper in at first, but whew, headaches later.) And there are lots of unstructured representations of museum collections on websites and in catalogues. These are a good start towards describing collections – as collections – but they also leave me unsatisfied that they’ve given us the tools to fully describe collections in useful ways in online aggregated environments.

Perhaps aligning the ways that we describe collections wasn’t quite so important in the past. But since the digitization of library, archive, and museum (LAM) collections has supposedly broken down the barriers for re-integrating these resources, it seems like a good time to revisit the issue. Most of the efforts so far have been around finding common ways of expressing and sharing item-level metadata irrespective of the characteristics of the collection as a whole. Perhaps we’ve lost something important by only providing the item out of its context. Maybe more robust ways of sharing that context can help make those items more useful and meaningful to end users (and as Hur-Li Lee suggests, maybe we need to incorporate some of their needs in the model). Maybe it can also tell us how we can better transform the metadata about those items when it is re-situated in a new context. Maybe providing this context means that a user might be able to navigate among items that already have important relationships with each other. The preservation of context is already a problem that is receiving increased attention in Semantic Web circles; perhaps a better understanding of our collections can help us take advantage of these new platforms.

I’ve decided that the best way to address some of these questions is by adopting some of the approaches we’re using in CIMR. I will be proposing to develop a collections ontology drawn from across the cultural heritage community. The rest of the fall will be a process of specifying exactly what this means in practical terms (well, practical in an academic sense) – the scope of the research, datasets, methodologies, etc.

Wish me luck as I’m off and running. (yeah, more running less crawling…)

The URI Gap

Tuesday, October 7th, 2008

Two weeks ago I attended the 2008 International Conference on Dublin Core and Metadata Applications. Dr. Allen Renear, Karen Wickett and I were there presenting our paper, Collection/Item Metadata Relationships (well, Allen did all the presenting).



A Semantic Web Layer Cake. Modified from the original at http://semtext.org/2004-02/slides/img4.gif (thanks to Karen for pointing this out!)

There was a fair amount of Twitter activity during the conference, and during Ed Summers’s talk about “LCSH, SKOS and Linked Data” I started an exchange about URIs and their role in the Semantic Web. Actually, the increasing “semanticness” of the Dublin Core specifications has been on my mind for a while. When I first started encountering it several years ago it was impenetrable to me as someone whose technical skills were mostly acquired on the job. I’d mastered relational databases and was becoming proficient in XML, but the emergence of the Abstract Model presented more of a challenge. My mind would drift back to the days when I’d be standing in front of thirty or so librarians, archivists and museum professionals at a CDP workshop – how would I explain the Abstract Model to them? And more importantly, how would they actually participate in a “semantically” enabled CDP?

At one point Ed quotes Andy Powell:

…by treating values as non-literal resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.

(I’m not going to rehash existing discussions about 1) what is a URI and 2) what are literals and non-literals. Also see Pete Johnston’s “Dublin Core Key Concepts” tutorial slides).

This idea of replacing literals with non-literals in our metadata is certainly attractive, especially in a robust networked environment. What I haven’t yet heard is what happens when the network is brittle and things start breaking. It seems possible that the neat web of relationships that we’ve identified could quickly start unraveling. This seems especially true in an environment where metadata gets aggregated away from its original creator. Sure, in your shop you may know that you’ve “minted” URIs for new properties or replaced old URIs with new ones, but the metadata that you’ve released into the wild may not know about these changes. In these discussions about replacing literals with non-literals there always seems to be some assumption that the non-literals will a) be globally unique and b) be persistent. As Andy Powell suggested via Twitter, the scenarios where this isn’t true are not a technical failure of the semantic web, but a social/political/commitment failure on the part of the people implementing systems. No doubt this is true, but in my book the people problems are always harder to solve than the technical ones.
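To make the worry concrete, here is a small Python sketch of the difference between a literal value and a URI value, and of what an aggregated copy sees after the publisher swaps an old URI for a new one. The records, the `example.org` URIs, and the in-memory `vocabulary` dictionary are hypothetical stand-ins for real metadata and a real resolvable vocabulary service.

```python
# Hypothetical records and URIs, for illustration only.
literal_record = {"title": "Miners at lunch", "subject": "Mining"}
uri_record = {
    "title": "Miners at lunch",
    "subject": "http://example.org/vocab/mining",
}

# Further descriptions "hung" on the URI by whoever minted it.
vocabulary = {
    "http://example.org/vocab/mining": {
        "label": "Mining",
        "broader": "http://example.org/vocab/industry",
    },
}


def describe(value, vocab):
    """Return the further descriptions attached to a value, or None if nothing resolves."""
    return vocab.get(value)


# A literal string gives us no hook at all; a URI does, while it still resolves.
from_literal = describe(literal_record["subject"], vocabulary)
before_change = describe(uri_record["subject"], vocabulary)

# The publisher replaces the old URI with a new one. The copy of the record that
# was aggregated elsewhere still carries the old URI and is never told.
vocabulary["http://example.org/terms/mining"] = vocabulary.pop(
    "http://example.org/vocab/mining"
)
after_change = describe(uri_record["subject"], vocabulary)
```

After the swap, the aggregated record is worse off than the literal one: the literal at least still says “Mining,” while the stale URI now says nothing to anyone who tries to follow it.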

Take the CDP’s aggregation of Dublin Core metadata as an example. When I was there I’d made a private commitment to keep the percentage of bad URLs below 10%. You might think this was easy, but in fact it was quite a lot of work – largely because many of our partners (and their partners) hadn’t bought into the belief that URLs needed to be persistent. Sometimes a simple change on their end that was automatic didn’t make its way to us and required manually updating every record. This problem cascades beyond CDP to the IMLS DCC item-level repository, which also now contains records with bad URLs. Even though the DCC repository could potentially revise its records through OAI-PMH, CDP’s OAI data provider disappeared about a year ago when a server was replaced. We now have several layers of social/political/commitment between us and the resource that we are describing or wanting to retrieve.
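For what it’s worth, tracking a target like “below 10% bad URLs” is simple arithmetic once you have some way of checking each link. The Python sketch below assumes a caller-supplied `is_reachable` function; here it is faked with a set of known-dead URLs so the example runs without a network. The URLs and the dead-link set are invented, and this is not the tooling CDP actually used.

```python
def bad_url_rate(urls, is_reachable):
    """Return the fraction of URLs for which the supplied check fails."""
    urls = list(urls)
    if not urls:
        return 0.0
    bad = sum(1 for url in urls if not is_reachable(url))
    return bad / len(urls)


# Stand-in for a real HTTP check: one of twenty hypothetical item URLs is dead.
KNOWN_DEAD = {"http://example.org/item/3"}
urls = [f"http://example.org/item/{n}" for n in range(1, 21)]

rate = bad_url_rate(urls, lambda url: url not in KNOWN_DEAD)
within_target = rate < 0.10  # the 10% ceiling mentioned above
```

The counting is the easy part. The work described above is everything that happens after a link shows up as bad.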

Several studies have been conducted that show various rates for “linkrot” in URLs, but I have yet to find any references to the expectations/reality of “URI rot.” With millions upon billions of URIs being “minted” (they are the coin of the semantic realm after all), having even a small portion of them fail seems like it could wreak havoc on the neat and tidy graphs that are the basis of the semantic web. This also would seem to be a concern for long-term digital preservation in the case where the services, etc. that you relied on today may have long since disappeared. Recommendations like “coolURIs” help address the technical issues but they don’t seem to address the “people” problem.

And what of the resource-strapped (as in cash, manpower, etc., not as in “things” being described) cultural heritage institutions? Will they really be able to mint robust and long-lived URIs? Or will they be relegated to the backwaters of the un-semantic web? Just as there has been a gap between institutions that are able to get their collections online and those that are not, we now could have a growing divide between those who are able to provide semantically enhanced metadata and those who aren’t. Again, a political/social problem as much as it’s a technical one.

Perhaps “semantifying” metadata could be a new job for metadata aggregators like IMLS DCC. I could imagine a service provider adding a process to their workflow that would append URIs for known controlled vocabulary terms to aggregated records or provide new URIs for things that didn’t have one already. This seems to point towards the top layer of the semantic layer cake – that of trust. Is it necessary to know who has the “authoritative” URI for a resource or property? What are the politics/social issues involved in taking responsibility for URIs for someone else’s “things?” If there are multiple URIs, how do I know that they point towards the same “thing?” Should I mint a new URI for one that has failed?
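A very rough Python sketch of the kind of enrichment step imagined here: look up each subject string in a table of known controlled vocabulary terms and append the matching URIs to the record. The `TERM_URIS` table, the `subject_uri` field name, and the `example.org` URIs are assumptions made up for this example; a real aggregator would need a real vocabulary source and far more careful term matching.

```python
# Hypothetical lookup table from normalized vocabulary terms to URIs.
TERM_URIS = {
    "mining": "http://example.org/vocab/mining",
    "railroads": "http://example.org/vocab/railroads",
}


def semantify(record, term_uris):
    """Return a copy of the record with URIs appended for any recognized subject terms.

    The original literal subjects are kept, so nothing is lost if a URI later fails.
    """
    subjects = record.get("subject", [])
    normalized = (subject.strip().lower() for subject in subjects)
    uris = [term_uris[term] for term in normalized if term in term_uris]
    enriched = dict(record)
    enriched["subject_uri"] = uris
    return enriched


record = {"title": "Miners at lunch", "subject": ["Mining", "Lunch counters"]}
enriched = semantify(record, TERM_URIS)
```

Keeping the literals alongside the URIs is a deliberate hedge in this sketch. It doesn’t answer any of the trust questions, but it does mean a failed URI degrades the record back to what it was before rather than to nothing.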

At times I feel like the “Semantic Web” buzz is just swapping in a new technical platform without really addressing the social problems that prevented us from achieving similar goals with older technologies like XML. Jerry McDonough discusses his concerns with regard to XML in his recent Balisage article, “Structural Metadata and the Social Limitation of Interoperability: A Sociotechnical View of XML and Digital Library Standards Development”:

Like a rope, [XML] is extraordinarily flexible; unfortunately, just as with rope, that flexibility makes it all too easy to hang yourself.

In the case of the semantic web, I may be less worried about hanging myself and more worried that the rope I’m hanging onto might be cut by someone up above at any time – sending me and my metadata into the abyss. It also seems that addressing some of these concerns could encourage more uptake of semantic web technologies, especially where social/political/financial commitments are required to make it happen. Looking back to the lessons we’ve learned (or have yet to learn) from our experiences with XML, metadata interoperability, and shareability would make me feel more comfortable relying on the “cloud.”
