Two weeks ago I attended the 2008 International Conference on Dublin Core and Metadata Applications. Dr. Allen Renear, Karen Wickett and I were there presenting our paper, “Collection/Item Metadata Relationships” (well, Allen did all the presenting).
A Semantic Web Layer Cake. Modified from the original at http://semtext.org/2004-02/slides/img4.gif (thanks to Karen for pointing this out!)
There was a fair amount of Twitter activity during the conference, and during Ed Summers’s talk about “LCSH, SKOS and Linked Data” I started an exchange about URIs and their role in the Semantic Web. Actually, the increasing “semanticness” of the Dublin Core specifications has been on my mind for a while. When I first started encountering it several years ago it was impenetrable to me as someone whose technical skills were mostly acquired on the job. I’d mastered relational databases and was becoming proficient in XML, but the emergence of the Abstract Model presented more of a challenge. My mind would drift back to the days when I’d be standing in front of thirty or so librarians, archivists and museum professionals at a CDP workshop – how would I explain the Abstract Model to them? And more importantly, how would they actually participate in a “semantically” enabled CDP?
At one point Ed quotes Andy Powell:
…by treating values as non-literal resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.
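To make the literal/non-literal distinction concrete, here is a minimal sketch in Python that models statements as plain (subject, predicate, object) tuples. It is only an illustration of the idea in the quote, not how any particular RDF toolkit works. The `example.org` URIs and the `describe` helper are made up for this example; the Dublin Core and SKOS property URIs are the real ones.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Real property URIs from DCMI Terms and SKOS.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical URIs, invented purely for illustration.
ITEM = "http://example.org/item/42"
TERM = "http://example.org/vocab/railroads"
BROADER_TERM = "http://example.org/vocab/transportation"

# Literal value: the subject is just a string, a dead end in the graph.
literal_graph: List[Triple] = [
    (ITEM, DCTERMS_SUBJECT, "Railroads"),
]

# Non-literal value: the subject is a resource with its own URI,
# so further statements can be "hung" on it.
nonliteral_graph: List[Triple] = [
    (ITEM, DCTERMS_SUBJECT, TERM),
    (TERM, SKOS_PREF_LABEL, "Railroads"),
    (TERM, SKOS_BROADER, BROADER_TERM),
]


def describe(graph: List[Triple], resource: str) -> List[Tuple[str, str]]:
    """Collect every (predicate, object) pair whose subject is `resource`."""
    return [(p, o) for s, p, o in graph if s == resource]
```

Calling `describe(nonliteral_graph, TERM)` returns the label and the broader term, while nothing in `literal_graph` describes the string "Railroads". The catch is that those extra descriptions are only reachable for as long as the term's URI keeps meaning the same thing.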
This idea of replacing literals with non-literals in our metadata is certainly attractive, especially in a robust networked environment. What I haven’t yet heard is what happens when the network is brittle and things start breaking. It seems possible that the neat web of relationships that we’ve identified could quickly start unraveling. This seems especially true in an environment where metadata gets aggregated away from its original creator. Sure, in your shop you may know that you’ve “minted” URIs for new properties or replaced old URIs with new ones, but the metadata that you’ve released into the wild may not know about these changes. In these discussions about replacing literals with non-literals there always seems to be an assumption that the non-literals will a) be globally unique and b) be persistent. As Andy Powell suggested via Twitter, the scenarios where this isn’t true are not a technical failure of the semantic web, but a social/political/commitment failure on the part of the people implementing systems. No doubt this is true, but in my book the people problems are always harder to solve than the technical ones.
Take the CDP’s aggregation of Dublin Core metadata as an example. When I was there I’d made a private commitment to keep the percentage of bad URLs below 10%. You might think this was easy, but in fact it was quite a lot of work – largely because many of our partners (and their partners) hadn’t bought into the belief that URLs needed to be persistent. Sometimes a simple, automated change on their end didn’t make its way to us, and we had to update every record manually. This problem cascades beyond CDP to the IMLS DCC item-level repository, which now also contains records with bad URLs. Even though the DCC repository could potentially revise its records through OAI-PMH, CDP’s OAI data provider disappeared about a year ago when a server was replaced. We now have several layers of social/political/commitment between us and the resource that we are describing or wanting to retrieve.
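The bookkeeping behind a target like “below 10% bad URLs” can be sketched roughly as follows. This is a hedged illustration in Python, not the tooling CDP actually used: the `is_alive` and `bad_url_rate` names are hypothetical, and treating any response with a status under 400 as “alive” is a simplifying assumption (a page can return 200 and still no longer be the resource the record describes).

```python
from http.client import HTTPException
from typing import Callable, Iterable
from urllib.error import URLError
from urllib.request import Request, urlopen


def is_alive(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: send a HEAD request and accept any status < 400."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (URLError, HTTPException, ValueError, OSError):
        # 4xx/5xx responses, DNS failures, timeouts and malformed URLs
        # all count as "bad" for the purposes of this sketch.
        return False


def bad_url_rate(
    urls: Iterable[str],
    check: Callable[[str], bool] = is_alive,
) -> float:
    """Return the percentage of URLs for which `check` reports failure."""
    urls = list(urls)
    if not urls:
        return 0.0
    bad = sum(1 for url in urls if not check(url))
    return 100.0 * bad / len(urls)
```

Passing the checker in as a parameter keeps the counting logic separate from the network call, so the same function could be pointed at a stricter check later. The number it produces only measures the technical symptom, of course; it says nothing about which partner changed what, or why nobody told the aggregator.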
Several studies have been conducted that show various rates for “linkrot” in URLs, but I have yet to find any references to the expectations/reality of “URI rot.” With millions upon billions of URIs being “minted” (they are the coin of the semantic realm after all), having even a small portion of them fail seems like it could wreak havoc on the neat and tidy graphs that are the basis of the semantic web. This also would seem to be a concern for long-term digital preservation in the case where the services, etc. that you rely on today have long since disappeared. Recommendations like “coolURIs” help address the technical issues but they don’t seem to address the “people” problem.
And what of the resource-strapped (as in cash, manpower, etc., not as in “things” being described) cultural heritage institutions? Will they really be able to mint robust and long-lived URIs? Or will they be relegated to the backwaters of the un-semantic web? Just as there has been a gap between institutions that are able to get their collections online and those that are not, we now could have a growing divide between those who are able to provide semantically enhanced metadata and those who cannot. Again, a political/social problem as much as it’s a technical one.
Perhaps “semantifying” metadata could be a new job for metadata aggregators like IMLS DCC. I could imagine a service provider adding a process to their workflow that would append URIs for known controlled vocabulary terms to aggregated records or provide new URIs for things that didn’t have one already. This seems to point towards the top layer of the semantic layer cake – that of trust. Is it necessary to know who has the “authoritative” URI for a resource or property? What are the politics/social issues involved in taking responsibility for URIs for someone else’s “things?” If there are multiple URIs, how do I know that they point towards the same “thing?” Should I mint a new URI for one that has failed?
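As a rough sketch of what such a “semantifying” step might look like, the Python below appends a URI to each subject term it recognizes while keeping the original literal alongside it. Everything here is an assumption made for illustration: the `semantify` function, the dictionary-shaped record, the `VOCAB_URIS` lookup table and its `example.org` URIs are all hypothetical, and a real aggregator would look terms up against an actual vocabulary service.

```python
from typing import Any, Dict, List

# Hypothetical lookup table: controlled-vocabulary literal -> URI.
# These example.org URIs are placeholders, not real identifiers.
VOCAB_URIS: Dict[str, str] = {
    "Railroads": "http://example.org/vocab/railroads",
    "Mines and mineral resources": "http://example.org/vocab/mines",
}


def semantify(
    record: Dict[str, Any],
    vocab: Dict[str, str] = VOCAB_URIS,
) -> Dict[str, Any]:
    """Return a copy of `record` whose subjects carry URIs where known.

    The literal is always kept, so the record stays usable even if the
    URI later stops resolving or changes meaning.
    """
    enriched = dict(record)
    subjects: List[Dict[str, str]] = []
    for literal in record.get("subject", []):
        entry = {"literal": literal}
        uri = vocab.get(literal.strip())
        if uri is not None:
            entry["uri"] = uri
        subjects.append(entry)
    enriched["subject"] = subjects
    return enriched
```

Keeping the literal next to the URI is a deliberate hedge against the brittleness discussed earlier. It does not answer the trust questions, though: the sketch silently decides whose URI is “the” URI for a term, which is exactly the kind of responsibility an aggregator would be taking on.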
At times I feel like the “Semantic Web” buzz is just swapping in a new technical platform without really addressing the social problems that prevented us from achieving similar goals with older technologies like XML. Jerry McDonough discusses his concerns with regard to XML in his recent Balisage article, “Structural Metadata and the Social Limitation of Interoperability: A Sociotechnical View of XML and Digital Library Standards Development”:
Like a rope, [XML] is extraordinarily flexible; unfortunately, just as with rope, that flexibility makes it all too easy to hang yourself.
In the case of the semantic web, I may be less worried about hanging myself and more worried that the rope I’m hanging onto might be cut by someone up above at any time – sending me and my metadata into the abyss. It also seems that addressing some of these concerns could encourage more uptake of semantic web technologies, especially where social/political/financial commitments are required to make it happen. Looking back to the lessons we’ve learned (or have yet to learn) from our experiences with XML, metadata interoperability, and shareability would make me feel more comfortable relying on the “cloud.”