Inherent Vice
inherent vice: n. ~ The tendency of material to deteriorate due to the essential instability of the components or interaction among components.
SAA Glossary of Archival and Records Terminology

Archive for the 'semantic web' Category

Putting IMLS DCC on the Map

Monday, April 27th, 2009

Another cross-posting from the IMLS DCC Project Blog

I recently attended the Museums & the Web 2009 conference in Indianapolis, IN. Prof. Mike Twidale and I were there to do a live patchwork prototyping demo of the IMLS DCC Collection Dashboard concept. We had a great crowd of attendees in our booth who provided us with lots of great ideas for next steps (more on that, and a similar demo we did at HASTAC III later). But I also participated in several “unconference” conversations about the semantic web and open/linked data.

At the moment, information from the IMLS DCC is only available via the website and via our OAI-PMH data providers (one for collection-level records, and another for item-level records). While these are great for sharing records between repositories, they don’t necessarily make the information that we have accessible to cool web services like Yahoo! Pipes. Mia Ridge, at the Science Museum in London (and keeper of the Museum API wiki), issued a challenge for us to DO ONE THING before April was over. So here’s my attempt at DOING ONE THING with IMLS DCC (admittedly just a baby step).

One of the services I learned about at MW2009 is Dapper, a tool that will screenscrape HTML pages to produce various kinds of output that you can share with APIs (application programming interfaces). Dapper fits nicely within our Patchwork Prototyping toolbox, as it lets us play with some IMLS DCC data in ways that we couldn’t before, without having to actually build an IMLS DCC API first. One of the desirables that came up in both our MW2009 and HASTAC demonstrations was being able to see IMLS DCC collections on a map. So here we go…

First I screenscraped the list of IMLS DCC Collections By Title page. Dapper then allowed me to create:

I took the Atom feed and passed it to the location extractor in Yahoo! Pipes to generate a map.
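For readers who want to try the same idea without Dapper, here is a minimal sketch in Python of the scrape-to-feed step. It is not how Dapper works internally, and everything in it is an assumption made for illustration: the sample markup, the `example.org` URLs, and the names `LinkCollector` and `links_to_atom` are all hypothetical. The feed it builds is deliberately bare (it omits elements such as `id` and `updated` that a valid Atom feed requires).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_atom(links, feed_title):
    """Build a bare-bones Atom feed (as a string) with one entry per link."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    for href, text in links:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = text
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=href)
    return ET.tostring(feed, encoding="unicode")


# Hypothetical sample markup, not the real IMLS DCC page.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.org/collections/1">Alpha Collection</a></li>
  <li><a href="http://example.org/collections/2">Beta Collection</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
atom_xml = links_to_atom(collector.links, "Collections by title (sample)")
```

A real page would need a more selective parser (for example, only links inside the collection list), which is exactly the kind of thing Dapper lets you point and click your way through.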

IMLS DCC Collection Map

This is just a first baby step towards building other widgets for a collections dashboard! It needs some work (only a certain number of collections will appear on the map at any one time – you need to browse through the list to see more collections), but the idea behind the DO ONE THING challenge was to take some simple steps to build momentum.

A special thanks to colleague Piotr Adamczyk and his MuseumPipes blog for inspiration!

Modelling CDWA Lite as an OWL-DL Ontology

Thursday, March 26th, 2009

Ooops… after the iSchools 2009 conference, I updated a page on my website that contained my poster “Modelling CDWA Lite as an OWL-DL Ontology” but never posted anything here at Inherent Vice. You can also download the full poster from the IDEALS repository.

I’ve also just posted the beta version of the OWL file on my website. I do this with some trepidation, since this is probably the first full OWL model that I’ve created from top to bottom. As I note in the paper, the current structure of the CDWA Lite XML schema forces ontology developers to make some choices about how certain parts of the schema are modelled in an ontology.

This was a useful learning exercise, but I’m not sure if I will take this particular OWL model forward. I had intentionally avoided using the CIDOC-CRM and the improvements suggested by the MuseumDAT project. CDWA and CDWA Lite have enough of a toehold here in the United States, and have influenced other influential standards such as VRA Core and Cataloging Cultural Objects, that I felt CDWA Lite deserved a fair shake to stand on its own. But some of the problems I encountered in trying to create an OWL model suggest that modeling CDWA using CRM would be a worthwhile next step.

If you’re working on a similar project I would be interested in hearing from you and would appreciate any comments or feedback on the ontology itself.

The URI Gap

Tuesday, October 7th, 2008

Two weeks ago I attended the 2008 International Conference on Dublin Core and Metadata Applications. Dr. Allen Renear, Karen Wickett and I were there presenting our paper, “Collection/Item Metadata Relationships” (well, Allen did all the presenting).



A Semantic Web Layer Cake. Modified from the original at http://semtext.org/2004-02/slides/img4.gif (thanks to Karen for pointing this out!)

There was a fair amount of Twitter activity during the conference, and during Ed Summers’s talk about “LCSH, SKOS and Linked Data” I started an exchange about URIs and their role in the Semantic Web. Actually, the increasing “semanticness” of the Dublin Core specifications has been on my mind for a while. When I first started encountering it several years ago, it was impenetrable to me as someone whose technical skills were mostly acquired on the job. I’d mastered relational databases and was becoming proficient in XML, but the emergence of the Abstract Model presented more of a challenge. My mind would drift back to the days when I’d be standing in front of thirty or so librarians, archivists and museum professionals at a CDP workshop: how would I explain the Abstract Model to them? And more importantly, how would they actually participate in a “semantically” enabled CDP?

At one point Ed quotes Andy Powell:

…by treating values as non-literal resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.

(I’m not going to rehash existing discussions about 1) what is a URI and 2) what are literals and non-literals. Also see Pete Johnston’s “Dublin Core Key Concepts” tutorial slides).

This idea of replacing literals with non-literals in our metadata is certainly attractive, especially in a robust networked environment. What I haven’t yet heard is what happens when the network is brittle and things start breaking. It seems possible that the neat web of relationships that we’ve identified could quickly start unraveling. This seems especially true in an environment where metadata gets aggregated away from its original creator. Sure, in your shop you may know that you’ve “minted” URIs for new properties or replaced old URIs with new ones, but the metadata that you’ve released into the wild may not know about these changes. In these discussions about replacing literals with non-literals there always seems to be some assumption that the non-literals will a) be globally unique and b) be persistent. As Andy Powell suggested via Twitter, the scenarios where this isn’t true are not a technical failure of the semantic web, but a social/political/commitment failure on the part of the people implementing systems. No doubt this is true, but in my book the people problems are always harder to solve than the technical ones.

Take the CDP’s aggregation of Dublin Core metadata as an example. When I was there I’d made a private commitment to keep the percentage of bad URLs below 10%. You might think this was easy, but in fact it was quite a lot of work, largely because many of our partners (and their partners) hadn’t bought into the belief that URLs needed to be persistent. Sometimes a simple change that was automatic on their end didn’t make its way to us and required manually updating every record. This problem cascades beyond CDP to the IMLS DCC item-level repository, which also now contains records with bad URLs. Even though the DCC repository could potentially revise its records through OAI-PMH, CDP’s OAI data provider disappeared about a year ago when a server was replaced. We now have several layers of social/political/commitment between us and the resource that we are describing or wanting to retrieve.
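To make the “below 10%” target concrete, here is a rough sketch in Python of how a bad-URL rate could be measured over a batch of records. This is not the process CDP actually used. The record layout (a dict with an `identifier` key holding the URL) and the function names `url_is_alive` and `bad_url_rate` are assumptions for illustration, and a HEAD request is only one crude way to decide whether a URL is “bad.”

```python
import urllib.error
import urllib.request


def url_is_alive(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds with a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def bad_url_rate(records, check=url_is_alive):
    """Return the fraction of records whose 'identifier' URL fails the check."""
    if not records:
        return 0.0
    bad = sum(1 for record in records if not check(record["identifier"]))
    return bad / len(records)


# Hypothetical records and a stand-in checker, so the example runs offline.
sample = [{"identifier": f"http://example.org/item/{n}"} for n in range(10)]
dead = {"http://example.org/item/3"}
rate = bad_url_rate(sample, check=lambda url: url not in dead)
```

Passing the checker in as a parameter keeps the arithmetic separate from the network, which matters here: the counting is the easy part, and knowing what to do about the failures is the people problem.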

Several studies have been conducted that show various rates for “linkrot” in URLs, but I have yet to find any references to the expectations/reality of “URI rot.” With millions upon billions of URIs being “minted” (they are the coin of the semantic realm, after all), having even a small portion of them fail seems like it could wreak havoc on the neat and tidy graphs that are the basis of the semantic web. This also would seem to be a concern for long-term digital preservation, in the case where the services, etc. that you rely on today may have long since disappeared. Recommendations like “coolURIs” help address the technical issues but they don’t seem to address the “people” problem.

And what of the resource-strapped (as in cash, manpower, etc., not as in “things” being described) cultural heritage institutions? Will they really be able to mint robust and long-lived URIs? Or will they be relegated to the backwaters of the un-semantic web? Just as there has been a gap between institutions that are able to get their collections online and those that are not, we now could have a growing divide between those who are able to provide semantically enhanced metadata and those who are not. Again, a political/social problem as much as it’s a technical one.

Perhaps “semantifying” metadata could be a new job for metadata aggregators like IMLS DCC. I could imagine a service provider adding a process to their workflow that would append URIs for known controlled vocabulary terms to aggregated records or provide new URIs for things that didn’t have one already. This seems to point towards the top layer of the semantic layer cake: that of trust. Is it necessary to know who has the “authoritative” URI for a resource or property? What are the politics/social issues involved in taking responsibility for URIs for someone else’s “things?” If there are multiple URIs, how do I know that they point towards the same “thing?” Should I mint a new URI for one that has failed?
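As a thought experiment, such a “semantifying” step might look something like the Python sketch below. It is purely illustrative: the lookup table, the `example.org` URIs, the record layout and the function name `semantify` are all hypothetical, and a real aggregator would draw on a published vocabulary service rather than a hard-coded dictionary.

```python
import copy

# Hypothetical lookup table: the example.org URIs are placeholders,
# not identifiers published by any real vocabulary service.
SUBJECT_URIS = {
    "railroads": "http://example.org/vocab/subject/railroads",
    "mining": "http://example.org/vocab/subject/mining",
}


def semantify(record, lookup=SUBJECT_URIS):
    """Return a copy of the record with URIs attached to recognized subject terms.

    Each literal subject becomes a dict holding the original label and, when
    the term is found in the lookup table, a 'uri' key. Unknown terms keep
    only their label, so nothing from the source record is lost.
    """
    enriched = copy.deepcopy(record)
    subjects = []
    for label in record.get("subject", []):
        value = {"label": label}
        uri = lookup.get(label.strip().lower())
        if uri is not None:
            value["uri"] = uri
        subjects.append(value)
    enriched["subject"] = subjects
    return enriched


# Hypothetical aggregated record with literal subject values.
record = {"title": "Depot at dusk", "subject": ["Railroads", "Photography"]}
result = semantify(record)
```

Keeping the original label next to the URI is a deliberate choice in this sketch: if the URI later fails, the record still carries the literal it started with, and the harvested original is left untouched.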

At times I feel like the “Semantic Web” buzz is just swapping in a new technical platform without really addressing the social problems that prevented us from achieving similar goals with older technologies like XML. Jerry McDonough discusses his concerns with regard to XML in his recent Balisage article, “Structural Metadata and the Social Limitation of Interoperability: A Sociotechnical View of XML and Digital Library Standards Development”:

Like a rope, [XML] is extraordinarily flexible; unfortunately, just as with rope, that flexibility makes it all too easy to hang yourself.

In the case of the semantic web, I may be less worried about hanging myself and more worried that the rope I’m hanging onto might be cut by someone up above at any time, sending me and my metadata into the abyss. It also seems that addressing some of these concerns could encourage more uptake of semantic web technologies, especially where social/political/financial commitments are required to make it happen. Looking back to the lessons we’ve learned (or have yet to learn) from our experiences with XML, metadata interoperability, and shareability would make me feel more comfortable relying on the “cloud.”
