Jon Phipps recently posted a Manifesto For Managing Metadata in an Open World that contains some interesting concepts that I don’t think are fully appreciated by the metadata community. (I kind of wish these points were numbered, so we could track them a little better.)
Globally PUBLISHED Metadata SHOULD be TRUE, based on the domain knowledge of the publisher.
For many of us coming from a humanities background (especially those of us brought up in the midst of post-modern relativism) the idea of TRUTH can be a tough one to get our heads around. In the context of RDF’s semantic model, the TRUTH of a triple has a very limited, formal meaning. Recall that the original purpose of knowledge representation languages such as RDF was to allow inferencing across a set of statements. If you are using RDF in this way, its semantics treat EVERY EXPRESSED TRIPLE as a TRUE statement. Because of the limitations of artificial reasoning, the designers of RDF had to constrain the features of the language. It is not natively possible to make a FALSE statement in RDF, or to negate a statement (i.e. to say that something is NOT TRUE).
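To make the formal sense of TRUTH concrete, here is a minimal sketch that models a graph as a plain Python set of (subject, predicate, object) tuples rather than using an RDF library; the `ex:` identifiers and the `holds` helper are hypothetical. The point is that a lookup can only ever answer "true" or "don't know":

```python
from typing import Optional

# A graph is just a set of asserted triples; membership means "asserted, hence TRUE".
Triple = tuple[str, str, str]

def holds(graph: set[Triple], triple: Triple) -> Optional[bool]:
    """Return True if the triple is asserted, otherwise None (unknown).

    There is deliberately no branch that returns False: RDF offers no native
    way to assert that a triple is false, and a missing triple is only
    "not known to be true", never "known to be false".
    """
    return True if triple in graph else None

# Hypothetical example data using made-up ex: identifiers.
graph: set[Triple] = {
    ("ex:book1", "dcterms:title", "Moby-Dick"),
    ("ex:book1", "dcterms:creator", "ex:melville"),
}

print(holds(graph, ("ex:book1", "dcterms:title", "Moby-Dick")))  # True
print(holds(graph, ("ex:book1", "dcterms:title", "Ulysses")))    # None
```

Anything that looks like a negative statement (say, a hypothetical `ex:notTitle` property) is still just another positive triple; giving it a negative meaning is up to some model outside RDF itself.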
This is not some abstract sense of TRUTH, but a logical, mathematical kind of TRUTH. An RDF statement is TRUE within the context of some interpretation that expresses what kinds of statements are allowed. This is the purpose of an RDF Schema or a more complex OWL ontology. The extent to which your model conforms to some communal notion of “truth” in the real world lies outside both the metadata and your ontology. It’s a matter for discussions among community members like the ones we’re having now at DCMI. This also gets to Jon’s next point:
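In practice, RDF Schema does this by licensing inferences rather than by rejecting data. Below is a minimal sketch of two RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs3 for `rdfs:range`), using the same set-of-tuples representation instead of a real reasoner; the `ex:author`, `ex:Book`, and `ex:Person` terms are hypothetical:

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

Triple = tuple[str, str, str]

def rdfs_closure(graph: set[Triple]) -> set[Triple]:
    """Apply the RDFS domain (rdfs2) and range (rdfs3) rules until nothing new appears."""
    closed = set(graph)
    while True:
        domains = {(p, c) for (p, rel, c) in closed if rel == RDFS_DOMAIN}
        ranges = {(p, c) for (p, rel, c) in closed if rel == RDFS_RANGE}
        new: set[Triple] = set()
        for (s, p, o) in closed:
            for (prop, cls) in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))  # subject is inferred to be in the domain class
            for (prop, cls) in ranges:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))  # object is inferred to be in the range class
        if new <= closed:
            return closed
        closed |= new

# Hypothetical schema and data.
schema: set[Triple] = {
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:author", RDFS_RANGE, "ex:Person"),
}
data: set[Triple] = {("ex:book1", "ex:author", "ex:melville")}

closed = rdfs_closure(schema | data)
print(sorted(t for t in closed if t[1] == RDF_TYPE))
# [('ex:book1', 'rdf:type', 'ex:Book'), ('ex:melville', 'rdf:type', 'ex:Person')]
```

Note what the schema does not do: if someone asserts `ex:author` on a map, the closure happily concludes that the map is an `ex:Book`. Within the formal system that conclusion is TRUE; whether it matches the world is the community question.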
PUBLISHERS of metadata MUST publish the semantics of the metadata, or reference publicly available semantics.
Without an expression of your model – what counts as TRUE statements for your metadata – an aggregator is at a loss. I can look at your metadata and try to supply an interpretation, but only by talking to you will I know whether I got it right. In the past we’ve relied on the idea that people should adopt standards, and that by adopting a standard up front you commit to a particular model. But what we often see is that global standards don’t fit local needs. Too often we make practical choices about how to adapt those standards to our local collections. Unfortunately, this also means your model is obscured and unavailable to an aggregator. If you SAY you’ve conformed to a standard or a model and then deviate from the TRUTH of that model, your metadata will look incoherent to non-local consumers of that metadata (see Jon’s other manifesto points with regard to coherence. I understand coherence to mean LOGICALLY coherent, but the manifesto may set a lower bar, given some of the vagaries of legacy data syntaxes).
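Suppose a publisher claims conformance to a published model. A minimal sketch of how an aggregator might read that model and flag deviations, assuming a hypothetical `PUBLISHED_MODEL` that maps each property to a check on its values (here, a simplified year / year-month / full-date pattern for dates). The checks and the `deviations` helper are illustrations, not part of any DCMI specification:

```python
import re

# Hypothetical published model: each property maps to a check on its values.
# Dates must look like YYYY, YYYY-MM, or YYYY-MM-DD (a simplification for illustration).
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

PUBLISHED_MODEL = {
    "dcterms:title": lambda value: bool(value.strip()),  # any non-empty title
    "dcterms:date": lambda value: bool(DATE_PATTERN.match(value)),
}

def deviations(record: list[tuple[str, str]], model: dict) -> list[str]:
    """List statements an aggregator reading `model` cannot accept as TRUE."""
    problems = []
    for prop, value in record:
        check = model.get(prop)
        if check is None:
            problems.append(f"{prop}: not defined in the published model")
        elif not check(value):
            problems.append(f"{prop}: {value!r} does not fit the published model")
    return problems

# A hypothetical record that claims conformance but reflects local practice.
local_record = [
    ("dcterms:title", "Main Street, looking north"),
    ("dcterms:date", "circa 1910"),
    ("local:photographer", "Unknown"),
]

for problem in deviations(local_record, PUBLISHED_MODEL):
    print(problem)
```

Neither flagged statement is "wrong" locally; each is simply not interpretable as TRUE under the model the publisher said it was following.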
As an aggregator, my job should be to preserve not the SYNTAX of your metadata but the TRUTH of what your metadata says – even if that means saying it another way that better conforms to MY models. Ideally, how I translate statements optimized for your model into my model should also be publicly shared by aggregation services (and wouldn’t it be nice if it were done in a way that other aggregators could pick up and use?).
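One way an aggregator might share its translations is to publish the crosswalk as data rather than burying it in code. A minimal sketch, assuming a hypothetical JSON crosswalk from a `local:` model into DCMI terms; the `local:` properties, the example.org URL, and the `translate` helper are all illustrative:

```python
import json

# Hypothetical crosswalk, published as data so other aggregators could load and reuse it.
CROSSWALK_JSON = """
{
  "source_model": "https://example.org/local-model",
  "target_model": "http://purl.org/dc/terms/",
  "properties": {
    "local:photographer": "dcterms:creator",
    "local:dateTaken": "dcterms:created",
    "local:caption": "dcterms:description"
  }
}
"""

Triple = tuple[str, str, str]

def translate(triples: list[Triple], crosswalk: dict) -> tuple[list[Triple], list[Triple]]:
    """Re-express triples in the target model; return (translated, unmapped)."""
    mapping = crosswalk["properties"]
    translated, unmapped = [], []
    for s, p, o in triples:
        if p in mapping:
            translated.append((s, mapping[p], o))
        else:
            # Do not guess: statements we cannot re-express are set aside, not silently dropped.
            unmapped.append((s, p, o))
    return translated, unmapped

crosswalk = json.loads(CROSSWALK_JSON)
source = [
    ("ex:photo42", "local:photographer", "Jane Doe"),
    ("ex:photo42", "local:dateTaken", "1910"),
    ("ex:photo42", "local:boxNumber", "17"),
]
translated, unmapped = translate(source, crosswalk)
print(translated)
print(unmapped)
```

A property-for-property rename is the simplest case, and it preserves truth only when the target property is at least as broad as the source: the photographer of a photograph is arguably its creator, but not every creator is a photographer, so the reverse mapping could turn true statements into false ones.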
Jon’s post doesn’t say much about the Open World model alluded to in the title. But you may already have your back up about this notion of the truthfulness of metadata statements. Part of the open world assumption is that we do not have to adopt a single definition of true statements; rather, we can express multiple models, each of which is internally true and coherent. Different models may overlap or contradict each other (i.e. statements that are coherent in one model may be incoherent in another). Because our debates have become so focused on syntaxes and on conformance to a single model (THE STANDARD), I don’t know how we move to a metadata ecology that allows this kind of diversity. (The CIDOC CRM acknowledges that it seeks common ground among knowledge bases that may contain contradictory statements, since the truth of a particular statement is often the subject of academic and scholarly debate.)
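A toy illustration of two internally coherent models disagreeing about the same statements. Hypothetical model A treats `ex:dateOfManufacture` as single-valued, loosely in the spirit of an OWL functional property; hypothetical model B treats dates as scholarly attributions that may coexist. The `incoherences` helper is an assumption for illustration, not an OWL reasoner:

```python
Triple = tuple[str, str, str]

def incoherences(graph: set[Triple], single_valued: set[str]) -> list[str]:
    """Report properties a model declares single-valued that nonetheless carry several values."""
    values: dict[tuple[str, str], set[str]] = {}
    for s, p, o in graph:
        if p in single_valued:
            values.setdefault((s, p), set()).add(o)
    return [f"{s} has {len(v)} values for {p}"
            for (s, p), v in sorted(values.items()) if len(v) > 1]

# Two competing scholarly datings of the same (hypothetical) object.
statements: set[Triple] = {
    ("ex:vase7", "ex:dateOfManufacture", "-450"),
    ("ex:vase7", "ex:dateOfManufacture", "-420"),
}

model_a = {"ex:dateOfManufacture"}  # hypothetical model A: one authoritative date
model_b: set[str] = set()            # hypothetical model B: several attributions may coexist

print(incoherences(statements, model_a))  # ['ex:vase7 has 2 values for ex:dateOfManufacture']
print(incoherences(statements, model_b))  # []
```

The statements themselves do not change; only the model against which we judge their coherence does.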
CONSUMERS of published metadata SHOULD assume that the global metadata is locally INVALID, INCOHERENT, and UNTRUE.
I’m certainly on board with this statement, which is backed up by much of the research conducted on IMLS DCC metadata aggregations. However, much of this evaluation has used metadata where I’m not sure that anyone was thinking about the logical truth of what they were doing. Often, these kinds of analyses are like shooting fish in a barrel (not to say that the analyses themselves are easy…right, Wickett?). We also know that XML, while great for representing document structures, is not necessarily the best at representing statements whose truth we want to evaluate.
- Wickett, K., & Renear, A. H. (2012). The logical form of the proposition expressed by a metadata record. JCDL 2012. http://dl.acm.org/citation.cfm?doid=2232817.2232917
- Dubin, D., Futrelle, J., Plutchak, J., & Eke, J. (2009). Preserving Meaning, Not Just Objects: Semantics and Digital Preservation. Library Trends, 57(3), 595–609. http://muse.jhu.edu/journals/library_trends/v057/57.3.dubin.html
- Dubin, D., & Birnbaum, D. J. (2004). Interpretation Beyond Markup. Proceedings of Extreme Markup Languages 2004, Montreal, Quebec, August 2004. IDEAlliance and Mulberry Technologies, Inc. http://hdl.handle.net/2142/11838
- Renear, A. H., Dubin, D., Sperberg-McQueen, C. M., & Huitfeldt, C. (2003). XML semantics and digital libraries. Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’03), Washington, D.C. IEEE. http://dl.acm.org/citation.cfm?id=827192
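To see why XML makes truth hard to evaluate, consider turning a simplified oai_dc-style record into explicit statements. The markup gives us elements and text, but no subject: the sketch below has to be told what the record is about. The sample record, the `ex:` identifiers, and the `record_to_statements` helper are hypothetical; only Python's standard `xml.etree.ElementTree` is used:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A hypothetical, simplified Dublin Core record in XML.
RECORD_XML = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Main Street, looking north</dc:title>
  <dc:creator>Jane Doe</dc:creator>
  <dc:date>1910</dc:date>
</record>
"""

def record_to_statements(xml_text: str, subject: str) -> list[tuple[str, str, str]]:
    """Turn each dc:* child element into an explicit (subject, property, value) statement.

    The XML never says what the record is about; `subject` is supplied by the caller,
    and choosing it (the photograph? a digital scan? the record itself?) is an
    interpretive decision the markup leaves open.
    """
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + DC_NS + "}"  # ElementTree writes namespaced tags as {uri}localname
    statements = []
    for element in root:
        if element.tag.startswith(prefix):
            local_name = element.tag[len(prefix):]
            statements.append((subject, "dc:" + local_name, (element.text or "").strip()))
    return statements

print(record_to_statements(RECORD_XML, "ex:photo42"))
print(record_to_statements(RECORD_XML, "ex:scan42"))
```

The two runs produce statements that differ only in their subject, yet `dc:date` 1910 is plausibly TRUE of the photograph and FALSE of a modern scan of it. The XML alone cannot tell us which proposition the record expresses.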
In the context of DCMI standards, the Dublin Core Application Profile (DCAP) is an important mechanism that metadata publishers can use to express their models. Even though a DCAP is less formal than, say, an OWL ontology, it allows us to express our intentions for published metadata. I think DCMI still needs to work on shifting the perception of DCAP away from something that requires a lot of effort on the part of a community and toward a way to document the “as built” features of a repository, which can aid in the aggregation of metadata. This is especially true for the common case where a single repository may have collection-by-collection profiles.
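As a sketch of the "as built" idea, a repository could derive a first-draft profile from the records each collection actually contains, noting for every property the minimum and maximum number of times it occurs per record. This is loosely modeled on the occurrence constraints in DCMI's Description Set Profiles but is not an implementation of them; the collection names, records, and `as_built_profile` helper are hypothetical:

```python
from collections import Counter

# One record = a list of (property, value) pairs.
Record = list[tuple[str, str]]

def as_built_profile(records: list[Record]) -> dict[str, tuple[int, int]]:
    """Derive (min, max) occurrences per record for every property actually used."""
    counts = [Counter(prop for prop, _ in record) for record in records]
    used = sorted(set().union(*counts))
    # A Counter returns 0 for a missing key, so an absent property counts as 0 occurrences.
    return {prop: (min(c[prop] for c in counts), max(c[prop] for c in counts)) for prop in used}

# Hypothetical repository whose collections are described in different ways.
repository: dict[str, list[Record]] = {
    "maps": [
        [("dcterms:title", "Plat of the town"), ("dcterms:spatial", "Town center")],
        [("dcterms:title", "County survey"), ("dcterms:spatial", "North township"),
         ("dcterms:spatial", "South township")],
    ],
    "letters": [
        [("dcterms:title", "Letter to a cousin"), ("dcterms:creator", "Jane Doe")],
        [("dcterms:title", "Untitled letter")],
    ],
}

for name, records in repository.items():
    print(name, as_built_profile(records))
```

A profile derived this way records what the data does, not what anyone intended, so it is a starting point for a curator to review and annotate before publishing it alongside the metadata.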
I’d also like to see more conversations at these higher levels regarding bibliographic representations. Do you believe that a MARC tag/value has some kind of truth to it? What would the model of truth look like? (How would I evaluate the truthfulness of a MARC record?) How do these notions relate to bibliographic principles, such as the Principle of Accuracy? This seems especially important in the push to represent RDA as RDF concepts.