06/17/13

Introducing LODLAM Patterns

Crossposted at LODLAM Summit 2013:  Introducing LODLAM Patterns

Linked Data provides us with an incredible opportunity to re-think how we approach sharing information about LAM collections. However, these opportunities are also fraught with danger and important challenges that we must face. Translating existing standards into compliant Linked Data will take more than just cross-walking terms with similar meanings; it also means mapping between conceptual models and ontologies. Linked Data also provides us new opportunities to mix models and vocabularies in ways that we haven’t been able to do before. How can we take better advantage of these opportunities?

Ultimately, creating Linked Data standards and practices is a set of design problems that we are all engaged in. Elizabeth Churchill has called for “Data Aware Design” and the need to bring human-computer interaction methods to bear on these problems. At the Summit I will be presenting a Dork Short about a new site that I’m launching to do just this. LODLAM Patterns will identify Linked Data design patterns (which I’m calling representation patterns) for cultural heritage resources. The idea is to identify common problems that we are trying to solve and link them to the solutions that are available across the many, many standards for describing LAM resources. My goal is to create a resource that will spur discussions focused on problems/solutions, give newcomers a way to navigate the LOD standards universe, and serve as a pedagogical tool to teach “design-thinking” for Linked Data.

Participate by signing up at http://lodlampatterns.org or follow along @lodlamp or #lodlamp.

06/5/13

Reconciling Museums Count

From the tweets, it sounds like there were several interesting projects working with the Museums Count data.  Dylan Barrtlet combined the IMLS data with IRS data to clean up some of the address info.   Michael Girarldo mashed up the data with the public library data for a proof-of-concept mobile app that would help users locate museums and libraries in their vicinity.

I’ve continued to play around with this data using OpenRefine and the DBpedia SPARQL endpoint.  Attempting to reconcile against no type, I found approximately 19% of the museums in the IMLS data.

Doing a spot check of the unmatched entities:

  • if it is a simply named entity,  it’s not in Wikipedia/DBpedia
  • it’s an organization that operates a museum of a different name or has a different legal name than the one it’s known by in DBpedia. e.g:
  • if it is a complex name (i.e. dirty IMLS data),  it’s not matched
  • abbreviations in the name that cannot be matched (e.g. Ntnl (national), Ctr (center), Hist (history/historical), Inc., etc.) or conjunctions/punctuation (e.g. & for and).
Some of these problems might be fixed by cleaning up the names,  but some of the disambiguation may require human intervention.   I haven’t tried looking up entities in DBpedia by address, but that might also help identify things that are uncertain.
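As a rough illustration of the name clean-up mentioned above, here is a minimal Python sketch that expands abbreviations and normalizes conjunctions and punctuation before matching. The abbreviation table and the `normalize_name` function are hypothetical, built only from the examples in the spot check; "Hist" is ambiguous (history/historical), so the expansion chosen here is an assumption.

```python
import re

# Hypothetical abbreviation table, taken from the spot-check examples above.
# "hist" could mean "history" or "historical"; "historical" is an assumption.
ABBREVIATIONS = {
    "ntnl": "national",
    "ctr": "center",
    "hist": "historical",
}

# Legal suffixes that are unlikely to appear in a DBpedia label.
LEGAL_SUFFIXES = {"inc", "incorporated"}


def normalize_name(name: str) -> str:
    """Return a lowercase, expanded form of a museum name for matching."""
    # Spell out the ampersand before stripping punctuation.
    text = name.lower().replace("&", " and ")
    # Replace any remaining punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    words = []
    for word in text.split():
        if word in LEGAL_SUFFIXES:
            continue  # drop "Inc." and similar
        words.append(ABBREVIATIONS.get(word, word))
    return " ".join(words)
```

Applying the same function to both the IMLS names and the DBpedia labels would let the two sides be compared on equal footing, though it will not resolve the cases that need human judgment.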
Some of these mismatches do raise interesting ontological questions and get back to the issue of a Museum (organization) vs. Museum (a place). It looks like there are lots of unreconciled houses, historic sites, etc. with different legal names than the places they are associated with. What will be the best way to represent/associate these entities?

 Mismatches

Looking at the “matched” items, I found that lots of generically named museums were matched to a specific museum. For example, there are many “Art Gallery” things (usually at this or that college) that were all matched to the same DBpedia entity. Likewise, there are about 15 different “Pioneer Museum”, “Museum of Natural History,” “Museum of Anthropology,” “Museum of Art,” “University Museum,” “Cowboy Hall of Fame.” Another area of mismatch is county/city historical societies where the locality has the same name (e.g. 12 different “Douglas County Historical Society” entries all matched to the Douglas County Historical Society in Nebraska).

There are also multiple sites that are maintained by a single entity, like a state museum network or a city. For example, The Peale Museum (aka Municipal Museum of Baltimore) and Edgar Allan Poe House and Museum are simply listed here as “City of Baltimore.” It was necessary to look up the addresses to see what museum entity was there.

So clearly, a more robust approach to reconciliation is needed, perhaps including the city/state of an entity in order to disambiguate similar names.
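One way to find the suspect matches is to look for a single matched entity that has been assigned to records from more than one city/state. The sketch below assumes each reconciled record is a dictionary with the IMLS `city` and `state` fields plus a hypothetical `match` key holding the matched URI (or `None`); it is an illustration of the idea, not the OpenRefine workflow itself.

```python
from collections import defaultdict


def find_ambiguous_matches(records):
    """Return matched URIs that were assigned to records in more than one place.

    Each record is a dict with "city", "state", and a hypothetical "match"
    key holding the reconciled URI (or None if unmatched).
    """
    places_by_uri = defaultdict(set)
    for rec in records:
        uri = rec.get("match")
        if not uri:
            continue  # unmatched records cannot be ambiguous
        place = (rec["city"].strip().lower(), rec["state"].strip().upper())
        places_by_uri[uri].add(place)
    # A URI seen in two or more distinct places is probably a false match
    # for at least some of those records.
    return {
        uri: sorted(places)
        for uri, places in places_by_uri.items()
        if len(places) > 1
    }
```

Records flagged this way could then be re-reconciled with the city/state added to the query, or set aside for manual review.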

Lots of challenges here, but also seems to be lots of opportunities to add to and enrich museum representation in Linked Data/wiki resources.

05/31/13

Quick Museum Counts update

oh well, best laid plans…

Didn’t quite get as far as I’d hoped this week. Following Justin’s comments in the previous post, I did a quick mapping to the Organization Ontology, vCard, wgs84 and schema.org (seems to be the only one with a DUNS property).

A sample in Turtle looks like this (you can also download the full set as Turtle):

@prefix schema: <http://schema.org/> .
@prefix v: <http://www.w3.org/2006/vcard/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
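# The museum as an organization; org:hasSite points to an IRI, not a string literal.
<http://chi.cci.fsu.edu/resource/museum/imls_0> a org:Organization ;
 skos:prefLabel "Carmel Valley Historical Society" ;
 org:hasSite <http://chi.cci.fsu.edu/resource/site/imls_0> .
# The site where the organization is located.
<http://chi.cci.fsu.edu/resource/site/imls_0> a org:Site ;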
 geo:long "-121.726912" ;
 geo:lat "36.47231" ;
 v:street-address "PO BOX 1427" ;
 v:locality "Carmel Valley" ;
 v:region "CA" ;
 v:postal-code "93924" .

Since I’m not aware of an authorized URI for IMLS resources, I just minted one on my own domain. However, at this time these don’t resolve to anything. Ideally I might be able to reconcile many of the city/zip codes to a previously published URI for a place.
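For anyone who wants to reproduce the mapping, here is a minimal Python sketch of how one row of the IMLS CSV could be turned into the Turtle shown above. It assumes the row is a dictionary keyed by the IMLS field names, reuses the URI pattern from the sample, and expects the @prefix declarations to be written separately; the function names are hypothetical, and this is not necessarily how the published file was generated.

```python
# Base for the minted URIs, following the pattern in the sample above.
BASE = "http://chi.cci.fsu.edu/resource"


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(row: dict) -> str:
    """Render one IMLS record as an org:Organization with an org:Site."""
    org = f"<{BASE}/museum/imls_{row['id']}>"
    site = f"<{BASE}/site/imls_{row['id']}>"
    lines = [
        f"{org} a org:Organization ;",
        f" skos:prefLabel {turtle_literal(row['name'])} ;",
        f" org:hasSite {site} .",
        f"{site} a org:Site ;",
        f" geo:long {turtle_literal(row['longitude'])} ;",
        f" geo:lat {turtle_literal(row['latitude'])} ;",
        f" v:street-address {turtle_literal(row['address'])} ;",
        f" v:locality {turtle_literal(row['city'])} ;",
        f" v:region {turtle_literal(row['state'])} ;",
        f" v:postal-code {turtle_literal(row['zip'])} .",
    ]
    return "\n".join(lines)
```

Building the strings by hand keeps the sketch dependency-free; an RDF library would be a safer choice for the full dataset.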

I’ll be offline for most of tomorrow, but will try to check in the evening.

05/30/13

What is a Library/Archive/Museum According to Linked Data?

In my previous post,  Hacking Museums Count,  I introduced the data that IMLS has released for the National Civic Day of Hacking and took a first stab at assigning LOD properties to the data they provided.     This was based on some of my previous work mapping IMLS DCC data.  Following my initial observations,  I decided to take a closer look at how libraries, archives, and museums are currently modeled in Linked Data resources such as Freebase, DBpedia, and Schema.org.

What is a Library?

“A library…is an organized collection of information resources made accessible to a defined community for reference or borrowing.” (Wikipedia)
  • /organization/local business/library (Schema.org)
  • /organization/educational institution/library (DBpedia)
  • /place/Architectural Structure/Building/library (DBpedia)
  • /architecture/building function/ (Freebase/Wikipedia)
  • /library (Freebase)

What is an Archive?

“An archive is an accumulation of historical records, or the physical place they are located.” (Wikipedia)

  • notably, Schema.org does not have an explicit class for archives
  • /organization/government agency/ (DBpedia)
  • /archive (DBpedia), although this class doesn’t seem to be associated with other top-level classes. Wikipedia/DBpedia resources are given as instances of this class.
  • /organization/organization_type (Freebase)
  • /organization/organization_sector (Freebase)
Best one yet….  /fictional organization type! (Freebase)

What is a Museum?

“A museum is a building or institution dedicated to the acquisition, conservation, study, exhibition, and educational interpretation of objects having scientific, historical, cultural or artistic value.” (Wikipedia).

  • /place/civicStructure/Museum (Schema.org)
  • /place/Architectural Structure/Building/Museum (DBpedia)
  • /architecture/museum (Freebase)

This is only a partial listing of the kinds of things a LAM can be according to these linked data sources. It raises some interesting questions about how to properly model the IMLS data. What is IMLS’s view on this information? The emphasis of the total dataset is on location (address, lat/long, Census Block, etc.), so perhaps this is reflected in how LAMs are represented in Linked Data. I’ll need to go back and revisit my assumptions about LAMs being organizations that are associated with a structure of some kind.

So what?

As we talk about the convergence of LAMs,  the different ways that each sector has been represented as top-level Linked Data classes raises some interesting questions.

  • What does this tell us about public (well…LD public) perception of LAMs?  The narrative definitions on Wikipedia pages seem incongruent with the ontological classes. (is a library a collection? an organization? a building?)
  • Much of our attention as professionals is directed at representing our collections.  What is our responsibility to ensure that LD concepts reflect our understanding of what we do?
  • What is the impact of these specifications on our other Linked Data work?  Given some of the choices that have been made so far,  it could lead to some unexpected inferences:
    • <A Building> <hasCopyright> <An Image>
    • <A Painting> <isPhysicallyLocatedIn> <An Organization>
    • <A Person> <employedBy> <A Place>
The danger here (as with much Linked Data) is that we are talking about a few different entities that overlap.   The Museum (Legal Entity);  The Museum (a building); The Museum (a functional role as an organization that collects stuff).  Perhaps by looking at other kinds of entities that are similar/different to museums (i.e. non-profits, performing arts, businesses, etc.) we can see some alternatives that neatly address this problem.
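To make the "unexpected inferences" concrete, here is a toy Python sketch of the RDFS range rule: if a property is declared to expect a certain class as its object, a reasoner will conclude that whatever appears as the object belongs to that class. The `ex:` names and the declared range are hypothetical, chosen only to mirror the examples listed above.

```python
def infer_range_types(triples, ranges):
    """Apply the rdfs:range rule to a list of (subject, property, object) triples.

    `ranges` maps a property to the class its objects are expected to be.
    Returns the set of (resource, class) type statements a reasoner would add.
    """
    inferred = set()
    for _subject, prop, obj in triples:
        if prop in ranges:
            inferred.add((obj, ranges[prop]))
    return inferred


# Hypothetical data: the museum is asserted to be a building...
asserted_types = {("ex:museum", "ex:Building")}
# ...but a person is said to be employed by it,
triples = [("ex:curator", "ex:employedBy", "ex:museum")]
# and employedBy is assumed to expect an organization as its object.
ranges = {"ex:employedBy": "ex:Organization"}

# After inference the same resource is typed as both a building and an organization.
all_types = asserted_types | infer_range_types(triples, ranges)
```

Nothing here is an error as far as the reasoner is concerned; the single URI simply ends up standing for the building and the organization at once, which is the overlap described above.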

 

05/28/13

Hacking Museums Count

A few years ago (has it been that long already??) I wrote about the DPLA Beta Sprint we created for the IMLS Digital Collections and Content Project (see: 1, 2, 3). As part of the sprint, I created Linked Data representations for the IMLS DCC Collection-level records. A portion of those records included basic information about contributing IMLS DCC partners. Behind the scenes this data was used to maintain relationships with partners, but we also started using this information to build browse features and visualizations of what the collection looked like (see the current IMLS DCC interface, my paper on Collections Dashboards).

In the course of this project I discovered that there is a fundamental ontological difference between how museums and libraries are represented in the current Linked Data cloud. It was pretty easy to reconcile library entities, because the Public Library Survey data had been ingested into Freebase. Museums were much more hit-or-miss. Looking closer at the data, I realized that libraries were usually represented as a kind of organization, but museums were considered a kind of building. This may be because much of the information about museums in the U.S. is derived from the National Register of Historic Places dataset that was also ingested into DBpedia/Freebase.

As part of the National Civic Day of Hacking, IMLS has issued the Museum Data Challenge.   Included in the challenge is a minimal set of data on 35,000 museums.   I don’t think I’ll be able to participate directly this weekend, so I thought I’d take a look at the data that IMLS has released and see what I can do to make someone else’s hacking easier this weekend. Also included in the IMLS challenge is the Public Library Service Data (and data from the work of my colleague Christie Koonz,  imaplibraries.org).  Also check out the DPLA Challenge  and the Pocket Archivist Mobile Challenge from NARA.

Goals for the week:

  1. Do any clean up needed. (right now the data *looks* pretty clean, but the challenge suggests that their names and geolocation may be faulty).  The fact that much of the LOD about museums is from NRHP might allow me to identify inconsistencies in the IMLS dataset.
  2. Convert the CSV data into Linked Data.
    1. Identify appropriate Linked Data properties for this data (see below).
    2. Transform the CSV into JSON-LD
      1. publish on GitHub
  3. Associate these representations of museums as organizations with representations of museums as buildings in the current Linked Data cloud.
    1. Submit data to DBpedia/Freebase

Here’s a start on identifying LOD properties for the IMLS data release:

IMLS Field | Description | LOD Property | LOD Comment
id | unique identifier | | this is just an autogenerated ID number. Unclear whether this has any meaning to IMLS.
name | institution name | skos:prefLabel | per Organization ontology. Alternate: v:organisation-name
address | institution street address | v:street-address | vCard
city | institution city | v:locality | vCard
state | institution state | v:region | vCard
zip | institution zip code | v:postal-code | vCard
zip4 | institution zip+4 | v:postal-code | vCard
longitude | longitude, decimal degree format, World Geodetic System Datum 1984 | wgs84_pos#long | wgs84
latitude | latitude, decimal degree format, World Geodetic System Datum 1984 | wgs84_pos#lat | wgs84
phone | phone number | v:tel | vCard
duns | DUNS number, Dun & Bradstreet Numeric Identifier | org:identifier | there doesn’t seem to be an RDF property for DUNS numbers yet. Is there a better way to differentiate DUNS from EIN?
ein | EIN number, Federal Employer Identification Number | org:identifier | there doesn’t seem to be an RDF property for EIN numbers yet.
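As a first pass at the CSV-to-JSON-LD step, the field mapping above can be expressed as a JSON-LD @context, so that the CSV column names themselves become the terms. The Python sketch below is a simplification under several assumptions: it keeps everything on a single node rather than separating the organization from its address, it skips the identifier fields, and the `http://example.org/museum/` base URI and the `csv_to_jsonld` function are placeholders of my own.

```python
import csv
import io

# Map IMLS column names to property IRIs, following the mapping above.
CONTEXT = {
    "name": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "address": "http://www.w3.org/2006/vcard/ns#street-address",
    "city": "http://www.w3.org/2006/vcard/ns#locality",
    "state": "http://www.w3.org/2006/vcard/ns#region",
    "zip": "http://www.w3.org/2006/vcard/ns#postal-code",
    "phone": "http://www.w3.org/2006/vcard/ns#tel",
    "longitude": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
    "latitude": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
}


def csv_to_jsonld(csv_text: str, base: str = "http://example.org/museum/") -> dict:
    """Convert IMLS CSV text into a JSON-LD document (as a Python dict)."""
    graph = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = {
            "@id": f"{base}imls_{row['id']}",  # placeholder URI pattern
            "@type": "http://www.w3.org/ns/org#Organization",
        }
        for field in CONTEXT:
            value = (row.get(field) or "").strip()
            if value:  # leave out empty cells
                node[field] = value
        graph.append(node)
    return {"@context": CONTEXT, "@graph": graph}
```

Passing the result to `json.dumps` gives a file that could be published as-is; splitting each node into an organization and a site would be a natural next refinement.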

The IMLS data also includes the following fields,  though I haven’t been able to identify any LOD properties for these yet.  This is actually a bit surprising, since you’d think that U.S. Census data (or at least the properties of Census data) would be a solved problem by now.  For the moment, the information above seems like enough of a start, so I’ll leave these aside.


fipst FIPS State code
fipsco FIPS county code
centract seven character Census tract number
cenblock four character Census block number
fipsplc five-digit place FIPS code
fipsmcd five-digit MCD (Minor Civil Division) FIPS code
fipsmsa four-digit MSA (Metropolitan Statistical Area) FIPS code
cbsa five-digit CBSA code that identifies a CBSA area.
metrod five-digit Metropolitan Division Code
microf micropolitan flag (… Metropolitan Area or a “1” indicating a Micropolitan area)
mattype geocoding match type

Next up, I’ll discuss in more depth how museums have been modeled in the current Linked Data environment and suggest some possible models for the IMLS dataset.

Next: What is a Library/Archive/Museum According to Linked Data? 

09/21/12

Publishing and Using Linked Data at DHWI

In January I will be conducting a week-long workshop on Publishing and Using Linked Data as part of the Digital Humanities Winter Institute at the Maryland Institute for Technology in the Humanities.  Space is still available, so register today!

The publication of structured knowledge representations and open data on the Web opens new possibilities for collaboration among humanities researchers and cultural heritage organizations. This course will introduce participants to the core principles of Linked Open Data (LOD), techniques for building and understanding LOD models, how to locate LOD sources for research, tools for manipulating, visualizing, and integrating available data, and best practice methodologies for publicizing and sharing datasets.

For this course I will be drawing from initial work done by the Learning Linked Data project at the University of Washington iSchool, which has laid out a core inventory of learning topics.  The LOD for Libraries, Archives, and Museums community has also been actively promoting access to increasing amounts of cultural heritage information via Linked Data approaches.  Some of the questions we’ll be exploring in the workshop are:

  • what does the digital humanities community need from linked data
  • what use can we make of these large data sets
  • how can we synchronize scholarly work with the larger linked data community.

To help gain momentum for the workshop, I’ve created a wiki called Linked Data for Humanities, where I will be sharing drafts of the syllabus, resources, and example humanities projects (a big hat tip to Mia Ridge and the Museums and the Machine Processable Web wiki, which has been an important resource for the LODLAM community). If you have a humanities-based Linked Data project, questions, comments, or recommendations for things the course should cover, please join in the conversation.

08/7/12

NASASocial Reflections

Last week I had the chance to participate in a #NASASocial event commemorating the 50th Anniversary of the Kennedy Space Center.  The event was timed to kick off a weekend of NASA Social events leading up to the Mars Science Laboratory (#MSL) landing on August 5/6, 2012.

For me, this was a dream come true. I’ve wanted to visit KSC ever since I stuffed myself with too many Cheerios in order to get a special Space Shuttle kit in the 80s. Sadly, I never made it to a shuttle launch before the program ended. KSC was high on my to-do list since I moved to Florida, so it was exciting to receive the invitation to the #KSC50 event. The NASASocial team hosted us in the KSC press center for two days of presentations about current NASA programs, especially focusing on the Curiosity mission. We were also able to participate in the first multi-site NASA social event by joining a live simulcast with MSL scientists and engineers at the Jet Propulsion Lab (and other NASA Centers that were hosting similar events). The highlight of the event for me was our tour of early launch sites and getting to go inside the Launch Control Center and Vehicle Assembly Building.


This was my first “social media event” of this type and it’s given me a lot to ponder. Interestingly, this felt very different than my use of social media during conferences. When I’m at MCN, MW, etc., I have a pretty good idea of who the audience is for my tweets, but here I felt a little spammy. I’m not sure what you all thought of the stream coming at you last week, but I was trying to be somewhat restrained in crossing the streams. I am also a more casual fan of the space program and rank pretty low on the space geek ladder. After arriving, I wished I’d done some more reading up about what’s going on at NASA so I could ask our panelists better questions.

I was a little disappointed that we didn’t hear more history during an event cast as a 50th Anniversary celebration. I’m not sure this is a criticism as much as a surprising mismatch of expectations. We did get to hear from some NASA old-timers who shared some great personal anecdotes about their time at NASA. We also got a fat copy of This New Ocean, a part of NASA’s historical publications, but little mention was made of NASA’s other historical collections or efforts to document its history. During the event I started tweeting links to oral histories from some of our speakers (Jay Honeycutt, Lee Solid, Roy Tharpe). As we went around on our bus tour our guide did point out some landmarks, but only provided a little bit of what I’d call interpretation. Throughout the tour I was pulling up information from Wikipedia and other NASA sites about the locations we were visiting (had I thought about it, I should have looked for any dedicated apps related to KSC. They do seem to have an official app, but the one review doesn’t make it look worth $.99). I’d be curious to see what kind of interpretation is offered on the public tour that covers the same area.

My other takeaway from this event is that I need to work at being social at social events. I’m usually pretty good in a crowd of people I know, but still shy among strangers. I sense there was some unofficial back channel that I might have tuned into if I’d been a little more aggressive about talking to other attendees. The organizers seemed to leave this part of being social to us and it has me wondering what impact “icebreakers,” etc. have on these kinds of events. Compared to my conference experiences, I didn’t see as much direct back-and-forth on Twitter among participants (at least using the NASASocial hashtag). Again, as a n00b, I may have been missing out on something (and ugh! I could never get the wifi to work right, so I was limited to my phone – regretting my wifi-only iPad this weekend). From NASA’s perspective, I’m guessing that the events were successful. The NASASocial tag trended in the US on both days and seemed to feed the buzz leading up to the landing.

Big thanks to NASASocial for letting me come aboard for this event. It was a great opportunity to learn more about KSC and the MSL Mission.  Since it was my first time participating in an event like this, it also has me thinking hard about how museums can use social media in this way to engage their audiences.  It’s going to provide a great example for my students when we discuss social media and museums later this fall.

(and yes, it’s been a while. Do we really need another “I haven’t been blogging for a while” post? I don’t think so!)