Hacking Museums Count
A few years ago (has it been that long already??) I wrote about the DPLA Beta Sprint we created for the IMLS Digital Collections and Content Project (see: 1, 2, 3). As part of the sprint, I created Linked Data representations for the IMLS DCC Collection-level records. A portion of those records included basic information about contributing IMLS DCC partners. Behind the scenes this data was used to maintain relationships with partners, but we also started using this information to build browse features and visualizations of what the collection looked like (see the current IMLS DCC interface, my paper on Collections Dashboards).
In the course of this project I discovered that there is a fundamental ontological difference between how museums and libraries are represented in the current Linked Data cloud. It was pretty easy to reconcile library entities, because the Public Library Survey data had been ingested into Freebase. Museums were much more hit-or-miss. Looking closer at the data, I realized that libraries were usually represented as a kind of organization, but museums were considered a kind of building. This may be because much of the information about museums in the U.S. is derived from the National Register of Historic Places dataset that was also ingested into dbPedia/Freebase.
As part of the National Day of Civic Hacking, IMLS has issued the Museum Data Challenge. Included in the challenge is a minimal set of data on 35,000 museums. I don’t think I’ll be able to participate directly this weekend, so I thought I’d take a look at the data that IMLS has released and see what I can do to make someone else’s hacking easier. Also included in the IMLS challenge is the Public Library Survey data (and data from the work of my colleague Christie Koonz, imaplibraries.org). Also check out the DPLA Challenge and the Pocket Archivist Mobile Challenge from NARA.
Goals for the week:
- Do any cleanup needed. (Right now the data *looks* pretty clean, but the challenge suggests that the names and geolocation may be faulty.) The fact that much of the LOD about museums is from NRHP might allow me to identify inconsistencies in the IMLS dataset.
- Convert the CSV data into Linked Data.
  - Identify appropriate Linked Data properties for this data (see below).
  - Transform the CSV into JSON-LD.
  - Publish on GitHub.
- Associate these representations of museums as organizations with representations of museums as buildings in the current Linked Data cloud.
- Submit data to dbPedia/Freebase
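For the cleanup step, a first pass over the geolocation could be as simple as the sketch below. It assumes the CSV columns are named `latitude` and `longitude` (the field names in the IMLS release) and that they hold decimal degrees. The function name `coordinate_problems` and the sample rows are made up for illustration. This only catches obviously impossible values; it says nothing about whether a plausible-looking point is the right one.

```python
import csv
import io


def coordinate_problems(row):
    """Return a list of problems found in one row's latitude/longitude."""
    try:
        lat = float(row["latitude"])
        lon = float(row["longitude"])
    except (KeyError, TypeError, ValueError):
        return ["missing or non-numeric coordinates"]

    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    if lat == 0.0 and lon == 0.0:
        problems.append("coordinates are (0, 0), likely a geocoding placeholder")
    return problems


# Made-up sample rows, not taken from the IMLS file.
SAMPLE_CSV = """id,name,latitude,longitude
1,Example Museum,40.11,-88.24
2,Swapped Museum,-188.24,40.11
3,Blank Museum,,
"""

for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    for problem in coordinate_problems(row):
        print(row["id"], row["name"], problem)
```

Rows that fail here are good candidates for comparison against the NRHP-derived coordinates already in the Linked Data cloud.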
Here’s a start on identifying LOD properties for the IMLS data release:
| IMLS Field | Description | Format / Notes | LOD Property | LOD Comment |
| id | unique identifier | | | This is just an autogenerated ID number. Unclear whether this has any meaning to IMLS. |
| name | institution name | | skos:prefLabel | per Organization ontology. Alternate: v:organization-name |
| address | institution street address | | v:street-address | vCard |
| city | institution city | | v:locality | vCard |
| state | institution state | | v:region | vCard |
| zip | institution zip code | | v:postal-code | vCard |
| zip4 | institution zip+4 | | v:postal-code | vCard |
| longitude | longitude | decimal degree format, World Geodetic System Datum 1984 | wgs84_pos#long | wgs84 |
| latitude | latitude | decimal degree format, World Geodetic System Datum 1984 | wgs84_pos#lat | wgs84 |
| phone | phone number | | v:tel | vCard |
| duns | DUNS number | Dun & Bradstreet Numeric Identifier | org:identifier | There doesn’t seem to be an RDF property for DUNS numbers yet. Is there a better way to differentiate DUNS from EIN? |
| ein | EIN number | Federal Employer Identification Number | org:identifier | There doesn’t seem to be an RDF property for EIN numbers yet. |
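To make the mapping in the table above concrete, here is a rough sketch of turning one CSV row into JSON-LD with nothing but the standard library. Several things here are my assumptions rather than anything IMLS or the vocabularies prescribe: the base URI `http://example.org/museum/` is a placeholder, typing each record as `org:Organization` is one possible modeling choice, and nesting the address fields under `v:adr` follows the 2006 vCard ontology. `longitude` goes to `geo:long` and `latitude` to `geo:lat`, as the WGS84 vocabulary defines them. The DUNS and EIN fields are left out until there is a good way to tell them apart.

```python
import json

# Namespace prefixes for the vocabularies named in the table.
CONTEXT = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "v": "http://www.w3.org/2006/vcard/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "org": "http://www.w3.org/ns/org#",
}

# Hypothetical base URI; a real release would need stable identifiers.
BASE_URI = "http://example.org/museum/"

# IMLS address columns and the vCard properties they map to.
ADDRESS_FIELDS = {
    "address": "v:street-address",
    "city": "v:locality",
    "state": "v:region",
    "zip": "v:postal-code",
}


def row_to_jsonld(row):
    """Build a JSON-LD document (as a dict) from one IMLS CSV row."""
    doc = {
        "@context": CONTEXT,
        "@id": BASE_URI + row["id"],
        "@type": "org:Organization",
        "skos:prefLabel": row["name"],
    }
    # Only emit address parts that are actually present in the row.
    address = {
        prop: row[field]
        for field, prop in ADDRESS_FIELDS.items()
        if row.get(field)
    }
    if address:
        address["@type"] = "v:Address"
        doc["v:adr"] = address
    if row.get("phone"):
        doc["v:tel"] = row["phone"]
    if row.get("latitude") and row.get("longitude"):
        doc["geo:lat"] = float(row["latitude"])
        doc["geo:long"] = float(row["longitude"])
    return doc


# Made-up sample row, not taken from the IMLS file.
sample = {
    "id": "42",
    "name": "Example Museum",
    "address": "123 Main St",
    "city": "Urbana",
    "state": "IL",
    "zip": "61801",
    "latitude": "40.11",
    "longitude": "-88.24",
}
print(json.dumps(row_to_jsonld(sample), indent=2))
```

Feeding this from `csv.DictReader` would give one JSON-LD document per museum, which is a convenient shape for publishing on GitHub.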
The IMLS data also includes the following fields, though I haven’t been able to identify any LOD properties for these yet. This is actually a bit surprising, since you’d think that U.S. Census data (or at least the properties of Census data) would be a solved problem by now. For the moment, the information above seems like enough of a start, so I’ll leave these aside.
| fipst | FIPS State code |
| fipsco | FIPS county code |
| centract | seven character Census tract number |
| cenblock | four character Census block number |
| fipsplc | five-digit place FIPS code |
| fipsmcd | five-digit MCD (Minor Civil Division) FIPS code |
| fipsmsa | four-digit MSA (Metropolitan Statistical Area) FIPS code |
| cbsa | five-digit CBSA code that identifies a CBSA area. |
| metrod | five-digit Metropolitan Division Code |
| microf | micropolitan flag (a value indicating a Metropolitan Area, or a “1” indicating a Micropolitan area) |
| mattype | geocoding match type |
Next up, I’ll discuss in more depth how museums have been modeled in the current Linked Data environment and suggest some possible models for the IMLS dataset.
Next: What is a Library/Archive/Museum According to Linked Data?

