05/17/11

On the ways (Part II)

"Galen L. Stone" Interior view, ribbing of tug under construction.  Delaware Public Archives

Tonight I decided to go back to my large table of >1,000 ships and continue doing some clean-up. However, instead of trying to edit the values I had, I gave the Google Refine/Freebase reconciliation service a try. Boy-howdy, I really should have taken @jonvoss’s advice and done this sooner. I pretty quickly whipped through my Ship Type column and matched it to the /boats/ship/ship_type vocabulary in Freebase. Like any classification task, I think some of the subtleties of my data get lost. For the moment I think that’s OK, but if you care about the difference between a sidewheel paddle steamer and a sternwheel paddle steamer, be aware that they’ve been lumped together under the same class.
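One way to keep those subtleties around is to store the original value next to the reconciled class instead of overwriting it. Here is a minimal sketch, assuming each row is a dict with a `ship_type` key; the mapping table and the class labels on the right are hypothetical and are not actual Freebase topic names.

```python
# Hypothetical mapping from local ship-type strings to a broader class.
# The labels on the right are illustrative, not actual Freebase topics.
TYPE_MAP = {
    "sidewheel paddle steamer": "Paddle steamer",
    "sternwheel paddle steamer": "Paddle steamer",
    "tug": "Tugboat",
}

def reconcile_type(row, type_map=TYPE_MAP):
    """Return a copy of the row with a broad class added and the original kept."""
    key = row["ship_type"].strip().lower()
    out = dict(row)
    out["ship_type_original"] = row["ship_type"]  # preserves the finer distinction
    out["ship_type_class"] = type_map.get(key)    # None means no match yet
    return out
```

With this shape, the two kinds of paddle steamer share a class but can still be told apart by the `ship_type_original` column.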

The reconciliation tool made quick work of matching the companies that make up the majority of the shipyards in my database, which are mostly the late 19th and 20th century yards. There’s not much of a record about individual vessels from the earlier yards. In the few cases where there wasn’t a match, I asked Freebase to create a new topic. (I’ll go back later and see how this populates Freebase itself.) I also did this for the Owners column, where it was able to match a smaller number of organizations and people. I don’t know whether I’m being a little cavalier about making new topics on Freebase, but this seems like the easiest thing to do. (I should probably be keeping better notes about what new things I’m creating – it would be nice to get some sort of report/e-mail with all those things listed.) The Owners part has been slow going due to a bug in Refine that takes you back to the first row after you reconcile a row that may be deep in your data. It helps to select the (none) facet, which removes rows for which judgements have already been assigned, and to use additional facets to narrow things down. While I’ve cut this list of owners down significantly, I’m still looking at a long tail of about 400 unmatched entities. (Many are individuals whose first names are abbreviated – with a little googling I can find many of them and expand the names.)
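To work through that long tail, it might help to pull out just the unmatched owners whose first names look abbreviated. This is a rough sketch that runs outside of Refine, assuming the owner names have been exported as a list of strings and the already matched ones as a set; both function names are made up for illustration.

```python
import re

# One or more initials run together, e.g. "J." or "J.W."
INITIALS = re.compile(r"^(?:[A-Z]\.)+$")

def has_abbreviated_first_name(name):
    """True when the first token of a multi-part name is only initials."""
    parts = name.split()
    return len(parts) >= 2 and INITIALS.match(parts[0]) is not None

def needs_lookup(owners, matched):
    """Unmatched owners with abbreviated first names, de-duplicated and sorted."""
    return sorted(
        {o for o in owners if o not in matched and has_abbreviated_first_name(o)}
    )
```

The resulting list is the set of names worth googling first; everything else in the unmatched pile is probably an organization or a name that is already spelled out.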

A resource that is proving useful for double-checking my work is Shipbuilding History, which includes lists of vessels from the Wilmington yards. Tim seems to have collected some information I’m missing, so I’m thinking about the best way to reconcile his information with mine. There are other lists of vessels that are currently not linked data, but are large tables on the web. Perhaps a screen scraper might make quick work of turning those into linked data graphs that can be merged with my graphs.
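As a rough idea of what such a screen scraper could look like, here is a minimal sketch that uses only the Python standard library. It assumes the page holds one simple table whose first row is a header; the `base` and `vocab` URIs passed in are placeholders, and the property names are derived from the column headings rather than taken from any real vocabulary.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the cell text of every <tr> on a page into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_to_ntriples(html, base, vocab):
    """Turn each body row into N-Triples lines with plain literal objects."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    lines = []
    for i, row in enumerate(body, start=1):
        subject = f"<{base}vessel/{i}>"  # placeholder URI pattern
        for name, value in zip(header, row):
            if value:
                prop = name.lower().replace(" ", "_")
                literal = value.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{subject} <{vocab}{prop}> "{literal}" .')
    return lines
```

Everything comes out as a literal here, so the output would still need a reconciliation pass before it could really be merged with my graphs.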

But I think I’ve hit my limit for tonight. (I’ve been grading all day, so more than 14 hrs of staring at a screen is probably enough – time to hit my bunk).

On the Ways (Part I)

05/11/11

Birdseye of Wilmington, DE

Just a fun aside, from the BigMaps blog. Bummer, I don’t see a way to embed the zoomable version here. Pointing to these kinds of things from my RDF is one of the longer-term goals of my exploration. Take a ride along the waterfront to see the various ships under construction. (I wonder if I can infer from my data which ones those might be?)

04/25/11

Of Ships and Men (Part 2)

Receipt, E. I. du Pont de Nemours and Company to William Woodcock, 1806-05-22

DuPont Collection. Hagley Museum and Library

I started out tonight with “How to Publish Linked Data on the Web” and learned that it has been superseded by a new book that promises updated information: Linked Data: Evolving the Web into a Global Data Space.

Since the last time, I’ve decided to publish data on my own website – at least until I’ve gotten the feel for all of this and how it will fit together across all of my data. Once that’s done I’ll consider contributing it to a resource like Freebase, since they seem to have a simple import feature. Previously I’d set up a subdomain on my sandbox server (http://wilmingtonships.richardjurban.net).

For the moment I’m going to keep things simple by just using a /resource subdirectory to store my static RDF.  As this project grows, I’ll see whether this works (it is a relatively small data set) or whether a more robust solution is needed.

Here are the first two linked data graphs for this project, representing two of the earliest shipwrights in Wilmington:  William Woodcock, and his son William Woodcock, Jr.

Off to a good start, but already a lot of questions. I’m using Freebase schemas and properties – working a little bit from examples of existing people. Naturally dbPedia representations are different (metadata standards are like toothbrushes after all), but presumably there is some RDFS somewhere that connects Freebase and dbPedia properties. Shelved until later.

While this was a quick way to whip up some examples,  I was struggling to grok the RDF for Henry Ford by just looking at it.  Silly wabbit,  triples are for computers.   Loading it into something that gives a more human-friendly presentation is really helpful.  For example, just using the W3C RDF Validation service made the RDF for Henry Ford more understandable.

I did mix in an RDFS Comment with a longer textual description based on some RDF I retrieved from dbPedia. These don’t seem to be in Freebase output, so I’m not sure what the general principles of mixing and matching like this will be. (Namespaces, sure, no problem – but is there an advantage to sticking with one schema/format?)

Of Ships and Men:  Part 1 | Part 2 |

03/28/11

Scanning the Horizon

Before I get started building my own data for this project it seems like it would be useful to see what linked data is already available and what kinds of properties are being assigned to each of the entities I’ve identified. To start I’ve only looked at dbPedia, although it also includes some links to Freebase. I would be interested to hear if there are other common ways to do some due diligence in the growing LOD cloud before creating new contributions.

People

I didn’t find any of the key players in Wikipedia or dbPedia,  although they may be mentioned in the articles for shipyards below. In addition to FOAF, dbPedia has additional properties for relationships between people and companies. (e.g. see Andrew Carnegie)

Companies

Most of the big yards are represented, but all of the smaller, earlier shipyards are absent.  I’ll probably start with these existing records, making sure the companies are equally represented and work on some of the other yards later.

Bethlehem, Dravo and ACF were very large corporations with many divisions and multiple shipyards.  These descriptions point to the larger entity, but not to the subdivision. Perhaps a little archival context needed here?

Vessels

This is just a small sample (for more, see Ships Built in Delaware), but a useful example of the properties represented in dbPedia. (Vessels are in the Ships class, but the properties don’t seem to be explicitly associated with that class. I’ll need to do some more digging into how the classes/properties are defined here.) Fortunately, my database has many of the same properties, which should make mapping it easy. I’m currently working with Google Refine to clean up what I have.

I am confused about how these graphs link to the graphs for the companies above. For example, the description of the U.S.S. Louisiana includes a link to Harlan & Hollingsworth in its Wikipedia Infobox, but the dbPedia entry just has a literal.
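To make the literal-versus-link difference concrete, here is a small sketch that represents triples as plain Python tuples and swaps a literal builder name for a URI reference when a lookup table knows it. All of the URIs and the `builder` property are example.org placeholders, not the identifiers dbPedia actually uses.

```python
# Hypothetical lookup from builder names found as literals to resource URIs.
BUILDER_URIS = {
    "Harlan & Hollingsworth": "http://example.org/resource/Harlan_and_Hollingsworth",
}

def link_builders(triples, builder_property, lookup):
    """Swap literal builder names for URI references when the lookup knows them."""
    linked = []
    for subject, prop, (kind, value) in triples:
        if prop == builder_property and kind == "literal" and value in lookup:
            linked.append((subject, prop, ("uri", lookup[value])))
        else:
            linked.append((subject, prop, (kind, value)))
    return linked

def to_ntriples(triples):
    """Serialize (subject, property, (kind, value)) tuples as N-Triples lines."""
    lines = []
    for subject, prop, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{prop}> {obj} .")
    return lines
```

Before linking, the object serializes as a quoted string that goes nowhere; afterwards it is an angle-bracketed URI that another graph can pick up and follow to the company description.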

Events

There are properties for both companies and vessels that represent events (like founding, launching, decommissioning, etc.). There are also Wikipedia categories such as Companies Founded in 1899, which seem a little redundant if you have formatted information (though “Ships Built in Delaware” is useful, it seems like there should be another way to do this).

Locations

While the records above indicate the companies were located in Wilmington, DE, none of them have a specific geolocation. I whipped up a quick Google Map based on my original fly-leaf illustration. This was a quick start; I’ll need to translate these into latitude & longitude to add to the descriptions. I don’t know whether it’s possible to use other shapes to indicate the full extent of some of the yards that stretched along the waterfront.
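Once I have coordinates, adding them to a description could be as simple as two extra triples. This sketch assumes the W3C Basic Geo (WGS84) vocabulary, which defines `lat` and `long` properties for points; the subject URI is whatever the yard's resource ends up being, and the function name is my own. It only handles a single point, so it doesn't answer the question about showing a yard's full extent.

```python
# Namespace of the W3C Basic Geo (WGS84 lat/long) vocabulary.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

def geo_triples(subject_uri, lat, lon):
    """Return N-Triples lines giving a resource a point location."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    return [
        f'<{subject_uri}> <{GEO}lat> "{lat:.6f}" .',
        f'<{subject_uri}> <{GEO}long> "{lon:.6f}" .',
    ]
```

The range check is there mostly to catch swapped latitude and longitude values when copying them off a map by hand.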

