05/9/11

On the ways


Tugboat "Neptune" under construction at Jackson and Sharp. Jackson and Sharp Collection. Courtesy Delaware Public Archives

This will be a short post this week, as I’ve decided to use my “study night” to dip my toes into LaTeX (exploring actual dissertation production workflows, weee!).

ways (n.): df. structure consisting of a sloping way down to the water from the place where ships are built or repaired

After creating records for the various agents involved in my data, there remains the data about the vessels themselves. Again sticking to a pragmatic exploration, I’ll be using the Freebase Ship schema for this data. Thankfully, many of the properties listed here are properties that I already have in my database. I’m still working with Google Refine to clean up my data, but here is how the properties will map.

| My Data | Freebase Property |
| --- | --- |
| ShipName | type.object.name |
| HullNo | e.g. DE 107 and/or a particular ID assigned by a shipyard (see the Hagley photos for examples at P&J). I don’t see a Freebase property for this field in my data. |
| ShipType | boats/ship_class/ship_type |
| HullType | (boats/hull_configuration?) There are no instances of this property in Freebase. |
| ShipPower | boats/ship/means_of_propulsion |
| LOA | boats/ship/length_overall |
| Beam | boats/ship/beam |
| Displace | boats/ship/displacement |
| Tonnage | (same as displacement?) hmm.. |
| Draft | boats/ship/draught |
| LaunchDate | boat/ship/launched |
| Fate | boat/boat_fate |
| Yard | boats/ship/ship_builder |
| Designer | boats/ship/designer |
| Owner | boats/ship/owners |
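
To make the mapping above something I could actually run against an export, here is a minimal sketch in Python. The column names and property paths are copied as-is from the list above; the function name map_record and the idea of marking unresolved columns with None are my own assumptions for illustration, not anything Freebase or Google Refine prescribes.

```python
# Column-to-property lookup copied from the mapping above.
# None marks a column with no confirmed Freebase property yet.
COLUMN_TO_FREEBASE = {
    "ShipName": "type.object.name",
    "HullNo": None,    # no Freebase property found for this field
    "ShipType": "boats/ship_class/ship_type",
    "HullType": None,  # boats/hull_configuration is only a guess
    "ShipPower": "boats/ship/means_of_propulsion",
    "LOA": "boats/ship/length_overall",
    "Beam": "boats/ship/beam",
    "Displace": "boats/ship/displacement",
    "Tonnage": None,   # unresolved: same as displacement?
    "Draft": "boats/ship/draught",
    "LaunchDate": "boat/ship/launched",
    "Fate": "boat/boat_fate",
    "Yard": "boats/ship/ship_builder",
    "Designer": "boats/ship/designer",
    "Owner": "boats/ship/owners",
}


def map_record(row):
    """Split one database row (hypothetical helper) into two dicts:
    values keyed by Freebase property, and values still unmapped."""
    mapped, unmapped = {}, {}
    for column, value in row.items():
        prop = COLUMN_TO_FREEBASE.get(column)
        if prop is None:
            # Unknown column, or a known column without a property yet.
            unmapped[column] = value
        else:
            mapped[prop] = value
    return mapped, unmapped
```

Calling map_record on a row returns the values keyed by property plus a second dict of leftovers, which keeps the columns I still need to sort out (HullNo, HullType, Tonnage) visible instead of silently dropping them.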

There seem to be some properties that belong to different but related Freebase schemas (e.g. boats/ship/displacement and boats/ship_class/displacement_tons) that I need to sort out. There are also other Freebase properties that don’t map directly to columns in my table (e.g. notableFor) but may be useful for adding some of the stories around vessels found in my book, in a comments field with general notes (and in the banker’s box of research notes that haven’t seen the light of day).

I see another chicken/egg problem looming. While I have most of the shipyards in my previously created corporate.rdf file, and individual shipwrights will follow in people.rdf, I also need to add the individuals and corporations who are owners/agents. In an earlier post @jonvoss pointed me towards a Google Refine plugin to rectify linked data. This seems like a good place to try deploying it at scale (hand-crafting a few records for the yards wasn’t too hard, but there are hundreds of owners, buyers, etc.). I remain a little skeptical that this will work well. From what I’ve seen so far of Freebase, big popular things are represented, but things in the long tail are not (there’s a research question in there somewhere).

I am also considering using an opaque URI for these vessels, perhaps based on an auto-generated ID number. The ship names in my database overlap quite a bit, which makes using names directly in URIs a little dangerous (the same can probably be said for people/corporate names at a global scale, though not so much for my limited set). It may also be possible to use a combination of fields to generate opaque URIs (for example, see Styles, Ayers & Shabir (2008), Semantic MARC, MARC21 and the Semantic Web, LDOW2008).
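
Both options can be sketched in a few lines of Python. This is only an illustration under assumptions: the base namespace http://example.org/vessel/ is a placeholder, the function names are hypothetical, and hashing a combination of name, yard, and launch date is my own guess at a workable field set, not necessarily the scheme described in the paper cited above.

```python
import hashlib

# Placeholder namespace (an assumption); swap in the real base URI.
BASE_URI = "http://example.org/vessel/"


def uri_from_id(record_id):
    """Opaque URI from an auto-generated integer ID, zero-padded to 6 digits."""
    return f"{BASE_URI}{record_id:06d}"


def uri_from_fields(name, yard, launch_date):
    """Opaque URI from a combination of fields (assumed field set).

    The fields are normalized and hashed, so two ships that share a
    name but differ in yard or launch date get different URIs, and
    the name itself never appears in the URI.
    """
    key = "|".join(part.strip().lower() for part in (name, yard, launch_date))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return BASE_URI + digest
```

The ID-based version is simpler but ties the URI to my database's row numbering; the field-based version can be regenerated from the data alone, at the cost of changing whenever one of the hashed fields is corrected.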