Sunday, January 8, 2012

"To-day we have naming of parts"

I like to explore the associations people share with me when I describe something new to them. So, I was pleased the other day when a colleague in the UK shared the phrase that stuck in her head when I described the idea of URI:s (Uniform Resource Identifiers) and Linked Data: "To-day we have naming of parts".

So, after some googling I found the poem by Henry Reed (1914-1986), "Naming of Parts." New Statesman and Nation 24, no. 598 (8 August 1942).

To-day we have naming of parts. Yesterday,
We had daily cleaning. And to-morrow morning,
We shall have what to do after firing. But to-day,
To-day we have naming of parts. Japonica
Glistens like coral in all of the neighboring gardens,
     And to-day we have naming of parts.

Hear Henry Reed and Frank Duncan read "Naming of parts" (mp3)


Through a very nice website, The Poetry of Henry Reed, I learned more about this World War II British poet, critic, translator, and radio dramatist. It helped me to better understand this wonderful, and sad, poem about the contrast between the world of weapons and the world of nature.

Naming parts and other things
I also learned about an article (DOI:10.1038/nbt0102-27) in Nature Biotechnology (2002) that uses the first line of Henry Reed's famous poem as its title. In the article, a professor of genomics at the University of Manchester describes the identification of previously non-annotated genes in yeast.

I also found a blog post from 2009 that uses a phrase from Henry Reed's poem in its title:
Naming of parts and other things. That is David Bawden's (@David_Bawden) post on his nice blog: "The Occasional Informationist, irregular thoughts on the information sciences". In this post he describes a meeting with John Wilbanks (@wilbanks) at the British Library:
In his presentation of the need for annotation of digital reporting of scientific findings, Wilbanks commented simply that we need to call the same thing by the same name; this makes possible the semantic linking of information and data, the creation of ontologies, and so on, without which it will not be possible to share information across disciplinary and sub-disciplinary silos. 
He exemplified this by examples both simple – the various names for coffee in different languages – and complex – the variant terminology used in hundreds of datasets relating to polar climate change, and in over a thousand related to genomics.
There was another aspect to this point. What we call an information object in the digital world – DOIs and all the rest – is also fundamental; if we do not call these digital objects the same thing, we will have great difficulty in finding them.

Names of today
So, let me conclude this post with a couple of examples of naming parts and other things using the names of today, that is, http-based URI:s. The three example URI:s are also examples of three large efforts to publish linked data "about the named things":

  1. British Library's URI for the poet Henry Reed
    http://bnb.data.bl.uk/id/person/ReedHenry1914-1986
  2. Wikipedia's, i.e. DBpedia's, URI for the poet Henry Reed
    http://dbpedia.org/resource/Henry_Reed_%28poet%29
  3. The DOI for the article about identifying genes in yeast turned into a URI by CrossRef
    http://dx.doi.org/10.1038/nbt0102-27

1. The British Library publishes metadata about bibliographic resources ("things") using Linked Data techniques and technologies. Part of that is assigning http-based URI:s to the creators. For a great introduction to the underlying model, see the blog post British Library Data Model: Overview by Tim Hodson (@timhodson).

So, for example the data model specifies that persons who are the identified creators of bibliographic resources, such as the poet Henry Reed (http://bnb.data.bl.uk/id/person/ReedHenry1914-1986), should be of the type Agent and Person according to the basic, and very often used vocabulary for linked data, called Friend of a Friend (FOAF).
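
To make the typing concrete, here is a minimal sketch in Python of what "Henry Reed is an Agent and a Person" could look like as two RDF statements, written out in the N-Triples syntax. The namespace URIs for rdf:type and FOAF are the standard ones; the helper names (type_triples, to_ntriples) are my own illustration and not part of the British Library's data model or tooling.

```python
# Standard namespace URIs. The helper functions below are illustrative
# assumptions, not part of any British Library software.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def type_triples(subject, classes):
    """Return one (subject, rdf:type, class) triple per class URI."""
    return [(subject, RDF_TYPE, cls) for cls in classes]


def to_ntriples(triples):
    """Serialize triples whose three parts are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


reed = "http://bnb.data.bl.uk/id/person/ReedHenry1914-1986"
triples = type_triples(reed, [FOAF + "Agent", FOAF + "Person"])
print(to_ntriples(triples))
```

Each printed line is one statement: the named thing, the property rdf:type, and the FOAF class it belongs to.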


2. A large part of the structured content published on Wikipedia pages is also made available as linked data through DBpedia. See this great article: How DBpedia Treats Wikipedia as a Database. The so-called resources ("things") that the Wikipedia pages describe are given http-based URI:s in DBpedia, and each resource is typed using a thin model called the DBpedia ontology.

So, here we can see that the poet Henry Reed is also identified in DBpedia (http://dbpedia.org/resource/Henry_Reed_%28poet%29) and described with the structured data from the Wikipedia page about him, such as his birth date and death date, and also the fact that he is categorized using the concept 'English poets'. This concept also has a URI: http://dbpedia.org/resource/Category:English_poets. So, we may have more than one URI for the same Henry Reed. These can be related to each other using the sameAs statement.


This is not yet done by the British Library, but I assume it will be done later, as for example the Swedish library catalogue relates its URI:s to DBpedia's.
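
As a rough sketch of what such a link could look like, the Python snippet below writes a single sameAs statement between the two URI:s for Henry Reed in N-Triples syntax. The property URI is the standard owl:sameAs; the function name same_as is a hypothetical helper of mine, and the statement itself is only an example, since (as noted) the British Library does not publish it.

```python
# Standard OWL property URI for "these two URIs name the same thing".
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as(uri_a, uri_b):
    """Return an N-Triples line stating that uri_a and uri_b name one thing.

    Hypothetical helper for illustration only.
    """
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


bl_reed = "http://bnb.data.bl.uk/id/person/ReedHenry1914-1986"
dbpedia_reed = "http://dbpedia.org/resource/Henry_Reed_%28poet%29"

statement = same_as(bl_reed, dbpedia_reed)
print(statement)
```

Anyone can publish a statement like this, not only the owners of the two URI:s, which is part of what makes the linking work.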

Here is another URI, http://dbpedia.org/resource/Category:Firearm_components, for a categorization concept. In the DBpedia interface you can see a list of such resources ("things") and links to them using URI:s such as http://dbpedia.org/resource/Sling_%28firearms%29.


3. CrossRef has made metadata for 46 million Digital Object Identifiers (DOIs) available as Linked Data. DOIs are used to uniquely identify electronic documents (largely scholarly journal articles). CrossRef is a consortium of roughly 3,000 publishers and is a big player in the academic publishing marketplace.

So, here is the identifier of the article about identifying genes in yeast http://dx.doi.org/10.1038/nbt0102-27.
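
Turning a DOI into such a URI is mostly a matter of putting the resolver address in front of it. Here is a small Python sketch of that step, assuming the http://dx.doi.org/ prefix used in this post; the function name doi_to_uri and the handling of a leading "DOI:" label are my own assumptions, and the percent-encoding is only a precaution for DOIs that contain unusual characters.

```python
from urllib.parse import quote

# Resolver prefix as used in this post.
RESOLVER = "http://dx.doi.org/"


def doi_to_uri(doi):
    """Build an http-based URI from a DOI string (hypothetical helper).

    Accepts the DOI with or without a leading "DOI:" label.
    """
    doi = doi.strip()
    if doi.upper().startswith("DOI:"):
        doi = doi[4:]
    # Keep "/" as-is; percent-encode anything not safe in a URL path.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_uri("DOI:10.1038/nbt0102-27"))
```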

Kudos to my colleague for the opportunity to learn more about this wonderful poem and for a great discussion.
To ReedingLessons, the signature behind the great website about Henry Reed.
To @David_Bawden for his nice blog, The Occasional Informationist.
And, finally, to @wilbanks, a great source of inspiration.

6 comments:

John S Erickson PhD said...

Great post! Please note that there is a (subtle) error w.r.t. DBPedia URIs.

Technically, the DBPedia URI for a thing is of the form http://dbpedia.org/resource/Sling_%28firearms%29 whereas the human-readable manifestation of that URI is http://dbpedia.org/resource/Sling_%28firearms%29

If you attempt to browse to the first, you will be redirected to the second. If you however use conneg (e.g. using curl) to request RDF for the first, you'll get it:


curl -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Sling_%28firearms%29 -L

John S Erickson PhD said...

Ooops, I meant, "...whereas the human-readable manifestation of that URI is http://dbpedia.org/page/Sling_%28firearms%29

Sorry :P

Unknown said...

Ah, of course. Copy/Paste error. Thank you John!

Tim Hodson said...

As I have said before, the use of URIs to name things is pretty fundamental to the Web of Data, or Linked Data, or even the Semantic Web (whatever that really is!).

The key thing is that I can say things about your URIs and you can say things about my URIs.

There is a debate around whether we create our own URIs for things that you have also got URIs for and use owl:sameAs to link them, or whether we should just use whichever URI was created first to talk about the same thing.

In terms of making the web of data mix together easily, the second approach of URI reuse is the winner. However, linksets that consist of statements like <a> ex:hasSomeRelationTo <b> can also be useful to glue two (or more) datasets together. Linksets will have an additional querying overhead as you need to build queries that use those relationships.

Anonymous said...

I think we need to live with URI proliferation. Some people will use existing URIs (most likely the ones from trustworthy sources) and others will mint their own. There are arguments for both approaches, and both approaches are widely applied already. Some people will make the sameAs assertions, but in general there will be a need for services and tools to find all the URIs given to a thing. Nothing new: every one of us probably already has multiple identifiers in the real world, and it will not be different in the virtual world.

Unknown said...

Agree with Makx Dekkers' (anonym / anonymous) recent comment. Methinks: Anticipate more than one URI for the same 'thing'. At the same time, hope for as few as possible over time.