2010-07-01

What 'mean' means

I've been working for a couple of months now with Gerard de Melo at Lexvo.org. The first objective was to set an example of good Linked Data practice, both social and technical. If you have published a set of URIs and find out afterwards that another set for the same resources is of better quality, and moreover you have neither the bandwidth nor the resources to maintain your dataset, what should you do? In the case at hand, the answer was to redirect the work I've been doing at lingvoj.org towards the data at Lexvo.org, which are far more complete and moreover integrated in a general approach which I find extremely interesting.
The neat result of this work so far is that URIs for languages at lingvoj.org now redirect seamlessly to the matching lexvo.org URIs; see e.g. http://www.lingvoj.org/lang/fr.

En passant I had fruitful exchanges with Gerard and made a few small contributions: linking Lexvo.org resources to a couple of published vocabularies such as LCSH and RAMEAU, plus other miscellaneous suggestions, all acknowledged on the freshly updated Lexvo.org home page.

This new update, and the announcement Gerard will certainly push to the Semantic Web community in the next hours or days, come just in time. The semiotic approach of Lexvo.org to lexical resources is a nice workaround to the RDF issue of 'literals as subjects', a topic which is once again setting the Semantic Web mailing list on fire. The Lexvo.org FAQ explains very neatly why and how to coin URIs for terms in a specific language. So while the use of the RDF literal 'mean'@en as subject in RDF triples does seem problematic, the URI http://lexvo.org/id/term/eng/mean identifies this literal (a sign) unambiguously, and allows it to be used as either subject or object of a triple in any current, and hopefully any future, form of RDF, without raising any technical or philosophical question.
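To make the point concrete, here is a minimal sketch in Python that writes two triples in N-Triples syntax, with the term URI once as subject and once as object. Only the URI http://lexvo.org/id/term/eng/mean comes from Lexvo.org. The document URI and the `mentions` predicate are hypothetical, and the use of rdfs:label is an assumption made for illustration, not necessarily what Lexvo.org itself serves.

```python
# The term URI is taken from the post; everything under example.org is hypothetical.
TERM = "http://lexvo.org/id/term/eng/mean"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX_MENTIONS = "http://example.org/vocab#mentions"  # hypothetical predicate
EX_DOC = "http://example.org/doc/1"                # hypothetical document


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def triple(subject, predicate, obj):
    """Serialize one triple as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


# The sign 'mean'@en stands in subject position through its URI...
as_subject = triple(uri(TERM), uri(RDFS_LABEL), literal("mean", "en"))
# ...and the same URI works just as well in object position.
as_object = triple(uri(EX_DOC), uri(EX_MENTIONS), uri(TERM))

print(as_subject)
print(as_object)
```

The literal itself only ever appears as an object here, which is all that current RDF allows; the URI does the work of the 'literal as subject'.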

I would like to stress a couple of very nice features allowed by the semiotic approach of Lexvo.org.

First, you don't need to know whether a term has already been described in the Lexvo.org database to coin a URI for it. Try http://lexvo.org/id/term/eng/twidget (or for that matter any term you pull out of your hat). The URI will serve you at least the semantics you have implicitly embedded in its structure: this URI represents the term in the English language whose literal form is 'twidget'. If there are no other assertions, it's because the Lexvo.org database is not aware of any other meaning of this term, nor of any translation in another language.
This is cleverer than it might seem at first sight. It means you can blindly identify any term you use in your own data by a lexvo.org URI. Maybe the service provides extra information on the term, maybe not. Or maybe not today, but tomorrow, if you ping lexvo.org saying "hey, please add descriptions of those URIs to your database".
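Coining such a URI blindly can be sketched in a few lines of Python. The function name `lexvo_term_uri` is hypothetical, the three-letter language code check is inferred only from the examples in this post (eng, fra), and percent-encoding the term with `urllib.parse.quote` is my assumption; the Lexvo.org FAQ remains the reference for the exact encoding rules.

```python
from urllib.parse import quote

# Base pattern as seen in the URIs quoted in this post.
LEXVO_TERM_BASE = "http://lexvo.org/id/term"


def lexvo_term_uri(lang_code, term):
    """Coin a term URI from a language code and a literal form.

    No lookup is performed: the URI is built from its parts alone.
    """
    # Assumption: language codes are three ASCII letters, as in 'eng' and 'fra'.
    if len(lang_code) != 3 or not (lang_code.isascii() and lang_code.isalpha()):
        raise ValueError(f"expected a three-letter language code, got {lang_code!r}")
    if not term:
        raise ValueError("term must not be empty")
    # Assumption: the term is percent-encoded as UTF-8, slashes included.
    return f"{LEXVO_TERM_BASE}/{lang_code.lower()}/{quote(term, safe='')}"


print(lexvo_term_uri("eng", "twidget"))  # http://lexvo.org/id/term/eng/twidget
```

Nothing in this sketch talks to the service, which is exactly the point: the URI can be minted first and described later.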

Second, the ambiguity of homographs is exposed but not resolved within a language. http://lexvo.org/id/term/eng/mean provides the various meanings of the term in English (both verb and adjective). Cross-lingual homographs, however, are distinct resources, such as http://lexvo.org/id/term/eng/coin and http://lexvo.org/id/term/fra/coin.
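Reading the structure back out of the two 'coin' URIs shows the distinction. This is a rough sketch: `parse_term_uri` is a hypothetical helper of mine, and it assumes the path layout visible in the URIs above (/id/term/ followed by language code and form).

```python
from urllib.parse import unquote, urlparse

TERM_PATH_PREFIX = "/id/term/"


def parse_term_uri(term_uri):
    """Split a lexvo.org term URI into (language code, literal form)."""
    parsed = urlparse(term_uri)
    if parsed.netloc != "lexvo.org" or not parsed.path.startswith(TERM_PATH_PREFIX):
        raise ValueError(f"not a lexvo.org term URI: {term_uri!r}")
    # What follows the prefix is '<language code>/<encoded form>'.
    lang, sep, form = parsed.path[len(TERM_PATH_PREFIX):].partition("/")
    if not sep or not lang or not form:
        raise ValueError(f"missing language code or form: {term_uri!r}")
    return lang, unquote(form)


english = parse_term_uri("http://lexvo.org/id/term/eng/coin")
french = parse_term_uri("http://lexvo.org/id/term/fra/coin")

# Same literal form, different language, hence two distinct resources.
print(english[1] == french[1])  # True
print(english == french)        # False
```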

In a nutshell, Lexvo.org is an outstanding data set and service which deserves better visibility and widespread use in the Linked Data Cloud, providing a lexical and semiotic glue with potentially enormous added value. A lot can be built on top of this, whether or not literals as subjects eventually win their first-class RDF citizenship.