2012-03-27

Beyond httpRange-14 addiction

After years of armed peace we get another round of this old debate. Everybody says enough is enough and that they don't want to hear about it any more, but nobody can resist the temptation to jump in once again. Trying to follow the various running threads about it, with all the cross-posting and forking, is a pain. Nevertheless I added my pinch of salt yesterday, without much feedback so far, except from Mike Bergman, who in a personal answer suggests I should turn this into (yet another) formal change proposal. The suggestion is much appreciated, coming from a man with one of the deepest understandings of the issue, who has been thinking for years along lines similar to the ones developed here. That said, I am not sure I want to add to the noise, and not sure such a proposal would gain much traction. Moreover the deadline of the call for proposals is now pretty close (only two days). And actually what I have in mind currently would not amend the current "httpRange-14 resolution", but would propose a radical move to escape the net of non-issues in which it is entangled.
Anyway, I will try to formalize here a bit of this non-proposal. In a nutshell it gets rid of the information resource vs non-information resource distinction, and decouples the URI referent (what the URI-as-name represents, denotes, means ... take your pick) from the type of answer to an HTTP GET request on this URI. Basically it follows in the tracks of Pat Hayes' famous paper "In Defence of Ambiguity", which I'd already pushed here back in 2006.
The rationale of the proposal is that a URI does not name anything if its owner does not explicitly provide some indication of what this thing is. In other words, a URI is a name only insofar as its owner has explicitly declared what the referent of this name is.

We could stop here, declare victory and retire, but let's try to set out more explicit definitions. Hereafter URI means HTTP URI in the very broad sense of the httpRange-14 context, which means it can certainly be extended to HTTPS and IRI.
  • A resource description is any information in any convenient language or format making as explicit as possible the referent of a URI.
  • A resource definition is a resource description provided by the URI owner in answer to an HTTP GET request on this URI.
Thanks to the Web architecture, the answer to the HTTP GET request depends on server configuration and client-server content negotiation. This enables the URI owner to provide several resource definitions for the same URI using various formats, e.g., natural language description for human consumers, specific HTML markup, formal RDF declarations, embedded metadata in multimedia files etc. The above definition does not put any limit either on the formats and languages in which resource definitions can be expressed, or on the mechanisms (redirection, content negotiation) used to serve those definitions to the client. It is in the nature of Web technologies that both formats and mechanisms are bound to change over time, without changing the above general notion of resource definition.
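
To make this a bit more concrete, here is a minimal sketch in Python of how an owner's server might pick one of several resource definitions for a single URI from the client's Accept header. Everything in it is hypothetical: the DEFINITIONS table, the example.org URIs and the function names are made up for illustration, and the matching is deliberately simplified (exact media types and `*/*` only, no `type/*` ranges, no specificity rules).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical resource definitions served by the owner of ONE URI,
# keyed by media type. The contents are placeholders, not real data.
DEFINITIONS: Dict[str, str] = {
    "text/html": "<p>This URI names the city of Paris, France.</p>",
    "text/turtle": "<http://example.org/id/paris> a <http://example.org/ns#City> .",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def select_definition(
    accept: str, definitions: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (media_type, body) of the definition to serve, or None."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in definitions:
            return media_type, definitions[media_type]
        if media_type == "*/*" and definitions:
            # Assumption: the first registered format is the owner's default
            first = next(iter(definitions))
            return first, definitions[first]
    return None
```

A browser sending `text/html` and an RDF client sending `text/turtle` get different documents from the same URI, but both documents are meant to make the same referent explicit.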

In order for things to run smoothly on the Web, these definitions come with the following recommendations:

  • The URI owner should ensure as far as possible that the various resource definitions served for a given URI at any given point in time are consistent with each other, and that the referent which is made explicit in those definitions does not change over time, even if the format or content of the definition changes, for example to improve the accuracy of the definition or to get rid of discovered ambiguities.
  • Semantic applications using formal languages such as RDF should rely on formal resource definitions in this language, if available, as the primary source of the URI semantics. Failing any such resource definition, they can rely first on other formal resource descriptions provided by the URI owner. In the absence of any resource definition or description provided by the URI owner, the URI semantics is left to everyone's guess. Third parties can of course provide resource descriptions for URIs they do not own, but the reliability of such descriptions may vary.
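
The order of precedence suggested for semantic applications can be sketched as follows. This is only one possible reading, in Python, with made-up names (Source, Description, pick_semantics). In particular, the `trust_third_parties` switch is my own assumption: the recommendation only says that the reliability of third-party descriptions varies, not whether or when to use them.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class Source(IntEnum):
    """Where a description comes from; lower value means higher precedence."""
    DEFINITION = 0         # served by the owner in answer to GET on the URI
    OWNER_DESCRIPTION = 1  # provided by the owner by some other means
    THIRD_PARTY = 2        # provided by someone who does not own the URI


@dataclass(frozen=True)
class Description:
    source: Source
    formal: bool   # True if expressed in a formal language such as RDF
    content: str


def pick_semantics(
    candidates: Iterable[Description], trust_third_parties: bool = False
) -> Optional[Description]:
    """Pick the description a semantic application should rely on.

    Returns None when nothing usable is available, i.e. when the
    semantics of the URI is left to everyone's guess.
    """
    # A semantic application only considers formal descriptions here
    usable = [d for d in candidates if d.formal]
    if not trust_third_parties:
        usable = [d for d in usable if d.source != Source.THIRD_PARTY]
    # Formal definition first, then other formal owner descriptions
    return min(usable, key=lambda d: d.source, default=None)
```

With a formal definition, an owner-provided RDF description published elsewhere, and a third-party RDF description all available, the function returns the definition; remove the definition and it falls back to the owner's other description.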

An important side effect of the above definitions is that a URI for which neither definition nor description is available does not name anything. Such a URI can work well as far as Web access is concerned: useful information can be retrieved from it through HTTP, maybe including descriptions of other resources, but nothing in this information tells us what the URI itself represents. And this is perfectly OK for over 99.9% of Web URIs, and it does not break anything on the Web. But it does provide a clear borderline showing where the Semantic Web starts. The Semantic Web is made of those URIs for which at least a resource description, and preferably a resource definition, is available.

Unfortunately this side effect may be too radical for the Web community at large, and for the TAG in particular, to even consider such a proposal. So I leave it here as food for thought and for the record, waiting for another round of the story. Maybe next time things will be ripe enough to put such a proposal on track.

[2012-04-03]: Further recommended reading by Mike Bergman: Tortured Terminology and Problematic Prescriptions.