2014-04-08

Query + Entity = Quentity

Neologisms are cool, particularly those of the portmanteau kind. Taking two old words and biding them together into a new hybrid semantic species is indeed as exciting, tricky and risky as tree grafting. And it takes some years, either for words or for trees, to figure out success or failure. Will you eventually harvest any fruit, will the hybrid survive at all? Nine years ago I introduced hubjects, and four years before that it was the semantopic map. Neither of those have grown in the expected direction or yielded the expected fruits, although they are both alive and well. Those poor results will not prevent me to try a new grafting experience with quentities, and let's meet here after 2020 under this new tree, and enjoy the fruits, if any.
So, what is this new semantic graft all about? I've ranted here last year about Google not exposing public linked data URIs for its Knowledge Graph entities, and defining linked entities jus as yet other queries. A similar criticism applies to the Bing version of the Knowledge Graph I just invited yesterday to play in this blog. But thinking twice about it, I wonder now if queries are not the right way, and maybe the best way, to consider entities in the Web of data. After all, many (most) URIs in the linked data cloud actually resolve to a query behind the scene, even if they look like plain vanilla URIs. URIs at DBpedia, Freebase, VIAF, WorldCat, OBO, Geonames (just to name a few) are deferenced through some query on some data base, which might be or not a SPARQL query on a triple store. 

Let's take this query which you can pass to the DBpedia SPARQL endpoint.
SELECT DISTINCT ?mag ?quake
WHERE
{
?quake  dbpprop:magnitude  ?mag
FILTER (contains(str(?quake), "earthquake"))
FILTER (contains(str(?quake), "/20"))
FILTER (?mag > 7)
FILTER (?mag < 10)
}
ORDER BY DESC(?mag)
I've tweaked the filters in order to cope with the quite messy state of earthquakes data in DBpedia : no single class nor category for earthquakes, no consolidation of datatype in the values of magnitude (hence the max value filter), date absent or in weird formats, but fortunately quite standard URI fragment syntax (every 21st century earthquake has a URI starting with http://dbpedia.org/resource/20 and containing "earthquake"). Default explicit semantic filters, use syntactic ones ... if you know the implicit semantics of the syntax, of course.
Granted, this query is as ugly as the data themselves are, but the result is not that bad and one could proudly call this "List of major earthquakes in the 21st century, sorted by decreasing magnitude".

Now I've encapsulated the query on DBpedia endpoint into a tiny URI. Does http://bit.ly/1lNkb0R qualify as a URI for an entity in the common meaning of "named entity"? One can argue forever to know if that "List of major earthquakes in the 21st century" is or is not an entity, but in my opinion it is one, no more no less than every individual earthquake in that list (the ontological status of an individual earthquake is a tricky one, too, if you think about it). 
One can argue also that this entity is a shaky one, because the result of this query is bound to change. The current list in DBpedia might be inaccurate or incomplete, some instances might escape the filter for all sort of obvious reasons, and obviously new major earthquakes are bound to happen in this century. Moreover, a stable meaning for this URI depends on the availability and stability of the bit.ly redirection service, on the availability of the DBpedia SPARQL endpoint, on the stability of Wikipedia URI policy and DBpedia ontology. Given all those particularities, let's assume we have a new kind of entity, defined by a query, that I propose to call for this very reason a quentity (shortcut for query entity), and an associated URI which I would gladly call a qURI (shortcut for query URI).

This qURI of course makes sense only as long as the technical context in which you can perform it is available. But is it different for any other kind of URI? To figure what a URI means in the Web of data, you have to use a HTTP GET, which is nothing more than a query over the global data base which is the Web, and what you GET generally depends on the availability and stability of as many pieces of hardware, software and data as in the above example. 
Indeed any URI can be seen, no more no less than the above bit.ly one, as an encapsulated query, making sense only when it's launched across the Web. And is not the elusive entity represented (identified) by this URI better seen as the query itself rather than as the query result? The query is permanent, whereas the query result is dependent on the everchanging client-server conversation context. 

So, if you want some kind of permanence in what the URI defines or identifies or represents (pick your choice), look at the query itself, not at the result. If you abide by this strange but inescapable conclusion, every entity in the Web is a quentity, and its URI is a qURI.

Follow-up discussion on Google+

Added 2014-04-09 : In the G+ discussion is introduced another and certainly better example to make my point : http://bit.ly/R2e3VV, a SPARQL CONSTRUCT yielding the same list in n3, making clear that the RDF description one GET from this URI does not, and cannot, include any triple describing the URI itself.