2015-02-26

Statements are only statements

A few days ago, in the comments of this post by +Teodora Petkova on Google+, I promised +Aaron Bradley a post explaining why I am uneasy with the reference to things in Tim Berners-Lee's 2006 reference document defining Linked Data. The challenge was to make it readable by seven-year-old kids or marketers, but I'm not sure the following meets this requirement.

When Google launched its Knowledge Graph (in 2012) with the tagline things, not strings, it was not much more than the principles of Linked Data as set out in the above-mentioned document six years before, but implemented as a Google enclosure of mostly public source data, with neither an API nor even public reusable URIs. I ranted here about that, and nothing seems to have changed since in that respect.
But something important I missed at the time is a subtle drift between TBL's prose and Google's. The former speaks about things and information about those things. The latter also starts with the term information, but rapidly switches to objects and facts.
[The Knowledge Graph] currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects.
The document uses "thing", "entity" and "object" in various places as apparent broad synonyms, conveying (maybe unwittingly) the (very naive) notion that the Knowledge Graph is a neat projection into data of well-defined "real-world" things-entities-objects and proven (true) facts about them. This impression is reinforced by expressions such as "Find the right thing". And actually, that's how most people are ready to buy it: "Don't be evil" implies "Don't lie, just facts". In a nutshell, if you want to know (true, proven, quality-checked) facts about things, just ask Google. It used to be "just ask Wikipedia", but since the Knowledge Graph taps into Wikipedia, it inherits the trust placed in its source. Similarly naive presentations can be found here and there, uttered by enthusiastic Linked Data supporters. Granted, TBL's discourse avoids reference to "facts", but it does not close the door, and through this opening a pervasive neo-Platonic view of the world has rushed in. There are things and facts out there, just represent them on the Web using URIs and RDF, et voilà. The DBpedia Knowledge Base description contains typical sentences of this kind, blurring the ontological status of what is described.
All these [DBpedia] versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia.
It's left to everyone's guess to figure out what "existence in the English version" can mean for a thing. What should such documents say instead of "things" and "facts" to avoid such confusion? Simply what they are: databases of statements using names (URIs) and sentences (RDF triples) which just copy, translate, adapt, in one word re-present on the Web statements already present in documents and data, in a variety of more or less natural, structured, formal, shared, idiomatic languages. As often stressed here (for five years at least), this representation is just another translation.
And, as for any kind of statement in any language, to figure out whether you can trust them or not, you should be able to track their provenance, and the context and time of their utterance. That's for example how Wikidata is intended to work. Look at the image below: nothing like a real-world thing or fact is mentioned, only a statement with its claim and context.
The relationship of names and statements with any real-world referents is a deep question opened by philosophers ages ago, and one which should certainly remain open. In any case the Web, Linked Data and the Knowledge Graph do not, will not, and should not claim to close it, whether insidiously or with no evil in mind. Those technologies just provide incredibly efficient ways to exchange, link, access and share statements, based on Web architecture and a minimalist standard grammar. Which is indeed a great achievement, no less, but no more. At the end of the day, data are only data, statements are only statements.
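To make the point concrete, here is a minimal sketch in Python of what "a statement with its claim and context" could look like as data. This is an illustration only, not the actual data model of Wikidata, DBpedia or the Knowledge Graph: the class name Statement, its fields, the helper traceable and all example.org URIs are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """A sentence made of three names, plus the context of its utterance."""
    subject: str                      # a name (URI), not a "thing"
    predicate: str                    # a name (URI) for the relation
    obj: str                          # a name (URI) or a literal value
    source: Optional[str] = None      # hypothetical field: where it was stated
    stated_on: Optional[date] = None  # hypothetical field: when it was stated

    def has_provenance(self) -> bool:
        # A statement can be tracked only if we know who said it and when.
        return self.source is not None and self.stated_on is not None


def traceable(statements: List[Statement]) -> List[Statement]:
    """Keep only the statements whose provenance can be tracked."""
    return [s for s in statements if s.has_provenance()]


# Illustrative statements using made-up example.org names.
statements = [
    Statement(
        "http://example.org/id/Roma",
        "http://example.org/prop/capitalOf",
        "http://example.org/id/Italy",
        source="http://example.org/doc/atlas",
        stated_on=date(2015, 2, 26),
    ),
    # Same kind of sentence, but nobody knows where it comes from.
    Statement(
        "http://example.org/id/Roma",
        "http://example.org/prop/locatedIn",
        "http://example.org/id/Europe",
    ),
]

for s in traceable(statements):
    print(s.subject, s.predicate, s.obj, "according to", s.source)
```

Nothing in this structure says whether a statement is true. It only records what was said, by which source and when, which is all a reader needs to start deciding whether to trust it.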

2015-02-23

Common names, proper usage

What follows might be, like previous posts, relevant to the raging debate in and around the W3C Shapes Working Group. If you don't care too much about Latin, Greek, French, German, etymology, translation and languages at large, you can go straight to the last paragraph. But I trust my faithful readers (whoever they are) to follow me through the long preliminary linguistic meanders.

A while ago I pointed at the enclosure of common names as trademarks. Maybe I should have written common nouns. But in French (my native language), there is a single word, nom, to translate both noun and name, all being cognates of Latin nomen, Greek ὄνομα, and many more avatars of the same Indo-European root. In French grammar you will say "nom commun" for "common noun" and "nom propre" for "proper noun", and a French native speaker is likely to translate these into English as "common name" and "proper name", both ambiguous out of context. And my purpose today is indeed to look at what it can mean for names to be common or proper beyond what it means for grammatical nouns.
Let's look into Latin again, where communis and proprius, as well as their ancient Greek equivalents κοινός and ἴδιος, have roughly the semantic scope they have kept in French and English. Together they split the world into what belongs to the commons and what is proprietary or private. Beyond and before their use in grammar to denote universals and particulars, further meanings have built upon good or bad characteristics associated with each term. Typically, "common" will be used as a derogatory qualifier for whatever belongs to the vulgum pecus, those common people who do not behave, think or speak properly. The French "propre" goes even further down this value-laden path to mean "clean", with disambiguation by position ("c'est ma propre maison" = "it's my own house" vs "sa maison est propre" = "her house is clean"). Such extensions seem indeed characteristic of a language controlled by some aristocracy. It's worth noticing that the English "own" and its German cognate "eigen" do not seem to have undergone similar semantic drifts.
Sticking to the original meaning and forgetting the interpretations of either grammar or aristocracy, common names would be simply names belonging to the commons. Which is true, if you think about it, for just any name. A name with no community (or communality) would be useless, and actually barely a name, just a string with no shared usage and agreed-upon denotation. Under such a definition, even proper nouns are common names. From a grammatical viewpoint, "Roma" is a proper noun, but it's common to all people using it to denote the capital of Italy. To make it short, all names belong to the commons, otherwise they don't name anything at all.
The above analysis applies not only to natural-language names (aka nouns), but also to all those technical names handled in the internal languages of our information systems, the names used by machines to call each other in the dark (see previous post) and take actions. URIs, addresses, object and class names ... if those were not common names, we would have no open Web, and no open source code with reusable libraries.
But those common names, when used and interpreted by software, behave internally at run time as proper names, in every sense of "proper". They each call a well-defined individual object, method or whatever piece of executable code. A URI sent through the HTTP protocol eventually calls, by their internal names, specific pieces of data on one or more servers, all of them running their own, proper, often proprietary code with its idiosyncratic functional semantics.
In other words, if the declarative semantics of a technical name (the description of what it denotes) belongs to the commons, its performative semantics (what it does when called) is proper to the system in which it is used, and to conditions at run time.
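A toy illustration of this distinction, as a Python sketch under stated assumptions: the name, the shared description, the System class and both systems are invented for the example, and nothing here is taken from any W3C specification.

```python
from typing import Callable, Dict

# A common name: every system agrees on what it denotes.
NAME = "http://example.org/id/Roma"

# Declarative semantics, shared by all: a description of what the name denotes.
SHARED_DESCRIPTION: Dict[str, str] = {NAME: "the capital of Italy"}


class System:
    """A hypothetical system with its own (proper) behaviour for common names."""

    def __init__(self, label: str, handlers: Dict[str, Callable[[str], str]]):
        self.label = label
        self._handlers = handlers  # performative semantics, private to the system

    def describe(self, name: str) -> str:
        # Every system answers from the same shared description.
        return SHARED_DESCRIPTION[name]

    def call(self, name: str) -> str:
        # What actually happens when the name is called depends on the system.
        return self._handlers[name](name)


catalogue = System("catalogue", {NAME: lambda n: f"library record for {n}"})
map_service = System("map", {NAME: lambda n: f"map tile centred on {n}"})

for system in (catalogue, map_service):
    print(system.label, "|", system.describe(NAME), "|", system.call(NAME))
```

Both systems give the same answer to "what does this name denote?", yet calling the name does something different in each of them.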

How is that relevant to the W3C Shapes debate? What this group is (maybe) seeking (or should seek) is actually a (standard) way to describe proper performative semantics for systems using RDF data. On the DC-Architecture list, +Holger Knublauch complained a few days ago:
Yet, there used to be a notion of a Semantic Web, in which people were able to publish ontologies together with shared semantics. On this list and also the WG it seems that this has come out of fashion, and everyone seems "obsessed" with the ability to violate the published semantics.
Violate the published semantics? Well, no, it's just about describing how the common semantics behave properly in my system. But whether that can be achieved through yet another declarative language or some interpretation of existing ones without blurring the RDF landscape a bit more, is another story. 

2015-02-17

You need names on the Web, it's dark in there.

The Chinese character 名 (name), which we have seen in the previous post as the mother of all things, has an interesting origin. It's composed of the characters 夕 (night, symbolized by a crescent moon) and 口 (an open mouth). The clue to such a mysterious association is that you need a name either to call someone, or to identify yourself, in the dark of night. In daylight, you don't really need to know the name of your interlocutor to recognize each other and engage in conversation. You don't need names of things to find and handle them.

Interaction through information systems, and singularly on the Web, is a conversation in the darkest of nights. You can't see your interlocutors, you can't wave or bow at them, you can't see what you are looking for either, and the system does not see you. So you need names everywhere. You need names to enter the system, to log in, to send messages. You need to know names to connect to people on the social web. You need to know a name for what you are looking for in order to ask a search engine. One can argue that all of this is rapidly changing, with identification using your fingerprint or eyeprint, and connecting to stuff or people using icons and various fancy non-textual interfaces. But under the hood, the system will still exchange ids, keys, addresses, all those avatars of names used by machines. Even if our online experience gets closer and closer to daylight conversation, poor machines will keep shouting names to each other across the dark of the Web for a long time.

2015-02-07

名可名,非常名

My conversation with good old 老子 is a never-ending story, and I had to revisit him with the untranslatables paradigm in mind. I discovered long ago the extreme difficulty of translating Chinese characters, singularly in ancient writings, through the excellent introduction I already mentioned here some years ago, "L'Idiot chinois" by Kyril Ryjik. This book sold out long ago, and my copy was lost in a former life. Fortunately, a few years ago I stumbled on a PDF copy on some obscure blog, which I have been carefully keeping safe ... but I can now forget about all that. After thirty years of dark ages, L'Idiot Chinois is now republished, and this new edition should land on my bookshelves anytime soon ...
The infamous and cryptic first chapter of the 道德經 would certainly be easily shortlisted in any contest for the best untranslatables ever. It is an example Ryjik presents because it is both too well known and too often translated, and certainly deeply misunderstood by most western translators.
Here goes the first part, which even if you don't read Chinese will strike you by the rhythm and sheer graphical refinement of its 24 characters. Note that the character 名 (míng, "name") is repeated five times, a hint at this story being about names and naming, mainly. 

道可道,非常道
名可名,非常名
無名天地之始
有名萬物之母

Ryjik holds that all but a few western translations and interpretations project a transcendental interpretation of 道 which does not make sense in the historical/political/cultural context where this text was produced. This is still the case for many available translations, for which the Dao has too much the look and feel of our western monotheistic God. If nothing else, the initial caps everywhere are suspicious: there is no upper case in Chinese. 道 should certainly be taken with a more mundane meaning: the way the world is going, which human beings should try to follow, individually and collectively, in order to live in harmony with the general flow. Only physics, no metaphysics.
With this in mind, Ryjik posits that the negative 非 in the first sentence should certainly be read as a determinant of 常 (constant, unchanging, regular, in one word steady), rather than of the whole group 常道.
In other words, where most translators read 非(常道) not (steady way), one should rather read (非常)道 (not steady) way. Which makes the whole sentence read something like (a) way really way is not a steady way. In other words: if you want to conform your way to the way (of the world at large) you have to adapt and change (as the world does). In the historical context, Ryjik holds that this is a moral and political recommendation not to stick to a rigid application of ancient rules even though the situation is ever-changing. But this is a general consideration, just put there to introduce the main point of the story: the role of names.
Reading 名可名,非常名 in the same spirit yields name really name is not a steady name. Since things are ever-changing as the world flows, the names you give to things are also bound to change to keep their accuracy. And in this spirit I just changed the title of this blog ...
As for the following two sentences, which seem more mysterious, I've not been fully convinced by any translation so far, even the one by Ryjik. I'm pushed towards proposing my own translation by a beautiful edition entitled "La Danse de l'Encre", illustrated by Lassaâd Metoui, a Tunisian calligrapher. Thomas Golsenne writes in the introduction (in French, my translation):
"To read the Tao Te King against the grain, out of context is not only a right granted to the reader, it's a sort of duty  ... Understanding or translating [it] "faithfully" does not make any sense, because there is nothing to be faithful to, nothing but emptiness"
So be it, here goes my own unfaithful version of the two following sentences

無名天地之始  : there is no name at the origin of the universe
有名萬物之母  : having a name is the mother of all things

Which I read: the world as a whole 天地 (sky and earth) exists before and beyond any name, and does not need any name to exist, but with names comes the separation into things, this and not-this, one, two and the ten thousand beings, as said further on in chapter 42: 道生一,一生二,二生三,三生萬物. Dao is father of one, one is father of two, two is father of three, three is father of the multitude of beings.
I'm not sure we need another subject than 無名 and 有名 in those two sentences, a subject which would be implicitly 道, as most translations have it, like "Without name the Dao is the origin of the Universe" etc ... here comes the Holy Ghost, the Logos and the heavy monotheist capitalization. But the dao has nothing to do with the Holy Ghost. There is no metaphysics in the dao, only physics. 
This is actually somewhat akin to the (too noisy) recent thesis of Markus Gabriel, "Warum es die Welt nicht gibt". Things exist insofar as they are named, but the world cannot be named as a separate entity because there is nothing from which it could be separated.

Amazingly enough, there is no entry for name in the Dictionary of Untranslatables. Not even a small entry in the index. This is certainly food for thought to expand in a future post.