2014-07-21

Vocabularies are finite, hence ambiguous.

For thousands of years, vocabularies have been our main weapons in the fierce war against ambiguity. The Web has enabled the continuation of this war with new weapons called URIs and RDF. This new battlefield has seen an unprecedented proliferation of terms, entities and concepts. Although everyone in this space keeps recommending that we reuse existing concepts and terms, not reinvent the wheel and so on, we all strive for accuracy, and since existing terms never exactly fit either our data or our view of the world, we feel forced to add to the pile. We reinvent the wheel because our stuff is always just slightly different.
There is no possible end to this process. To achieve perfect accuracy and get rid of all ambiguity, we would need infinite vocabularies. We all know from high school that actual infinity is impossible to achieve, and this is quite simple to understand. But unbounded growth in a very large world, in other words potential infinity, is in practice as difficult to grasp as actual infinity. Both are, to paraphrase Woody Allen, very large near the end, and whatever the ability of the information system to scale (brainware, hardware and software all together), it will break at some point. If you tell someone that the universe is infinite, they are ready to accept it intellectually as a default option, because a universe with limits is in fact more difficult to grasp, not so much because of its weird space-time geometry as because its actual size and proportions, finite but so large, discourage any attempt at an accurate physical or mental representation.
What do we take home from that? That the finite nature of our vocabularies, even extended by the impressive growth of technologies, means we have to live with ambiguity forever. Hence we have to consider ambiguity not as a bug, but as a feature of our vocabularies. Unfortunately many people still believe, or act as if they believe, that because they are the domain experts and have worked on it for years, their terms are perfectly accurate and free from ambiguity. Expressing the semantics of those terms in formal languages only reinforces this dangerous illusion for some of them. And thinking we can achieve non-ambiguity keeps research from focusing on the real issue: how to deal with ambiguity in practice, with the agility and efficiency of natural language conversation.

[Edited 2014-07-22] For quite an entertaining introduction to the issue, see "How many things are there?"