2006-12-29

Classifying is hard, tagging is worse

My work for Mondeca has been for years to help build classification schemes, ontologies and the like for a variety of customers. Most of the time this means formalizing implicit ontologies they already have in their data. Nor do I have to make any decisions about actually populating the schemes; that task is left to human editors or automatic text mining engines. I sometimes take care of automatic migration of legacy content, but following rules decided with the customer. I'm very happy with all that, because I'm not good at classification. I tend to see so many subjects in anything, and any interesting resource to classify seems so multi-dimensional, that choosing a category always brings me to the fringe of indecision, and any decision I eventually make about it always seems arbitrary. This maybe comes from an ancient traumatic experience as an Open Directory editor.

Sounds familiar? I already hear the folksonomy people crying: "Hey, of course, that's why tagging is so cool". As far as I am concerned, tagging is worse. It means more arbitrary decisions, because not only do I have to choose a category, I can choose more than one, or none at all, and I have to figure them out myself. Way too many decisions ... That's why my browser bookmarks and email folders are a mess, why I have no del.icio.us account, why my Technorati profile is so low, etc ...

Beyond my own decision difficulties, there is something to be added to this now long discussion about ontologies vs tagging. What I've learnt in science is that a good theory is a falsifiable one. What you assert using an ontology, in whatever language or framework with declared formal semantics, is falsifiable. No formal semantics, no notion of true and false, hence no falsifiability. In other words, and to make it simple, an RDF assertion can be declared or inferred true or false vs a given ontology, an OWL class can be proven unsatisfiable, etc. Nothing of the like with tags. The assignment of a tag cannot be proven true or false, or inconsistent. Tags are not falsifiable.
By the way, the same distinction is to be made for RDF vs Topic Maps. Topic Maps are not falsifiable, because they have no formal semantics. Now the question is whether falsifiability, which has been proven to be critical in science, is also critical in information technologies.
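
To make the contrast between ontology assertions and tags a bit more concrete, here is a minimal sketch in plain Python. It is only a toy: the class names, the `DISJOINT` declaration and both helper functions are hypothetical, and a real OWL reasoner does far more than this. The point is just that type assertions can be checked against a declared constraint and found inconsistent, while a list of tags offers nothing to check against.

```python
from itertools import combinations

# Hypothetical toy "ontology": one pair of classes declared disjoint.
# This is an assumption for illustration, not a real OWL vocabulary.
DISJOINT = {frozenset({"Person", "Document"})}


def is_consistent(assertions, disjoint=DISJOINT):
    """Return False if any individual is typed with two disjoint classes."""
    types = {}
    for individual, cls in assertions:
        types.setdefault(individual, set()).add(cls)
    for classes in types.values():
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                return False
    return True


def check_tags(tags):
    """Tags carry no declared semantics, so there is no verdict to return."""
    return None


# Type assertions can be tested against the declared constraint.
typed_ok = [("post42", "Document")]
typed_bad = [("post42", "Document"), ("post42", "Person")]

# The same words used as free tags cannot be proven right or wrong.
tags = [("post42", "document"), ("post42", "person")]

print(is_consistent(typed_ok))
print(is_consistent(typed_bad))
print(check_tags(tags))
```

The second list is rejected only because the disjointness was declared beforehand; remove that declaration and the typed assertions become as unfalsifiable as the tags.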

That said, since the new Blogger version enables easy tagging (maybe the older version did also, but I was never aware of it), and since there is now quite a bunch of posts on univers immedia, I decided to be brave and start tagging them, as thoughtlessly as possible. Starting with the more recent ones, I then shifted to the oldest, a good occasion to revisit them if nothing else. The result you see on the left under "What". My first impression is of course that there are too many tags, but I will try to keep going that way throughout the blog just to see how it flies, then maybe keep only the most frequent ones if I end up with too long a list.