2006-12-29
Classifying is hard, tagging is worse
Sound familiar? I can already hear the folksonomy people crying: "Hey, of course, that's why tagging is so cool". As far as I am concerned, tagging is worse. It means more arbitrary decisions, because not only do I have to choose a category, I can choose more than one, or none at all, and I have to figure them out myself. Way too many decisions ... That's why my browser bookmarks and email folders are a mess, why I have no del.icio.us account, why my Technorati profile is so low, etc ...
Beyond my own decision difficulties, there is something to be added to this now long discussion about ontologies vs tagging. What I've learnt in science is that a good theory is a falsifiable one. What you assert using an ontology, in whatever language or framework with declared formal semantics, is falsifiable. No formal semantics, no notion of true and false, hence no falsifiability. In other words, and to make it simple, an RDF assertion can be declared or inferred true or false against a given ontology, an OWL class can be proven unsatisfiable, etc. Nothing of the like with tags. Assignment of a tag cannot be proven true or false, or inconsistent. Tags are not falsifiable.
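To make the contrast concrete, here is a minimal sketch in plain Python. It is not a real OWL reasoner: the class names, the single disjointness axiom and the `is_consistent` helper are all hypothetical, chosen only to show that typed assertions can be refuted against declared semantics while tags cannot.

```python
# Toy "ontology": one disjointness axiom between two hypothetical classes.
DISJOINT = {frozenset({"ex:Person", "ex:Document"})}


def is_consistent(type_assertions):
    """Return False if any individual is typed with two disjoint classes."""
    types_by_individual = {}
    for individual, cls in type_assertions:
        types_by_individual.setdefault(individual, set()).add(cls)
    for classes in types_by_individual.values():
        for pair in DISJOINT:
            if pair <= classes:  # both disjoint classes asserted together
                return False
    return True


# Typed assertions can be checked, and refuted, against the axiom.
ok = is_consistent([("ex:bernard", "ex:Person")])
clash = is_consistent([
    ("ex:bernard", "ex:Person"),
    ("ex:bernard", "ex:Document"),
])

# Tags carry no declared semantics: there is nothing to check them against.
tags = {"ex:bernard": {"person", "document"}}
```

Here `clash` comes out false because the two type assertions contradict the axiom, whereas the `tags` dictionary can hold any combination of strings without ever being wrong.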
By the way, the same distinction is to be made for RDF vs Topic Maps. Topic Maps are not falsifiable, because they have no formal semantics. Now the question is whether falsifiability, which has proven to be critical in science, is also critical in information technologies.
That said, since the new Blogger version enables easy tagging (maybe the older version did also, but I was never aware of it), and since there is now quite a bunch of posts on univers immedia, I decided to be brave and start tagging them, as thoughtlessly as possible. Starting with the most recent ones, I then shifted to the oldest, a good occasion to revisit them if nothing else. The result you see on the left under "What". First impression is of course that there are too many of them, but I will try to keep going that way throughout the blog just to see how it flies, then maybe keep only the most frequent ones if I end up with too long a list.
2006-12-27
OWL ontology for identity on the web
The definitions of resource that can be found in literature show ambiguity, making the issue of handling the identification of a web resource very problematic.
Our approach restricts the nature of the web resource to that of a computational object. This choice is motivated by the fact that a resource is something that has to be addressable, and things like cars and people are not addressable for their nature. Hence, it is wrong in principle to use the same mechanism of addressing for entities that have such different sorts.
Migration to new Blogger version
The list of contributors no longer shows; they have to do something about their Blogger account to be able to post again.
A couple of things I've been about lately
I've been silent here for over two months now, my blogging time devoted to the Mondeca blog in French, Leçons de Choses. But there are a couple of things I've been working on that are worth mentioning.
I've exchanged with Michel Biezunski on his Data Projection Model, and found that its genericity and simplicity made it easy and straightforward to express the structure of Mondeca ITM, without the borderline hacking needed when using either OWL-RDF or XTM for the same task. Now open questions: What will happen with that model? Who will see the benefits over languages already in this space, and singularly over RDF? Who will build tools supporting it?
Been wondering if a semiotic approach could shed some light on our thoughts on referents, and came out with an RDF semiotic triangle. The URI is the signifier, the RDF description is the formalisation of the signified concept associated with the URI. The referent is outside the realm of language and signs, and should stay there. In this approach, attempting to achieve a representation of the referent, even using tricks such as blank resources or hubjects of any kind, is therefore a recursive trap and actually nonsense. So any declaration of sameness or identity of referents should be avoided. Only concepts bear identity, not their referents. From that point on, I came to the idea that linking different concepts/signs (URI + RDF description) which humans consider to have a more or less similar referent will take the form of processing rules, more than declarative semantics.
Thanks to Jakob Voss for this post in a long thread on the public-esw-thes list, which really triggered a kind of illumination about this. As an example, trying to say that my SKOS concept a:Restaurant has the same referent as your OWL class b:Restaurant through any RDF declarative relation between those two resources should be avoided. But I can set in my system a functional rule expressing that any document whose subject is an instance of your b:Restaurant class will be indexed against my a:Restaurant concept. The referent is represented nowhere, but it is acting at the core of this rule.
Actually we have this very indexing rule mechanism working in some Mondeca applications, and I have submitted a paper to XTech 2007 about it. More to come if ever the paper is selected.
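To give an idea of what such a functional rule could look like, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the Mondeca mechanism: the sample instances, documents, the `INDEXING_RULES` table and the `index_documents` helper are all hypothetical. Only a:Restaurant and b:Restaurant come from the example above.

```python
# Hypothetical instance data published by "your" system: instance -> class.
TYPE_OF = {
    "b:ChezPanisse": "b:Restaurant",
    "b:Louvre": "b:Museum",
}

# Hypothetical documents in "my" system, each with a declared subject.
DOCUMENTS = {
    "doc1": "b:ChezPanisse",
    "doc2": "b:Louvre",
}

# The rule is a processing instruction local to my system, not an RDF
# statement relating the two resources: foreign class -> my concept.
INDEXING_RULES = {"b:Restaurant": "a:Restaurant"}


def index_documents(documents, type_of, rules):
    """Index each document against my concept when a rule matches its subject's class."""
    index = {}
    for doc, subject in documents.items():
        concept = rules.get(type_of.get(subject))
        if concept is not None:
            index.setdefault(concept, []).append(doc)
    return index


index = index_documents(DOCUMENTS, TYPE_OF, INDEXING_RULES)
```

Nothing in this sketch states that a:Restaurant and b:Restaurant are the same thing. The rule only says what to do with documents, and that is the whole point.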
Lately, got interested again in triggering some process to have languages available not only as tags to use in XML, but as proper RDF resources. This is an old story going back to the OASIS Published Subjects Technical Committees, and singularly PSI for languages. Track this topic on ESW Wiki, and see here for ongoing thread and more explanations. There again, my proposal is to forget absolute identification of a language by a URI. The concepts identified by URIs are the properties and property values that can be declared for a language, and applications decide which properties are useful to them. No absolute rule saying that two descriptions refer to the same language.
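A rough sketch of that proposal, in plain Python and under assumptions: the two descriptions, the property keys (`iso639-1`, `iso639-3`, `name`) and the `matches` helper are hypothetical. The idea shown is only that each application picks the properties it cares about and applies its own matching rule.

```python
# Two hypothetical descriptions of a language, coming from different sources.
desc_a = {"iso639-1": "fr", "iso639-3": "fra", "name": "French"}
desc_b = {"iso639-1": "fr", "name": "français"}


def matches(d1, d2, properties):
    """Application-level rule: two descriptions match when they agree
    on every property this application has chosen to rely on."""
    return all(p in d1 and p in d2 and d1[p] == d2[p] for p in properties)


# One application only cares about the two-letter code...
by_code = matches(desc_a, desc_b, ["iso639-1"])

# ...another also requires the same name, and reaches a different answer.
by_name = matches(desc_a, desc_b, ["iso639-1", "name"])
```

The two applications disagree on whether the descriptions match, and neither is wrong: no absolute identity of the language is declared anywhere.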
2006-10-27
Leçons de Choses
2006-10-14
Geonames enters the Semantic Web
2006-10-03
Back to Earth
Places are the most fascinating use cases when it comes to identification and description, and all those technologies and their ability to interface with each other tend to prove that information about some place can be aggregated without any consensus or explicit declaration of what this place (or a place, in general) actually is, but by providing hubs between data, and interfaces putting together data somehow relevant to this place (such as Google Earth, Google Maps API and the like). Geonames web services provide a good example of this.
2006-08-07
In defence of 404
2006-07-26
More thoughts on that Blue Glass
[2016-06-20] John Black's Kashori archives have moved. The reference articles are now here:
http://www.kashori.com/archives/2006_06_11_archive.html
http://www.kashori.com/archives/2006_07_02_archive.html
2006-07-25
Ambiguity, Ostention and Description
There is only one point on which I would argue. Pat holds that reference can be made by ostention (gesticulation showing what you are about) or description. All Pat writes thereafter about description being inherently ambiguous, I strongly agree with : disambiguation being a contextual process, the more precise the description, the more ambiguity you get, and so on.
But I would hold that ostention is as ambiguous as description, so that reference is ambiguous in nature whatever the way it's done.
Suppose I am holding a book and ask you : "Have you read this?". The reference to "this" is by ostention, since I seem to hold and show "this". But the "ostentatum" indicated by "this" is actually some copy of some edition of some book. Does "this" refer to this specific copy, which happens to be my own personal copy (maybe annotated in some way), or is the referent the particular edition of which this specific copy is a sample, or is it the abstract entity, the book independent of any physical support, of which what I am currently holding happens to be some physical avatar? Every one of those interpretations is meaningful, and only the context of the conversation might disambiguate. So even with ostention, there is ambiguity left.
2006-06-20
Wikipedia's semantic cow paths
2006-04-11
More use cases for nondescript resources
2006-04-04
Identifying things - blank nodes again
So my suggestion again is here to use blank tagging, that is, allow users, in a simple way, to make all those resources point to the same blank node.
Now something is slowly coming from the back of my mind. I thought for a while we needed a specific and mysterious vocabulary to do that, hubjects and the like. Since this kind of stuff is far from being on the track of adoption, maybe using more popular and less exotic vocabulary, such as dc:subject or something similar would make the whole thing more understandable. Seems there is no formal opposition to declare things like:
http://www.amazon.com/gp/product/B00006RCLH dc:subject _:b
http://labs.oclc.org/xisbn/068981836X dc:subject _:b
http://en.wikipedia.org/wiki/Call_of_the_Wild dc:subject _:b
And actually, any other property could be used as well, such as the following, to take the example from Jon Udell's post
http://upcoming.org/venue/3669/ a:venue _:x
http://eventful.com/venues/V0-001-000150985-3 a:venue _:x
This kind of declaration keeps completely agnostic on what a venue in general, and this particular one actually is. It simply says that the two resources are about the same one.
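As an illustration of what an application could do with such declarations, here is a minimal sketch in plain Python, with no RDF library. The `group_by_blank_node` helper is hypothetical, and the triples are simply the ones listed above written as tuples.

```python
from collections import defaultdict

# The declarations above, as (subject, predicate, object) tuples.
triples = [
    ("http://www.amazon.com/gp/product/B00006RCLH", "dc:subject", "_:b"),
    ("http://labs.oclc.org/xisbn/068981836X", "dc:subject", "_:b"),
    ("http://en.wikipedia.org/wiki/Call_of_the_Wild", "dc:subject", "_:b"),
    ("http://upcoming.org/venue/3669/", "a:venue", "_:x"),
    ("http://eventful.com/venues/V0-001-000150985-3", "a:venue", "_:x"),
]


def group_by_blank_node(triples):
    """Group resources that point to the same blank node through the same property."""
    groups = defaultdict(set)
    for subject, predicate, obj in triples:
        if obj.startswith("_:"):  # only blank nodes act as hubs here
            groups[(predicate, obj)].add(subject)
    return dict(groups)


groups = group_by_blank_node(triples)
```

The blank node itself is never described. It only serves as the meeting point that lets the application put the three book pages together, and the two venue pages together.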
2006-03-15
Identity vs Meaning
The identity is singular. The meaning is relative.
a:SomeRegion a:partOf a:SomeCountry
a:SomeRegion skos:broader a:SomeCountry
2006-02-10
Identity -- some philosophical musings
IDENTITY
Crystals appear (on the scene of Reality) -- just like organisms -- always as individuals. Such an individual has a definite Identity that remains constant during its existence. It is, say, A, it is not B, not C, etc. A developing crystal of Salt (growing in a solution) can change its shape while its Identity remains the same. For organisms this applies even stronger. We ourselves (being an organism) seem to have direct experience of our Identity staying the same during all of our life in spite of the fact of the many changes we constantly undergo. Some insects undergo a strong metamorphosis (for example from caterpillar to butterfly) but nevertheless their Identity stays the same. So it seems for every entity, which is an intrinsic whole, that there is something that remains the same, and something else not remaining the same, but always changing. In Philosophy such changes are called "accidental" or "per accidens" in relation to the persistent Identity. This Identity is called the "intrinsic Essence" of the thing, so every real uniform being has such an Essence.
IDENTITY AS A PRINCIPLE
But what then is this Essence?
Where does it abide?
Does it abide outside the thing (as Plato assumed), or inside the thing (as his famous pupil Aristotle assumed)?
And if the Essence is located inside the thing (meaning that the Essence of every being abides in "our world", and not in some external immaterial world transcending the material world), which I consider the most probable position, where in the thing is it located and in what way? Could this Essence be a concrete part of the thing, the "heart" or "soul" of the thing, which implies that the Essence itself would also be a thing (and this thing should of course also have an Essence of its own........Oh my god, where are we going???), or is it in the thing in an abstract way (whatever that means), like a principle?
A background sketch: together with Joshua Levy, I am building a subject mapped social bookmarking application. We call it Tagomizer (tm). It's being fun. But, it's also causing (moi) brain pain. What is a subject? Let me translate. Someone bookmarks a webpage. This means that the URL of that page, and the page title, are sent to Tagomizer, which then paints a form in which the user can add tags (words or phrases for now, images and other objects later), and a body of text taken as a comment. A user can come in later and add more comments or more tags, or remove tags. Tags are a large part of Web 2.0, where folksonomies are breaking out everywhere.
What is a subject? When Tagomizer creates a bookmark, it creates several subject proxies in the subject map where those objects don't already exist. Tagomizer is a kind of TMA (topic maps application -- or SMA in the newspeak of the TMRM), so it is responsible for identification of its subjects, some of which might already have subject identity granted by other TMAs. What, then, is a subject? Consider the webpage itself. Tagomizer asks the core TMA to create a subject proxy for a webpage with a given URL. If that subject proxy already exists, it is returned. Otherwise, a new one is created and granted subject identity by way of a PSI associated with the core TMA. Tagomizer, as a different TMA then grants that subject proxy subject identity with a different PSI, one that says "this is a subject identified by Tagomizer." Other TMAs might grant an SIP (psi) of their own. This is necessary because each individual TMA will be adding other properties to the proxy, mostly assertions.
So, a webpage has granted to it subject identity. What is the subject? In this case, subject identity has been granted to a particular resource, a webpage. Nothing more than that. The resource exists, it is located on the web at a particular URL, and it has been granted subject identity based on that URL by one or more TMAs. Each TMA is going to confer other properties on that subject. We know from nothing about the subject itself other than those properties of location and object type. What is contained/presented at that webpage will be the subject(s) of other subject proxies, for which that resource becomes an instance of an occurrence.
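The get-or-create behaviour described above can be sketched in a few lines of plain Python. This is not Tagomizer code: the `SubjectMap` class, its method, the TMA names and the example.org identifiers are all hypothetical, and a real subject map would hold far more than a dictionary.

```python
class SubjectMap:
    """Toy subject map: one proxy per webpage URL, with one PSI per TMA."""

    def __init__(self):
        self._proxies = {}  # URL -> proxy (a plain dict in this sketch)

    def get_or_create_proxy(self, url, tma, psi):
        """Return the proxy for this URL, creating it if needed, and record
        the subject identity granted by the calling TMA."""
        proxy = self._proxies.get(url)
        if proxy is None:
            # All we know about the subject: its location and object type.
            proxy = {"url": url, "type": "webpage", "identities": {}}
            self._proxies[url] = proxy
        # Each TMA grants its own identifier; an existing one is kept as-is.
        proxy["identities"].setdefault(tma, psi)
        return proxy


smap = SubjectMap()
first = smap.get_or_create_proxy(
    "http://example.org/page", "core", "http://example.org/psi/core/page"
)
second = smap.get_or_create_proxy(
    "http://example.org/page", "tagomizer", "http://example.org/psi/tagomizer/page"
)
```

Both calls return the same proxy object, which ends up carrying two identities, one per TMA. What the page is about does not appear anywhere in it.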
Brain pain, for me (warning: admission of ignorance forthcoming), stems from notions of essence. Essence is mentioned in the quote above as an intrinsic issue. Now, we're deep into the same issues that come up from time to time in the OODB community, intrinsic vs. extrinsic properties. There's an interesting thread on web resource identity, not dissimilar to Bernard's previous post on URI ambiguity. That xml-dev thread starts here.
Intrinsic-extrinsic properties are discussed here.
Closure? Is closure possible? I post this because I am interested in looking for consensus reality related to interoperable ways in which subject identity can/should be conferred on the subjects of future topic/subject maps. My sense is that the inquiry I reveal in this post represents the, um, essence of this entire blog and of Bernard's inquiry. I'll take my answers anywhere I can find them.
2006-01-27
Pat Hayes on URI ambiguity
Now, what I pushed lately here and there is that different URIs can have the same referent, but describe it differently from different perspectives. Hubjects are formalizations of referents as binding blank nodes. Those two points seem dual and complementary.
Where I differ with Pat is when he says that context, which indeed provides disambiguation of the referent, and is implicitly defined by the rules of a given community or a given protocol, should stay this way. In other words, it can't, or should not, be formalized as an object, just like in ordinary conversation: the context is effective without being explicit. I think context (or perspective) could and should be declared formally ...