I've finally given up trying to represent hubjects at all, at least for the moment. I made a serious attempt at it at lingvoj.org. But after discussions in the Linking Open Data forum, I eventually gave in and published the language descriptions in a way that conforms to the W3C recommendations for Semantic Web architecture, with content negotiation, 303 redirects and the like. I've even deleted the previous post here that said otherwise, since it would now be full of dead links and would only cause confusion.
So we'll see how this flies. Feed your favourite tool the URI http://www.lingvoj.org/lang/zh and figure out for yourself whether it provides a useful description of the Chinese language, for both humans and machines.
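For readers unfamiliar with the pattern, here is a minimal Python sketch of the decision a server makes under "content negotiation plus 303 redirects". The URI http://www.lingvoj.org/lang/zh names the language itself, so it is never served directly. The server answers 303 See Other and points to a document about it, chosen from the client's Accept header. The `.rdf` and `.html` document URIs and the function names are hypothetical, and lingvoj.org's actual setup may differ. Wildcards such as `text/*` are handled only crudely here.

```python
# Minimal sketch of 303-based content negotiation for a non-information
# resource. The document URIs below are HYPOTHETICAL, not lingvoj.org's.

RESOURCE = "http://www.lingvoj.org/lang/zh"

# Hypothetical document locations: one for machines, one for humans.
REPRESENTATIONS = {
    "application/rdf+xml": RESOURCE + ".rdf",
    "text/html": RESOURCE + ".html",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(accept: str) -> tuple[int, str]:
    """Return (status, Location) for a GET on the resource URI."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]
        if media_type == "*/*":
            break  # client accepts anything: use the default below
    # Default: send unspecified or unknown preferences to the HTML page.
    return 303, REPRESENTATIONS["text/html"]
```

For example, a client sending `Accept: text/html;q=0.5, application/rdf+xml` would be redirected to the hypothetical `.rdf` document, while a browser sending `text/html` would land on the `.html` page.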
Yes, I find the data valuable, particularly insofar as they reach beyond those in the Unicode CLDR. However, for my purposes the data are suboptimal, because they don't adhere to the orthographies of all languages with respect to letter case. What I am looking for is lexemes in their most common citation (dictionary) forms.
One more thing: I believe the translations coded as Norwegian ought to be coded as Bokmål. I'm using them as terms in "nob".
Two more suggestions:
1. I suggest omitting "n/a". It is formatted as if it were the name of a language. Many other missing names are not marked. Is there a particular reason for marking some of them?
2. I suggest correcting the ASCII apostrophes. In standard orthographies they would be various other characters.
Jonathan, thanks for the feedback. I don't have much bandwidth for lingvoj.org these days, but I have noted your suggestions.
Some points:
1. Labels are currently extracted automatically from Wikipedia, so quality may vary. There are more reliable sources, such as the Unicode website, but I haven't had the time to exploit them.
2. The "n/a" labels are clearly bugs; they should not be there.
3. The various forms of Norwegian are a nightmare, even for Norwegians, it seems. They would indeed need particular attention.
4. ASCII apostrophes should indeed be corrected.
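The fixes agreed on in this thread (dropping "n/a" placeholders, recoding generic Norwegian as Bokmål, replacing ASCII apostrophes) could be scripted roughly as below. This is a minimal sketch that assumes the labels are available as (language tag, label) pairs. It remaps the tag `no` to `nb`, the two-letter code for Bokmål (`nob` is its ISO 639-3 code). The `clean_labels` helper and the per-language apostrophe table are hypothetical. U+2019 is only a default, since, as noted above, the correct character varies between orthographies.

```python
# Minimal sketch of the label clean-up discussed above. ASSUMES labels
# come as (language tag, label) pairs; all names here are hypothetical.

# Point 2: "n/a" is a placeholder bug, not a language name.
PLACEHOLDERS = {"n/a"}

# Point 3: treat generic Norwegian ("no") as Bokmål ("nb"), following the
# commenter's suggestion. This remapping is an assumption to be reviewed.
TAG_REMAP = {"no": "nb"}

# Point 4: replace the ASCII apostrophe (U+0027). U+2019 RIGHT SINGLE
# QUOTATION MARK is used as a default; some orthographies need another
# character (e.g. U+02BC MODIFIER LETTER APOSTROPHE). Per-language
# overrides would go in this (hypothetical, currently empty) table.
APOSTROPHE_BY_TAG: dict[str, str] = {}
DEFAULT_APOSTROPHE = "\u2019"


def clean_labels(labels: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop placeholders, remap tags and fix apostrophes."""
    cleaned = []
    for tag, label in labels:
        text = label.strip()
        if text.lower() in PLACEHOLDERS:
            continue  # skip "n/a", "N/A", ...
        tag = TAG_REMAP.get(tag, tag)
        apostrophe = APOSTROPHE_BY_TAG.get(tag, DEFAULT_APOSTROPHE)
        cleaned.append((tag, text.replace("'", apostrophe)))
    return cleaned
```

With made-up input such as `[("en", "n/a"), ("no", "kinesisk"), ("fr", "langue d'oc")]`, the sketch drops the placeholder, retags the Norwegian label as `nb` and turns the French apostrophe into U+2019. The letter-case issue raised earlier is not handled, because it would need per-language orthographic rules.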