2012-04-06

LOV stories, Part 3: Vocabularies as Heritage

In previous posts we introduced the Vocabulary Commons and followed some of its Gardeners. Let's now try to imagine how these vocabularies can turn into a sustainable and resilient ecosystem. In the current state of affairs they look more like a young forest pioneering a new land. A lot of opportunistic species have invaded the landscape: some are conspicuous and seem here to stay, some look like they have no future, others have settled in small niches, and all kinds of interactions and dependencies have emerged.
Should we let the invisible hand of natural selection operate, and let the fittest survive? Do we want those commons to become a wild messy jungle, or rather a pleasant, useful and sustainable garden that we, our children and their children will enjoy? Achieving the latter is certainly what the DCMI Vocabulary Management Community group has in mind. This group will hold its kick-off meeting on the first day of the London Seminar: Five Years On at the end of this month.
The agenda is in the making, but the following points could be put on the table ...
To survey the landscape, we need an efficient Observatory
As in nature conservation or landscape restoration, the first sensible thing to do is to figure out the state of affairs. That's what we have started to do with the LOV dataset and related tools, which over time are turning into an observatory of the Vocabulary Commons rather than a mere catalogue. We have made explicit some aspects previously mostly ignored in existing vocabulary metadata. The first of those aspects, which we have been working on for a year now, is the state of vocabulary interlinking. Different types of interlinking are found at element level; we made them explicit and qualified at vocabulary level, using a new metadata vocabulary (VOAF). This has to be consolidated and certainly enriched by new types of relationships we have not fully explored yet, such as genealogical relationships (inspired by, followed by...).
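To make the idea of lifting element-level links to vocabulary level more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the way LOV actually computes its metadata: the predicate-to-link-type mapping, the link labels, the `ex:` vocabulary and the sample statements are all hypothetical, and prefixed names stand in for full URIs.

```python
from collections import Counter

# Hypothetical mapping from element-level predicates to a vocabulary-level
# link type. The labels are illustrative and not taken from the VOAF spec.
LINK_TYPES = {
    "rdfs:subClassOf": "specializes",
    "rdfs:subPropertyOf": "specializes",
    "owl:equivalentClass": "hasEquivalencesWith",
    "owl:equivalentProperty": "hasEquivalencesWith",
}


def namespace(prefixed_name):
    """Return the prefix of a prefixed name such as 'foaf:Person'."""
    return prefixed_name.split(":", 1)[0]


def vocabulary_links(triples):
    """Count element-level links per (source vocabulary, link type, target vocabulary)."""
    counts = Counter()
    for subject, predicate, obj in triples:
        link = LINK_TYPES.get(predicate)
        source, target = namespace(subject), namespace(obj)
        # Skip predicates we do not qualify, and links internal to one vocabulary.
        if link is None or source == target:
            continue
        counts[(source, link, target)] += 1
    return counts


# Made-up element-level statements from an imaginary vocabulary "ex".
triples = [
    ("ex:Author", "rdfs:subClassOf", "foaf:Person"),
    ("ex:Reader", "rdfs:subClassOf", "foaf:Person"),
    ("ex:title", "owl:equivalentProperty", "dcterms:title"),
    ("ex:Author", "rdfs:subClassOf", "ex:Contributor"),
]
links = vocabulary_links(triples)
for (source, link, target), count in sorted(links.items()):
    print(source, link, target, count)
```

Keeping the count alongside each vocabulary-level link is a design choice of this sketch: it lets an observatory distinguish a vocabulary that borrows one term from another from one that is built on top of it.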
Acquisition of such metadata is not straightforward. Some of it can be automatically extracted from the vocabulary source code, some is human-readable in the vocabulary documentation (if any), and the rest can be obtained through human interaction with the current vocabulary curator, if any, and when she can be contacted, which is often not the case unfortunately.
Hence an important next step should be to find ways to make explicit the current status of a vocabulary (maintained, forgotten, stable ...) and what can be expected about its curation over time. Who cares today? Will anybody care next month, and twenty years from now?
To both gather such information and build up more global attention on those issues, a systematic survey could be directed towards all identified vocabulary publishers, curators, creators, contributors and users. Backed by a coalition of trusted institutions, with DCMI leading the way, this survey would at the same time promote good practices and make explicit which vocabularies are alive or dead, curated or not, evolving or stable, recommended as serious by their authors or only toys, with or without a versioning policy, etc. A whole new set of metadata is actually needed to formally capture such information, focused on social process and vocabulary life cycle.
It's clear that the technical and human resources needed to achieve this are well beyond what has been engaged so far in LOV, which has been supported mainly by the Datalift project. A first step towards gathering more resources is the current proposal to migrate LOV under the Open Knowledge Foundation umbrella. And we hope the interest expressed by many stakeholders will translate into actual and durable support enabling a sustainable business model for the observatory.
To foster responsible citizenship, we need a Charter
At the DCMI meeting we will also suggest drawing up a Vocabulary Commons Charter, which would be written and endorsed by major publishers, standards bodies and vocabulary specialists, and would allow vocabulary publishers to define explicitly the level of commitment to sustainability they want to engage in. The level of commitment to the Charter would be attached to the vocabulary in a way similar to a Creative Commons licence. Commitments would include, for example, providing accurate, complete and up-to-date metadata, re-using other vocabularies of the Commons as far as possible, having a responsible curator able to interact with the community of users, making the vocabulary evolve with the ecosystem, and defining a long-term life cycle including a versioning policy.
To ensure long term preservation, we need ... Libraries
Speaking about sustainability leads to speaking about heritage and its long-term preservation. How do publishers, either individuals or organizations, ensure the longevity of a vocabulary after they retire or pass away? Preserving the heritage of knowledge assets is something librarians have done for centuries, extending the scope from books to all kinds of media and digital assets. The Library community has done a great job lately in moving towards the integration of its legacy into Linked Data, and the translation of traditional metadata formats into interoperable Semantic Web vocabularies. But in this evolution, vocabularies have still been mainly regarded as librarians have always regarded them: tools to classify, index, search, access and generally qualify and organize other assets. Vocabularies themselves should now be considered as assets, and libraries should be prepared to manage and preserve collections of vocabularies. This is definitely an exciting challenge.