2013-02-04

From 'Long Data' to 'Long Meaning'

I was attracted and actually misled this morning by the title of this article published last week in Wired. I put comments both on the original article and in a Google+ post. But the question certainly deserves more than those quick reactions. Long Data is definitely climbing the hype curve, and it would be a good thing if the meaning of the concept did not get blurred in the buzz, a common pitfall accurately pointed out by several comments on the said Wired article.
The Long Now Foundation coined the concept quite a while ago, and two recent Long Now Blog entries are indeed about Long Data. But Long Data is not only about gathering data from the past in consistent series despite all difficulties. It is also (to paraphrase the tagline of the next Dublin Core Conference) about linking to the future, which means having the data of today still available 10,000 years from now.
Putting everything in the perspective of a 10,000-year time span is quite arbitrary, but it is long enough to figure that after such a time most things we take for granted today are likely to have disappeared. Our individual lives of course, and those of our children and grandchildren over hundreds of generations, and certainly most institutions we know of: companies, international organizations, libraries, countries and towns, etc. And, certainly quicker than all of the above, the technologies of today, and faster than any other kind of technology, the information technologies, if we judge by their current rate of obsolescence. Given such obvious facts, how could data stored today still be available in such a remote future? Available means not only physically stored somewhere, but in some readable format, and in a language still understandable. Looking at the undeciphered writing systems of the past, the Long Now time frame is quite relevant, since most of them are "only" a few thousand years old.
Which sensible conclusions can we draw from all that regarding the long-term availability of data? It has to do of course with the preservation of specific physical supports, specific formats, specific representations and languages, but what is needed above all is a preservation of the meaning across those ever-changing supports and systems, as illustrated below.

The above fragment, according to this source, quoted by Wikipedia, is one of the oldest surviving fragments of Euclid's Elements. Although the availability of such a document after twenty centuries or so is great, I for one, despite my maths background, would be totally at a loss to say which data have been encoded here (read: what Euclid meant by that piece of writing). People who know best tell me this very content is known as Proposition 5 of Book II, translated into not-so-modern English below, as given in the above quoted source, where the curious reader will also find the Greek version.
If a straight line be cut into equal and unequal segments, the rectangle contained by the unequal segments of the whole together with the square on the straight line between the points of section is equal to the square on the half.
Those who are at a loss with this quite old-fashioned geometrical parlance will, maybe, prefer the algebraic translation of the same property.

ab + (a-b)²/4 = (a+b)²/4
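
For readers who want to check that the algebra really says the same thing as the geometry, here is a short sketch of the expansion. It assumes, as my reading of the proposition, that a and b stand for the lengths of the two unequal segments, so that the whole line is a+b, its half is (a+b)/2, and the straight line between the points of section (the midpoint and the unequal cut) is (a-b)/2.

```latex
% Assumption: a and b are the lengths of the two unequal segments.
% Rectangle on the unequal segments: ab
% Square on the line between the points of section: ((a-b)/2)^2
% Square on the half: ((a+b)/2)^2
\begin{align*}
ab + \left(\frac{a-b}{2}\right)^{2}
  &= ab + \frac{a^{2} - 2ab + b^{2}}{4} \\
  &= \frac{4ab + a^{2} - 2ab + b^{2}}{4} \\
  &= \frac{a^{2} + 2ab + b^{2}}{4} \\
  &= \left(\frac{a+b}{2}\right)^{2}
\end{align*}
```

With a = 3 and b = 1, for instance, both sides come out as 4: the rectangle is 3, the small square is 1, and the square on the half is 2 times 2.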

What is great here is not only that the original document has been saved, but that we are still able to decipher it, and moreover that its very content (the original data stored by Euclid) has been translated over the centuries into so many variants and languages (Greek, English, algebra ...) and passed on through historical books and elementary geometry books, using all sorts of supports and representations. Such diversity, actually, ensures that whatever the future evolution of information systems, physical supports, formats and languages, the semantics of the original data is very unlikely to be lost.
As already pointed out some years ago in a post about URI species, biological information is sustainable over time spans much longer even than the Long Now perspective (we speak here of hundreds of millions if not billions of years for some of it), although it relies on quite short-lived and fragile storage units such as you and me. Duplication and diversity of support are the keys.