information is not Information

Thoughts during a protracted wait in the lobby at Computers, Freedom, and Privacy 1993.

"information is not Information"

Under most people's present paradigm, information is not adequately separated from the data stream. I submit that information is a unit of selective attention, and that most current data paradigms falsely assign the label "information" to data that has not yet been contextualized and thus does not deserve the label.

In a paradigm where data is information, the strategy is usually to try to gather as much "information" as possible and then rely on artificial means to subjugate and select it. This strategy breaks down rapidly when dealing with new kinds of genuine information, as it relies on pre-existing knowledge in the researcher to construct sifting algorithms and queries and does not natively support a model of incidental or serendipitous information discovery.

Most proponents of "all data is information" attempt to counter this by saying that, practically speaking, there are no "ideal" queries, i.e. queries which are sufficiently well-formed that incidental information is not generated. I would agree that most query systems do produce additional results, but those results are usually extraneous rather than incidental.

people inspect query results only cursorily; the default is to scan visually, by hand and eye, and throw away anything that fails a first-pass pattern match

this mode of working supports throwing away information

also, people know and expect to get bogus info, and most are trying to optimize for only "relevant" info

Current info systems do not support a query into "where does this fit in and what paths lead from it" as much as "what is it and what can I find out about it". We are losing information when our query systems fail to consider the relation of information to itself and to the greater information stream. In time these considerations will form the basis of qualitative tools with which to distinguish the relative merits of information, tools which are becoming increasingly necessary in today's explosively increasing data flow. Already the problem is shifting from "where did I put it?" to "how much can I trust it?", a question which today's data paradigms increasingly fail to address.
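To make that distinction concrete, here is a minimal sketch in Python; the articles, the citation links, and the function names are all invented for illustration, not drawn from any real system:

    # Hypothetical citation graph; keys and titles are made up.
    articles = {
        "A": {"title": "Optimizing Compilers", "cites": ["B", "C"]},
        "B": {"title": "Register Allocation",  "cites": ["D"]},
        "C": {"title": "Loop Transformations", "cites": []},
        "D": {"title": "Graph Coloring",       "cites": []},
    }

    def what_is_it(key):
        # The usual query: return the item itself, stripped of context.
        return articles[key]["title"]

    def paths_from(key, path=None):
        # The relational query: every citation path leading out of the item,
        # i.e. where it sits in the larger information stream.
        path = (path or []) + [key]
        cites = articles[key]["cites"]
        if not cites:
            return [path]
        paths = []
        for nxt in cites:
            paths.extend(paths_from(nxt, path))
        return paths

    print(what_is_it("A"))   # "Optimizing Compilers"
    print(paths_from("A"))   # [['A', 'B', 'D'], ['A', 'C']]

The first function answers "what is it"; the second answers "what paths lead from it", and it is the second kind of answer that today's systems rarely offer.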

Let me explain what I mean by "today's data paradigms", lest my usage of the term diverge too far from the community norm. In most of the information systems proposals that I have seen, there is computational support for gathering information and for making a certain class of factual judgements about the gathered data, usually by way of sorting or sifting the data in accordance with a set of narrow expectations. An example of what I mean is gathering the set of IEEE technical abstracts and flagging all the articles that contain the phrase "optimizing compiler". You now have a list of, or perhaps the entire content of, the articles which contain the phrase.
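A minimal sketch of that sifting step, with made-up abstracts standing in for the real IEEE collection, might look like this:

    # Hypothetical abstracts; ids and text are invented for illustration.
    abstracts = [
        {"id": 101, "text": "We present an optimizing compiler for vector machines."},
        {"id": 102, "text": "A survey of database privacy techniques."},
        {"id": 103, "text": "Profile-guided passes in an optimizing compiler backend."},
    ]

    phrase = "optimizing compiler"
    flagged = [a for a in abstracts if phrase in a["text"].lower()]

    for a in flagged:
        print(a["id"], a["text"])
    # Prints 101 and 103; judging whether either article is any good, or how
    # the two relate to each other, is still left entirely to the reader.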

At this point, you have saved considerable work in that you were not required to go through hundreds of citations or full articles by hand searching for a relevant phrase or keyword. However, considerable work still awaits, in many cases a *comparable* amount of work, in judging the content of the retrieved articles.

Oh hell, I'm getting lost here; what I'm trying to get at is that I still see the same fundamental paradigm, that of "the user decides what is important and goes out and sifts data until he/she gets 'enough' that matches the preset criteria, then merrily crunches away on said data". Any automation of this process effectively puts electronic blinders on your vision. A common response is, "oh, but we can simulate that random data discovery, we can program the interface to show you random things that you wouldn't normally be interested in and you can get it that way". Yes, but instead of getting a contextually related set of new inputs (related either geographically or logically, depending on the context) you will get truly random stuff. Work can and will be done on making the "stuff" less "random", but that work is *still* subject to the limitation of the basic paradigm, namely that the stuff is made less random by deciding in advance what's interesting from what we decide is important and, well, you get it. Back to square one.
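Here is a sketch of why the "simulated serendipity" answer lands back in the same paradigm; the item names and the interest profile are purely hypothetical:

    import random

    # Everything out in the world, most of it outside the preset interests.
    all_items = ["compilers", "gardening", "cryptography", "beekeeping",
                 "databases", "typography", "networking", "ceramics"]

    # Decided in advance, before any data is seen.
    interest_profile = {"compilers", "databases", "networking", "cryptography"}

    def simulated_serendipity(k=2):
        # "Random" discovery, made "less random" by drawing only from what
        # was decided in advance to be relevant -- the blinders stay on.
        pool = [item for item in all_items if item in interest_profile]
        return random.sample(pool, k)

    print(simulated_serendipity())   # never returns "beekeeping" or "ceramics"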

Obviously when you create any information structure, some elements will be arbitrary. In fact, we can safely say that some information structures are valuable precisely and only because of that arbitrariness. But when our underlying working paradigm is based on pre-evaluating unknown data, we lose a basic aspect of creativity and connection and impose a fundamental paucity on our input data set.

This is unsurprising when taken in the context of human neurological and cognitive structures. In many respects our whole strategy of dealing with the world consists of making preliminary guesses about real-world sensory data and throwing away whatever seems valueless and inappropriate. The context of value changes dynamically in realtime, as anyone knows who has ever been shocked out of a thoughtful reverie by the horn of an oncoming car! I consider computers to be the most underutilized technology ever devised for augmenting human creativity and thought processes. Utilizing the full potential of computational adjuncts to human cogitation requires that we routinely step outside the common paradigm and stop reimplementing our limitations in software because it's the only way we know.

[Boy howdy this is sounding inflammatory! And it needs citations....]

Copyright 1993 M. Strata Rose, all rights reserved.