Gary E Byatt

Information Storage Strategies

A discussion on information storage strategies.

This document is a work in progress.

To have a fully informed discussion, we first need to examine and understand the nature of information, then consider how that understanding can benefit us, before discussing storage strategies. Hence the discussion is divided into three top-level sections.

Understanding The Nature of Information

All information has two mandatory and three optional components. To streamline the mechanics of communication they are organised into three levels.

Information Level 1

Meta Information

Optional component. Information about information level 2. Administrative information. For example, who created the information and when it is valid until. If present, usually stored at the same time as the information text.

Information Level 2

Presentation Information

Optional component. Information on how information level 3 will be rendered. For example, what fonts are used, how widely the paragraphs are separated and what tools should be used to render the information accessible to the consumer. If present, usually stored at the same time as the information text.

Semantic Information

Optional component. The meaning of information level 3. For example, a specific information text element represents the birth date of a person. If present, usually stored at the same time as the information text. May be (dynamically) derived from the information text.

From the point of view of a person consuming the information, this is the information: it is the understanding being communicated. This should be axiomatic, deduced by the person consuming the information from the information text. However, an explicit semantic description can be helpful, for example to disambiguate.
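As a minimal sketch of semantic information being derived dynamically from the information text, the following assumes that dates are written in ISO form (YYYY-MM-DD). The function name `derive_semantics` and the dictionary keys are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def derive_semantics(element: str) -> dict:
    """Derive a semantic description from one information text element.

    Assumption: a date is recognised only when written as YYYY-MM-DD.
    """
    try:
        parsed = datetime.strptime(element, "%Y-%m-%d").date()
    except ValueError:
        # Nothing more specific could be deduced from the text alone.
        return {"kind": "text"}
    return {"kind": "date", "value": parsed}


derived = derive_semantics("1970-01-31")

# The text alone says "a date"; an explicit description disambiguates
# which date it is, here a (hypothetical) birth date.
explicit = {**derived, "kind": "birth date"}
```

The derived description can only say that the element is a date. Stating that it is a birth date requires the explicit semantic description, which is the disambiguation case mentioned above.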

Information Level 3

Information Text

Mandatory component. The information text comprises one or more distinct elements to communicate. For example, its distinct elements may be characters, words, sentences or paragraphs.

Structural Information

Mandatory component. Structural information comprises relationships between information text elements. Stored at the same time as the information text. Each relationship has one of two types:

Containment

Sometimes referred to as hierarchy. The containment relationship identifies container and contained information text elements. The container provides context for the contained. The context is a scope or range within which the contained are valid, meaningful or useful.

Succession

Sometimes referred to as serialisation or sequence. The succession relationship identifies predecessor and successor information text elements. Elements are utilised one at a time, in order, with the predecessor utilised before the successor.
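The two relationship types can be sketched as a small data structure. This is one possible representation, not the only one: the class name `Element` and its fields are hypothetical. Containment is modelled by each element holding the elements it contains, and succession by the order of that list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Element:
    """One information text element with its structural relationships."""
    text: str
    # Containment: this element is the container of these elements.
    # Succession: list order, each item precedes the one after it.
    contained: list[Element] = field(default_factory=list)

    def append(self, child: Element) -> Element:
        """Add a contained element as the successor of the current last one."""
        self.contained.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, str]]:
        """Yield (containment depth, text), predecessors before successors."""
        yield depth, self.text
        for child in self.contained:
            yield from child.walk(depth + 1)


document = Element("document")
first = document.append(Element("paragraph 1"))
first.append(Element("Sentence A."))
first.append(Element("Sentence B."))
document.append(Element("paragraph 2")).append(Element("Sentence C."))

outline = list(document.walk())
```

Walking the structure visits each container before its contents and each predecessor before its successor, so `outline` lists the document, then paragraph 1 with sentences A and B, then paragraph 2 with sentence C.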

Benefits From Understanding The Nature of Information

Separation of the five components of information drives three important advantages.

Firstly, because the information text is divested of the other components, any processing of it is simpler. Examples include indexing, semantic interpretation and building relationships between information text elements.

Secondly, the structure of the information text may be changed without changes to the information text.

Thirdly, changes to presentation, meta information or semantic meaning can be applied independently as they are needed, and can even be version controlled so that, for example, a choice of old and new presentation styles may be applied to the same structured information text.
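The third advantage can be illustrated with a short sketch in which two versions of presentation information are applied to the same, unchanged information text. The version labels `v1` and `v2`, the role names and the `render` function are all assumptions made for the example.

```python
# Information text: stored once, never modified by presentation changes.
information_text = [
    "Information Storage Strategies",
    "A discussion on information storage strategies.",
]

# Role of each element, kept apart from the text (hypothetical labels).
roles = ["heading", "body"]

# Presentation information, held separately and versioned.
presentations = {
    "v1": {"heading": str.upper, "body": lambda s: s},
    "v2": {"heading": lambda s: f"== {s} ==", "body": lambda s: "  " + s},
}


def render(elements, element_roles, style):
    """Apply one presentation version to the elements, leaving them unchanged."""
    return [style[role](text) for text, role in zip(elements, element_roles)]


old_style = render(information_text, roles, presentations["v1"])
new_style = render(information_text, roles, presentations["v2"])
```

Both renderings are produced from the same stored text, so adding a third presentation version would not require touching `information_text` at all.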

Information Storage Management Strategies

For this discussion, with regard to a system, I will define two broad categories of information storage management strategy: multiple information storage management (MISM) and single information storage management (SISM). A system using a MISM strategy has multiple independent instances of information storage management, whereas a system using a SISM strategy has only one information storage management instance. A SISM is not necessarily synonymous with a single storage location. A SISM may distribute, replicate, cache and move information, but its primary purpose, and its differentiator from a MISM, is to act as a single point of access to and management of information.
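The distinction between a single point of access and a single storage location can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the class `SingleStorageManager` is hypothetical, storage locations are plain dictionaries, and every write is replicated to every location.

```python
class SingleStorageManager:
    """Hypothetical SISM facade: one access point over several locations."""

    def __init__(self, locations):
        self._locations = locations  # dict-like storage locations
        self._cache = {}             # local cache of recently used items

    def put(self, key, value):
        # Replicate the item to every storage location.
        for location in self._locations:
            location[key] = value
        self._cache[key] = value

    def get(self, key):
        # Serve from the cache when possible, else from any location.
        if key in self._cache:
            return self._cache[key]
        for location in self._locations:
            if key in location:
                self._cache[key] = location[key]
                return location[key]
        raise KeyError(key)


site_a, site_b = {}, {}
sism = SingleStorageManager([site_a, site_b])
sism.put("report", "draft 1")
fetched = sism.get("report")
```

Callers only ever talk to `sism`; where the information physically resides is the manager's concern. Under a MISM strategy, by contrast, each application would read and write `site_a` or `site_b` directly and independently.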

Strategy Comparison

Currently there are many DBMS that act partly as SISM software, but although they implement a standard SQL language definition, each has its own deviations, extensions and variations. Further, some of the important features of a SISM are not standardised in the SQL language, such as backup, recovery and access control policy. Importantly, DBMS are typically sensitive to the nature of information, requiring prior knowledge of a definition of it. For these reasons a true SISM is typically software that uses an underlying DBMS as a part of its implementation.
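The point about a DBMS requiring a prior definition of the information can be seen with SQLite through Python's standard `sqlite3` module. The table name `person` and its columns are hypothetical, used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Storing information before its definition exists is rejected.
try:
    conn.execute(
        "INSERT INTO person (name, born) VALUES (?, ?)",
        ("Ada", "1815-12-10"),
    )
    accepted_without_definition = True
except sqlite3.OperationalError:
    accepted_without_definition = False

# Once the definition is supplied, the same statement succeeds.
conn.execute("CREATE TABLE person (name TEXT, born TEXT)")
conn.execute(
    "INSERT INTO person (name, born) VALUES (?, ?)",
    ("Ada", "1815-12-10"),
)
rows = conn.execute("SELECT name, born FROM person").fetchall()
conn.close()
```

The first insert fails because the DBMS has no definition of `person`. A SISM layered on top of a DBMS would have to supply or manage such definitions on behalf of its clients.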

Strategy Progression

Currently information is commonly stored directly by applications in OS file-system managed files. File-systems may be configured to use a SISM strategy for some files via shared sections of the file-system, but the file-system by default assumes independence, and so a MISM strategy naturally arises across a network of computers. The progression from MISM to SISM is frustrated by two problems. Firstly, computers are sold with the primary expectation of use as standalone devices rather than as a part of a network of computers with a SISM. Secondly, there is no standardised SISM interface. The availability of simple-to-use, low-cost, robust devices hosting pre-installed SISM software with a standardised interface will accelerate the change in the balance of expectations. Once the default expectation shifts to computers being a part of a network with SISM software, applications will respond by favouring SISM over the inherently MISM file-based information storage.

Definitions of terms used