Recently Microsoft has recognised some important information management concepts in its work on SQL Server Data Services (SSDS), LINQ and SharePoint. Each tackles different aspects of an important problem, but none ultimately confronts it directly. It is not my intention to criticise those efforts, because their objectives may not have included confronting that problem directly. Indeed, the problem is so familiar that its existence is usually tolerated without question. This discussion is about that problem, how Microsoft has handled certain aspects of it and how it should be tackled to bring about a beneficial sea change in creating and maintaining software systems. It acknowledges the existence of two important and distinct logical domains, which are almost universally present in complex systems. Their existence and distinct nature are not mandatory, but where they are present they lie at the root of the problem.
The problem under discussion is ‘the difficulty of creating and modifying a software system that has an inherent bifurcation between the techniques, technologies and implementations used to provide persistence services and those used to provide computational services’. Or to put it another way, ‘using databases can be tricky’. The existence and treatment of a persistence services domain as distinct from a computational services domain in a software system is at the root of this problem.
There are many points of tension between the two domains, as can be seen in the complexity of answering this question: ‘In which domain should a section of computational behaviour run if it modifies both in-memory cached and persisted data records, and is triggered by an event that may occur at any point in time?’. There is also impedance to information and computational behaviour travelling between the persistence services domain (currently dominated by implementations based on relational algebra) and the computational services domain (currently dominated by implementations in the imperative and object oriented programming paradigms). I am suggesting that the tension and constriction arising between the two domains add substantially to the complexity, and hence the cost, of producing software systems. That in turn reduces the sophistication of those systems, because budgets are always finite and there is a ceiling to the complexity that can be managed. If my assertion about the significance of the problem is true, then reducing it is a key strategy for enabling better software systems in the future. To highlight the magnitude of the problem, let us look briefly at three important facets of it.
Firstly, and most importantly, there is the need to link a persistence services centric view to a computational services centric view. This involves mapping data types and logical locations, coordinating data duplication and event handling, dividing behavioural definitions between domains, creating transactional control wrappers, providing access restriction and sequencing, choosing a load distribution scheme and creating parallel processing patterns, and there are of course many other possible issues. Looking at this list, one can see the potential for a rich mixture of implementations in each domain. This inevitably complicates the resultant system, especially where the two domains need to interact.
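To make the mapping burden more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The Account class, the account table and the save_account/load_account helpers are hypothetical illustrations and are not part of any system discussed here. Even this tiny example has to keep two definitions of the same record in step. It also converts types by hand (a date becomes a text column and a boolean becomes an integer) and wraps the write in a transaction.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Computational domain: a hypothetical in-memory record type.
@dataclass
class Account:
    account_id: int
    owner: str
    opened: date   # SQLite has no native date type
    active: bool   # SQLite stores booleans as integers


# Persistence domain: a relational table that must be kept in step with the class.
SCHEMA = """
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL,
    opened     TEXT NOT NULL,   -- ISO-8601 date string
    active     INTEGER NOT NULL -- 0 or 1
)
"""


def save_account(conn: sqlite3.Connection, acc: Account) -> None:
    # Map in-memory types to column types by hand; 'with conn' commits
    # the transaction on success and rolls it back on an exception.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO account VALUES (?, ?, ?, ?)",
            (acc.account_id, acc.owner, acc.opened.isoformat(), int(acc.active)),
        )


def load_account(conn: sqlite3.Connection, account_id: int) -> Optional[Account]:
    row = conn.execute(
        "SELECT account_id, owner, opened, active FROM account WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    if row is None:
        return None
    # Map column values back to in-memory types.
    return Account(row[0], row[1], date.fromisoformat(row[2]), bool(row[3]))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_account(conn, Account(1, "Ada", date(2008, 3, 1), True))
print(load_account(conn, 1))
```

Adding a single field to Account means changing the class, the schema and both mapping functions together.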
Secondly, there is an ossifying effect on the system because of the complexity inherent in dividing it between two domains. A change to either domain needs careful work so as not to cause a cascade of problems throughout the system, and this is especially true of changes that cross between domains. This effect tends to complicate and discourage change in either domain.
Thirdly, the diversity of DBMS products creates problems. Although notionally many RDBMS implement the same SQL standards, there is nonetheless enough latitude and feature discord to require substantial extra implementation effort when changing RDBMS. For example, restricting a query to its first few rows is typically written with LIMIT in MySQL, PostgreSQL and SQLite but with TOP in SQL Server. Where a system is fortunate enough to use only one type of DBMS the work is minimised; even so, time will inevitably throw up the need for a DBMS change. That variety changes the implementation skill sets required, which inevitably adds cost through training. Even if a team is able to temporarily employ varied skill sets to keep its training costs low, the industry as a whole is disadvantaged by the cost of adding variation for little or no advantage.
Very briefly then, the solution needs: [0] commoditisation and convergence of persistence services; [1] uniform library and/or language persistence control across programming languages; [2] no division of a system into two domains or, failing that, minimal distribution of the system between the domains and minimal impediment between them.
My own work on information management concepts spans this problem and that is what informs this essay.
SharePoint impinges on this problem by providing a development framework with a default implementation that is rich enough in features to give any system needing a single logical repository and associated persistence services a good starting point. Unfortunately it does not tackle most features of the problem. It does not, for example, fundamentally change the dual domain nature of the problem, and so does not change the concomitant petrifaction of the system. Nor does it have anything to say about commoditisation and convergence. It does, however, make advances in providing libraries of computational behaviour that can be rolled out to multiple languages via the .Net CLR. Basically it is a bag of computational services in a library, bundled with an RDBMS to simplify its use. So although it may be a good effort using the latest development tools, languages and runtime environment, it does not break new ground for this problem.
LINQ essentially embeds a new generic query language into programming languages. As such, LINQ is an alternative to SQL that can target multiple data sources. This is laudable as a pragmatic solution to problems inherent in the status quo, which are rooted in the diversity of information repositories. Unfortunately it does not confront the dual domain root of the problem and the attendant calcification of the system. Nor does it treat the issues around commoditisation and convergence of the repositories. Rather, it acknowledges that diversity and simply tries to manage it. Beyond what it does not tackle, it is also clumsy relative to the library approach of SharePoint style solutions, because its form will not fit elegantly into all programming languages. Any similar solution that does adapt its form to the target language loses the uniformity advantage it offers. Its main advantage lies outside the scope of this problem, in providing a consistent query mechanism for multiple target repositories. It is a well explored and carefully considered realisation of certain concepts. However, in the context of this problem it is quite tangential.
SSDS uses LINQ to access SQL Server, but the services aspect of the name is the interesting thing in SSDS. Not having to manage server hardware or persistence services software is the bonus. Naturally this comes at a price: security, performance and reliability all have the potential to be worse. The question is whether they will be adequate. For some uses and users they will, so no doubt SSDS will take off to an extent. Indeed, as I discuss elsewhere on this site, SSDS is in the vanguard of a nascent industry that will one day transmute into the most powerful of all industries. A single remotely managed logical DBMS instance accessed through a language embedded feature from a CLR is compelling. SSDS does provide commoditisation of persistence services, but not convergence of DBMS; that will take a great deal more time to achieve. SSDS also does not affect the dual domain aspect of the problem, so the solution still suffers from the irritation of inertia with respect to change. However, SSDS is an inchoate entry into a very valuable persistence services industry and a stepping stone to the holy grail of computing services: pervasive information services. So SSDS, regardless of its importance, only really tackles the least significant of the main issues of the problem.
Other recent Microsoft innovations worthy of note here are Silverlight and the .Net CLR. Although they do not tackle the problem we are discussing, they do loosely exhibit the convergence pattern I am trying to endorse. Silverlight adds platform diversity in the shape of browsers acting as virtual machines to host the .Net runtime environment, which is itself a common runtime environment for multiple programming languages. The CLR concept embodied in .Net is a natural successor to the VM concept used by Java and should be sufficiently powerful without Silverlight. Silverlight is essentially a CLR embedded inside another VM, namely the browser as it has evolved. This is an expedient to get around the lack of deployment of a single CLR to all platforms. Such a well designed and pervasive CLR, which .Net has the potential to become, could then be accessed by browsers. That would return them to their originally intended task of document presentation and away from what they have become: a diverse set of virtual machines that, like RDBMS, do not adhere closely enough to a well defined standard.