Thursday, January 2, 2014

AGAOIGE

Let me readily admit that AGAOIGE is a very awkward acronym. It does not fall trippingly off the tongue, and is, in fact, completely unpronounceable.

Yet it appears that As Good As Oracle 11g Is Good Enough needs some sort of codification, for it seems to have become an academically established assessment of adequate system quality.

I came across this new concept while reading the generally quite interesting paper by Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica: Highly Available Transactions: Virtues and Limitations.

In Section 3 of the paper, the authors introduce AGAOIGE:

As shown in Table 2, only three out of 18 databases provided serializability by default, and eight did not provide serializability as an option at all. This is particularly surprising when we consider the widespread deployment of many of these nonserializable databases, like Oracle 11g, which are known to power major businesses and product functionality. Given that these weak transactional models are frequently used, our inability to provide serializability in arbitrary HATs appears non-fatal for practical applications. If application writers and database vendors have already decided that the benefits of weak isolation outweigh potential application inconsistencies, then, in a highly available environment that prohibits serializability, similar decisions may be tenable.

Let's see if I can restate that in plain English:

  • Lots of major businesses use Oracle 11g, and therefore that's good enough.
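
For concreteness, here is roughly what the difference between "serializability by default" and "serializability as an option" looks like from the application side. This is my own sketch, not from the paper: it assumes a generic Python DB-API connection named conn, and it uses the SQL-standard SET TRANSACTION statement, whose exact spelling, availability, and default behavior vary across the eighteen systems surveyed.

    # Sketch only: `conn` is assumed to be a DB-API connection to some
    # relational database. On most of the surveyed systems, a transaction
    # that does not say otherwise runs at a weaker level such as Read
    # Committed; serializability is an explicit opt-in (when offered).

    def transfer(conn, serializable=False):
        cur = conn.cursor()
        if serializable:
            # SQL-standard opt-in; syntax varies by database, and some
            # of the surveyed systems do not offer this level at all.
            cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        cur.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
        conn.commit()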

It might seem that the paper's observation is just a sort of throwaway remark, but the authors return to it several times later in the paper. In Section 5.3, at the end of the meat of the paper (a long and detailed discussion of transaction theory), the authors summarize their work and then comment on the summary:

In light of the current practice of deploying weak isolation levels (Section 3), it is perhaps surprising that so many weak isolation levels are achievable as HATs. Indeed, isolation levels such as Read Committed expose and are defined in terms of end-user anomalies that could not arise during serializable execution. However, the prevalence of these models suggests that, in many cases, applications can tolerate these associated anomalies. Given our HAT compliance results, this in turn hints that–despite idiosyncrasies relating to concurrent updates and data recency–highly available database systems can provide sufficiently strong semantics for many applications.
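
To make those "associated anomalies" concrete, here is a toy illustration of a lost update, one of the classic behaviors that serializable execution forbids and weaker levels such as Read Committed can permit. This is my own sketch, not from the paper: a plain in-memory Python simulation, with no real database involved.

    # Two "transactions" each read the same balance, then each writes
    # back its own computed value, so one deposit silently vanishes.
    # Serializable execution would force one to run entirely before
    # the other, yielding 200 in either serial order.
    balance = {"acct": 100}

    t1_saw = balance["acct"]       # T1 reads 100
    t2_saw = balance["acct"]       # T2 reads 100; T1 hasn't written yet

    balance["acct"] = t1_saw + 50  # T1 deposits 50, writes 150
    balance["acct"] = t2_saw + 50  # T2 deposits 50, also writes 150;
                                   # T1's deposit is lost

    print(balance["acct"])         # prints 150, not 200

Under serializability, the only legal outcomes are the two serial orders, both of which end at 200; the paper's point is that plenty of deployed applications evidently tolerate interleavings like the one above.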

And, still later, they reflect on this observation once more, in the conclusion to the paper:

Despite these limitations, and somewhat surprisingly, many of the default (and sometimes strongest) semantics provided by today’s traditional database systems are achievable as HATs, hinting that distributed databases need not compromise availability, low latency, or scalability in order to serve many existing applications.

In this paper, we have largely focused on previously defined isolation and data consistency models from the database and distributed systems communities. Their previous definitions and, in many cases, widespread adoption hints at their utility to end-users.

About half a dozen years ago, I was privileged to hear a highly placed executive from the largest telecommunications company on the planet describe the state of their internal systems. During the talk, he made a statement that can be loosely paraphrased as follows:

  • Based on our internal data, we've estimated that nearly 70% of the orders that enter our order processing systems get lost at some point in the overall workflow, and manual intervention is required to correct the damaged order and restart the processing.

I'd like to say that this is an outlier, but in fact the state of correctness in enterprise application software is horrific. As a practical matter, "many existing applications" are complete junk. Have you ever interacted with a call center? Tried to correct personal information that a company holds about you? Attempted to track down the status of a work request? Our existing applications simply don't work, and an army of humans spends all of its waking hours dealing with and resolving the mess that these applications leave behind.

I guess I shouldn't get so upset about this, but the authors of this paper include several professors of computer science at what is arguably the most prestigious academic computer science institution in the world, and the paper itself is scheduled to appear at one of the oldest and most respected conferences in computing.

So, from the perspective of one of the poor users of the existing applications that litter the planet, may I politely suggest that AGAOIGE is not a sufficient standard for software quality, and that, when designing the systems of the future, we should do everything possible to hold ourselves to a more rigorous measurement?

Thanks.
