Monday, July 21, 2014

Transactional replicated databases using consensus algorithms

A trendy thing nowadays is to build a transactional replicated database without using two phase commit, but using a different type of consensus algorithm.

The gold standard in this area is Google's F1 database, which is built on top of their Spanner infrastructure, which in turn is built on their production-quality implementation of the Paxos consensus algorithm.

Now other, similar systems are starting to emerge, demonstrating that one way to get attention in the world is to hire some Google engineers who worked on-or-near Spanner and/or F1 and/or Paxos, and build something yourself.

  • Man Busts Out of Google, Rebuilds Top-Secret Query Machine
    Under development for the past two years, Impala is a means of instantly analyzing the massive amounts of data stored in Hadoop, and it’s based on a sweeping Google database known as F1. Google only revealed F1 this past May, with a presentation delivered at a conference in Arizona, and it has yet to release a full paper describing the technology. Two years ago, Cloudera hired away one of the main Google engineers behind the project, a database guru named Marcel Kornacker.
  • In Search of an Understandable Consensus Algorithm (Extended Version)
    Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems.
  • Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB
    Ark is an implementation of a consensus algorithm (also known as elections) similar to Paxos and Raft that we are working on to handle replica set elections and failovers in TokuMX. It has many similarities to Raft, but also has some big differences.
  • Ark: A Real-World Consensus Implementation
    Ark was designed from first principles, improving on the election algorithm used by TokuMX, to fix deficiencies in MongoDB’s consensus algorithms that can cause data loss. It ultimately has many similar- ities with Raft, but diverges in a few ways, mainly to support other features like chained replication and unacknowledged writes.
  • Out in the Open: Ex-Googlers Building Cloud Software That’s Almost Impossible to Take Down
    But if anyone is up for the challenge of rebuilding Spanner—one of the most impressive systems in the history of computing—it’s the CockroachDB team. Many of them were engineers at Google, though none of them worked on Spanner.
  • Cockroach: A Scalable, Geo-Replicated, Transactional Datastore
    Cockroach is a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services.
  • Cockroach
    Single mutations to ranges are mediated via an instance of a distributed consensus algorithm to ensure consistency. We've chosen to use the Raft consensus algorithm. All consensus state is stored in RocksDB.

Just so there's no confusion, let me be clear about one thing:

I am not building a consensus algorithm.

But I enjoy reading about consensus algorithms!

No comments:

Post a Comment