Eric Brewer’s CAP theorem is one of the foundations behind the design and architecture of many large-scale systems. According to one form of the CAP theorem, “You either choose availability or consistency. You cannot choose both.”
Is this our only choice?
In this post I’ll try to provide an aggregated view of the different perspectives on this question and plant the seeds for an alternative approach.
Availability vs Partition tolerance.
One of the things that comes up repeatedly through the various debates is the lack of clarity in the definitions of Availability and Partition Tolerance.
To set the stage I’ll start by referring to the original definitions by Gilbert and Lynch. I’ll use Coda Hale’s You Can’t Sacrifice Partition Tolerance – a “must-read” that provides a good summary.
Brief history (By Coda Hale):
In 2000, Dr. Eric Brewer gave a keynote at the Proceedings of the Annual ACM Symposium on Principles of Distributed Computing in which he laid out his famous CAP Theorem: a shared-data system can have at most two of the three following properties: Consistency, Availability, and tolerance to network Partitions. In 2002, Gilbert and Lynch converted “Brewer’s conjecture” into a more formal definition with an informal proof.
The original definition by Gilbert and Lynch:
For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate … [When] qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate.
In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost. (And any pattern of message loss can be modeled as a temporary partition separating the communicating nodes at the exact instant the message is lost.)
Dr. Stonebraker’s “Consistency over Partition Tolerance”
Dr. Michael Stonebraker’s post Errors in Database Systems, Eventual Consistency, and the CAP Theorem argues that since partition failures are rare you might sacrifice partition tolerance for consistency and availability:
Obviously, one should write software that can deal with load spikes without failing; for example, by shedding load or operating in a degraded mode. Also, good monitoring software will help identify such problems early, since the real solution is to add more capacity. Lastly, self-reconfiguring software that can absorb additional resources quickly is obviously a good idea.
In summary, one should not throw out the [Consistency aspect] so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.
Henry Robinson (Cloudera) describes “CAP Confusion”
Henry Robinson, in his response to Dr. Stonebraker entitled Problems with ‘partition tolerance’, argues that failures are inevitable and therefore you can’t give up partition tolerance:
..Partition tolerance is not something we have a choice about designing into our systems. If you have a partition in your network, you lose either consistency (because you allow updates to both sides of the partition) or you lose availability (because you detect the error and shutdown the system until the error condition is resolved). Partition tolerance means simply developing a coping strategy by choosing which of the other system properties to drop. This is the real lesson of the CAP theorem – if you have a network that may drop messages, then you cannot have both availability and consistency, you must choose one..
Coda Hale : “You Can’t Sacrifice Partition Tolerance”
Coda Hale also argues that you can’t give away partition tolerance in his response, You Can’t Sacrifice Partition Tolerance. Mr. Hale suggests a more relaxed version of availability through graceful degradation that is based on the yield and harvest model from Eric A. Brewer’s ACM article, Lessons from Giant-Scale Services:
You cannot … choose both consistency and availability in a distributed system.
Of the CAP theorem’s Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it. Instead of CAP, you should think about your availability in terms of yield (percent of requests answered successfully) and harvest (percent of required data actually included in the responses) and which of these two your system will sacrifice when failures happen.
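The yield/harvest distinction can be made concrete with a small sketch. This is purely illustrative (the function and counter names are mine, not from Brewer’s article): yield and harvest are just two ratios computed from request counters, and a degrading system chooses which one to let slip.

```python
# Sketch of Brewer's yield/harvest availability metrics.
# All names here are illustrative, not from the original article.

def availability_metrics(requests_received, requests_answered,
                         fields_required, fields_returned):
    """Yield: fraction of requests answered successfully.
    Harvest: fraction of the required data actually included in a response."""
    yield_ = requests_answered / requests_received
    harvest = fields_returned / fields_required
    return yield_, harvest

# During a partition a system can keep yield at 100% by answering every
# request with partial data (harvest drops), or keep harvest at 100% by
# failing some requests outright (yield drops).
y, h = availability_metrics(1000, 1000, 10, 7)
print(y, h)
```

The point of the model is that “availability” stops being a boolean: a search service that answers every query from 7 of its 10 index shards has full yield but 70% harvest, which is a very different failure mode from rejecting 30% of queries.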
Dr. Stonebraker – Partition tolerance and Machine failure are two separate things
Both Henry Robinson and Coda Hale referred to machine failure as part of their argument that you can’t avoid partition tolerance. Dr. Stonebraker provides clarification on that point in his response to Coda Hale:
.., the dead node is in one partition and the remaining N-1 nodes are in the other one. The guidance from the CAP theorem is that you must choose either A or C, when a network partition is present. As is obvious in the real world, it is possible to achieve both C and A in this failure mode. You simply failover to a replica in a transactionally consistent way. Notably, at least Tandem and Vertica have been doing exactly this for years. Therefore, considering a node failure as a partition results in an obviously inappropriate CAP theorem conclusion.
Daniel Abadi – Problems with CAP
Daniel Abadi, Assistant Professor of Computer Science at Yale University, outlines four issues with the current definition of CAP in Problems with CAP, and Yahoo’s little known NoSQL system:
- ..The definition of CP looks a little strange — “consistent and tolerant of network partitions, but not available” — the way that this is written makes it look like such a system is never available — a clearly useless system
- ..the roles of the A and C in CAP are asymmetric. Systems that sacrifice consistency (AP systems) tend to do so all the time, not just when there is a network partition
- ..What if there is a network partition? What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. Hence CP and CA are essentially identical
- Lack of latency considerations in CAP significantly reduces its utility.
Daniel suggests an alternative definition to CAP:
..CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
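PACELC is essentially a two-branch decision table, which can be sketched in a few lines. This is a hypothetical helper of my own, not code from Abadi’s post or from any real system:

```python
# Illustrative PACELC decision sketch (hypothetical helper, not a real API).

def pacelc_choice(partitioned, prefer_availability, prefer_low_latency):
    """If there is a Partition (P): trade off Availability vs. Consistency.
    Else (E), under normal operation: trade off Latency vs. Consistency."""
    if partitioned:                                # P: A vs. C
        return "A" if prefer_availability else "C"
    return "L" if prefer_low_latency else "C"      # E: L vs. C

# A Dynamo-style "PA/EL" system prefers availability during partitions
# and low latency in normal operation:
print(pacelc_choice(True, True, True))    # during a partition
print(pacelc_choice(False, True, True))   # in normal operation
```

The value of the reformulation is the “else” branch: even with no partition anywhere, a system that replicates synchronously pays latency for consistency, and PACELC forces that everyday tradeoff into the vocabulary.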
The main issue that I see with the current definition of CAP is that it describes each of the properties of CAP in absolute terms. In reality, however, each of the properties can be applied in various degrees. Eric Brewer’s definition of yield and harvest is one example of a more relaxed version of availability. Interestingly enough, the closing note in Gilbert and Lynch’s paper also recognizes that it may be possible to achieve tradeoffs that provide both Availability and Consistency:
..in partially synchronous models it is possible to achieve a practical compromise between consistency and availability. In particular, most real-world systems today are forced to settle with returning “most of the data, most of the time.” Formalizing this idea and studying algorithms for achieving it is an interesting subject for future theoretical research.
The same applies to Partition Tolerance – there are various degrees of partition tolerance. Probably the most extreme is the WAN partition scenario suggested by some of the commenters. As Dr. Stonebraker argues, even that is not necessarily a high-likelihood scenario for most applications:
There is enough redundancy engineered into today’s WANs that a partition is quite rare. My experience is that local failures and application errors are way more likely. Moreover, the most likely WAN failure is to separate a small portion of the network from the majority. In this case, the majority can continue with straightforward algorithms, and only the small portion must block. Hence, it seems unwise to give up consistency all the time in exchange for availability of a small subset of the nodes in a fairly rare scenario.
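The majority-partition rule Dr. Stonebraker describes can be sketched as a simple quorum check. This is a minimal illustration under my own naming, not code from any of the systems mentioned:

```python
# Sketch of the majority-partition rule: only the side of a partition that
# holds a strict majority of nodes keeps serving; the minority blocks.
# Illustrative names, not from any real system.

def can_serve_writes(nodes_in_my_partition, total_nodes):
    """True if this partition holds a strict majority and may continue
    with straightforward algorithms; the minority side must block."""
    return nodes_in_my_partition > total_nodes // 2

# With 5 nodes split 4/1, the majority continues and the minority blocks:
print(can_serve_writes(4, 5))
print(can_serve_writes(1, 5))
```

Because at most one side of any partition can hold a strict majority, the check guarantees two partitions can never both accept writes, which is what preserves consistency while keeping most of the cluster available.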
So the question of whether you can address partition tolerance shouldn’t be measured in absolute terms but against the most likely scenarios for your application. What counts as a likely partition scenario really comes down to common sense and may vary over time with the application’s business needs.
The introduction of the cloud may also change some of our core assumptions about how we deal with failure. With cloud, we can easily bring an alternate machine up to deal with a failure in a matter of minutes, and we can also assume a fairly robust underlying infrastructure with lots of built-in redundancy. That in itself changes both the scenarios in which failures can happen and the way we deal with them, compared to the time when the original CAP theorem was written.
Another source of confusion is the use of Amazon, Google, and Facebook as references to justify the eventual consistency model. What we often tend to forget is that Amazon, Facebook, and Google face fairly unique challenges that are not that common, and even those three still rely on strongly consistent systems for the majority of their applications. As noted in Will Scalable Data Stores Make NoSQL a Non-Starter?:
Facebook famously invented the NoSQL Cassandra database but still relies on the venerable MySQL-plus-memcached combination for the brunt of its critical operations.
The bottom line of my argument is that giving up consistency should be our last resort. Rather than giving up Consistency for Partition Tolerance, we could consider a different set of tradeoffs that deals with various degrees of Consistency, Availability, and Partition Tolerance that fit our business needs and also take the performance and latency tradeoffs into account.
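“Various degrees of consistency” already exists in practice as per-request consistency levels. The sketch below is modeled loosely on Cassandra-style ONE/QUORUM/ALL levels; the function itself is hypothetical:

```python
# Illustrative sketch of tunable, per-request consistency degrees,
# loosely modeled on Cassandra-style levels. Hypothetical helper.

def required_acks(level, replicas):
    """How many replica acknowledgements a write needs at each level.
    Stronger levels trade latency (and availability under failure)
    for consistency; weaker levels trade the other way."""
    if level == "ONE":
        return 1                       # fastest, weakest
    if level == "QUORUM":
        return replicas // 2 + 1       # majority: survives minority failures
    if level == "ALL":
        return replicas                # strongest, least available
    raise ValueError(f"unknown level: {level}")

# The same application can choose per operation: QUORUM for billing,
# ONE for a page-view counter.
print(required_acks("QUORUM", 3))
print(required_acks("ONE", 3))
```

This is exactly the “degrees, not absolutes” point: consistency becomes a per-operation dial set by business need, rather than a single system-wide choice made once against CAP.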
In the next post on that subject I’ll outline how that can be achieved using the reference architecture that I outlined in the previous post.
I would like to thank Ron Pressler for the constructive debate and for the useful references.