Scalability forces us to think differently. What worked on a small scale doesn’t always work on a large scale — and costs are no different.
To measure the cost impact of scaling, let’s look at the resources required to scale to a given level. We’ll use Amdahl’s Law to estimate the number of CPUs required, which gives us a proxy for hardware and software costs. Later on we’ll also review other costs related to the process of scaling. What will come out clearly in this analysis is that the cost of scalability grows exponentially in non-linearly scalable applications.
I’ll also argue that scaling is not just a technical issue: it has a direct impact on our business and its ability to effectively compete.
Measuring the Impact of Scalability on Hardware and Software Costs
The following analysis is based on Amdahl’s Law, which is described well in this Wikipedia entry. Put briefly, the
negative impact of the non-scalable portion of our application grows exponentially relative to the scaling requirement, up to the point at which adding resources will not improve performance/throughput, as seen in this diagram:
By non-scalable I mean the part that is serial (non-parallelizable) — or to use terminology that is more relevant: the level of contention in the application. Contention can be thought of as the percentage of time operations wait on things such as shared table-locks in the database, persistent queues or distributed transactions.
The diagram above shows that if 90% of our application is free of contention, and only 10% of the time is spent on shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added.
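The arithmetic behind this is easy to check. Here is a minimal Python sketch of Amdahl’s Law (the function name `amdahl_speedup` is mine, chosen for the sketch, not part of any library):

```python
def amdahl_speedup(n_cpus, contention):
    """Amdahl's Law: speedup on n_cpus when a fraction
    `contention` of the work is serial (non-parallelizable)."""
    return 1.0 / (contention + (1.0 - contention) / n_cpus)

# With 10% contention, 100x the hardware buys barely 10x the throughput...
print(amdahl_speedup(100, 0.10))        # ~9.17
# ...and 1/0.10 = 10x is a hard ceiling, no matter how many CPUs we add.
print(amdahl_speedup(1_000_000, 0.10))  # ~10.0
```

The "factor of 100 for a factor of 10" in the diagram is the point where adding hardware has essentially stopped paying off: the curve flattens against the 1/contention ceiling.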
Now let’s see what it will cost us — in terms of CPUs and software licenses — to increase our scalability by a factor of 10, assuming that we have only 10% contention (and that’s a fairly optimistic scenario with prevalent tier-based architectures).
There are two key take-aways from this analysis:
- The cost of non-linearly scalable applications grows exponentially with the demand for more scale.
- Non-linearly scalable applications have an absolute limit of scalability. According to Amdahl’s Law, with 10% contention, the maximum scaling limit is 10x. With 40% contention, the maximum scaling limit is 2.5x, no matter how many hardware resources we throw at the problem.
The following is inspired by numerous true stories I have seen at GigaSpaces customers.
A team is tasked with building an online order management application that needs to process 1,000 orders per second. They choose a typical n-tier architecture with a web server as the front-end and a database as the data tier. Note: in this example the front-end is a web tier; in some systems the front-end is actually a messaging system. While the implementation is rather different, the behavior for the purposes of this discussion is the same, as both represent different forms of feeds or service requests.
Let’s assume that the team designed the architecture by the book to ensure 100% reliability and consistency. This means that every critical transaction is stored in the database.
They soon find that a single web server can handle only 200 transactions/sec, so they decide to put a load balancer on the front-end and deploy 5 web servers to meet the 1,000 orders/sec requirement. At this point they realize that despite increasing the number of servers fivefold, total throughput didn’t really increase by much, reaching only about 400 orders/second.
They start to monitor the system, and find that the application spent 40% of its time on shared locks in the database. As we already saw above, with 40% contention, we can only increase the throughput by a factor of 2.5 — or 500 orders/second, so no matter how many web servers are added, the application will never be able to meet its throughput goal.
So the team decides to reduce the contention by placing a distributed cache in front of the database to reduce the hits on it. They are cost-conscious, so they select a free product — memcached — as the caching solution. As memcached cannot serve as the system of record (it doesn’t support transactions or queries, is not highly available, etc.), adding it reduces the contention but does not eliminate it.
For this example, let’s take an extremely optimistic scenario and assume that memcached reduces the contention from 40% to 10%. According to Amdahl’s Law, increasing throughput by a factor of 5 with 10% contention will require roughly 10 servers. Now the team is happy!
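To see where numbers like these come from, we can invert Amdahl’s Law and ask how many machines a given target speedup requires at a given contention level. This is a hypothetical helper of my own, not part of any product; note that the exact answer for 5x at 10% contention is 9 machines, which the story rounds up to 10:

```python
def servers_for_speedup(target, contention):
    """Invert Amdahl's Law: machines needed to hit a target speedup
    with a given serial (contention) fraction. Returns None when the
    target is at or beyond the 1/contention ceiling."""
    ceiling = 1.0 / contention
    if target >= ceiling - 1e-9:
        return None  # unreachable at any scale
    return (1.0 - contention) / (1.0 / target - contention)

print(servers_for_speedup(5, 0.10))    # 5x at 10% contention: ~9 machines
print(servers_for_speedup(2.5, 0.40))  # None: 2.5x is the ceiling at 40%
print(servers_for_speedup(10, 0.10))   # None: 10x is the ceiling at 10%
```

The `None` cases are exactly the "scalability wall" from the takeaways above: past the ceiling, no number of servers helps.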
Just as they get the system to work as expected, the boss knocks on the door and says: "Hey, we’re going to launch a major campaign for the holiday season, and marketing anticipates double the number of visitors on our site. If we’re very successful, we might even triple the normal traffic. Can we support this?".
The team is in initial shock. "Double? Triple? Are you kidding me!? We just worked our asses off to get this application to work for the existing load, and now he’s talking about doubling and tripling the load as if it were trivial?" The team now faces the prospect of explaining to management that they don’t know how to achieve this capacity, or even worse, that they are incapable of it (while their competitor already achieved it last year). Let’s see: double throughput means 2,000 orders/sec, or 10 times the throughput of a single server. According to Amdahl’s Law, they will need to grow from 10 servers to 100 servers! Tripling the capacity is not even an option: the system already hits its maximum limit when it grows by a factor of 10.
And there’s one more thing the team needs to tell the boss…
Ahem, regarding the budget…
For the sake of discussion, let’s assume that the team was using free products to reduce software license costs. To meet their initial goal of 1,000 orders/second they had to use 10 machines for web servers and another one for the database. If each machine costs $10,000, total costs are $100k for the web-tier and another $10k for the data-tier, so we’re talking $110k total cost of hardware.
In addition, the team spent a substantial amount of time on iterations to analyze the bottleneck and find a solution, wire the pieces together, make sure everything works and modify code. Let’s say this development effort took 5 team members 3 months to complete, so it cost about $150k (again, I’m being optimistic). Total costs are at $260k.
As we’ve already seen, to double capacity they will need 100 servers. Let’s also assume that it will require a similar effort to develop, tune and test, so $900k in additional hardware and another $150k for development. Total costs are now at $1.31 million.
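The budget arithmetic above fits in a tiny model. The constants and function name are illustrative, using the prices assumed in the story:

```python
HW_PER_MACHINE = 10_000   # assumed cost per server ($)
DEV_ITERATION = 150_000   # assumed cost of one design/tune/test cycle ($)

def total_cost(web_servers, dev_iterations):
    # Hardware for the web tier plus one database machine,
    # plus the development effort spent per iteration.
    hardware = (web_servers + 1) * HW_PER_MACHINE
    return hardware + dev_iterations * DEV_ITERATION

print(total_cost(10, 1))   # initial 1,000 orders/sec target: 260000
print(total_cost(100, 2))  # doubling to 2,000 orders/sec: 1310000
```

Note that hardware, not software licensing, dominates the second figure: 90 extra machines account for $900k of the $1.31M.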
Now off to triple the load… wait, we can’t. The team now literally has to go back to the drawing board (a whiteboard in this case) and completely re-design and re-write the application — if they can even figure out how to do it. Can we even measure the cost of such a scenario? We can probably measure the development cost, but what is much harder to measure is the amount of money the company is going to lose because it can’t support the tripling of load.
It would be a false assumption that the company controls the situation and has time to plan everything in advance. What if Miley Cyrus shows up on the cover of Seventeen holding the super-cool gizmo the company sells, and all of a sudden there’s a mad rush to buy it on the site? We need to assume that we can’t predict the load. It doesn’t happen when we plan for it. It can happen when we least expect it.
With social networks, electronic trading and e-commerce, such events can quickly lead to a viral effect, or what’s also known as an ‘event storm’. Such a chain of events can quickly lead to disaster, such as a site crash, and the company is now all over the news and the blogosphere (and not in a good way), customers are frustrated and many defect to the competition. The trouble is that they can’t fix it in a snap. Remember: they need a few months’ heads-up to re-write the application.
There are a few things we can learn from this story:
- direct costs of scaling grow exponentially with the demand for scale;
- the cost of software licenses can be a very small factor in total cost;
- indirect costs, resulting from the unpredictability of our system and the inflexibility that it imposes on our business, can have huge implications for the business, well beyond any measurable criteria. Even if your company can measure the direct losses from downtime due to lost sales or trades, you’ll be hard-pressed to measure damaged reputation, loss of customer trust and, in some cases, the loss of your job!
A Better Approach: High-Throughput and Linear Scalability
Things don’t have to be the way I described above, and for a growing number of companies they aren’t. In the blog post Twitter as a scalability case study, I detailed the principles that companies such as Google and Amazon follow, as well as those that use GigaSpaces, so I won’t repeat them here.
Following those principles, we use a memory-centric solution that addresses the architecture holistically (end-to-end) and is linearly scalable. Because it is in memory, it performs much better. GigaSpaces customers have seen throughput improvements of between 10x and 100x, depending on the existing architecture and the scenario. Let’s say — again, being very conservative — that throughput with this approach is only 5x: 1,000 orders/sec per server. This means that to meet the goal of 1,000 orders/sec we need only 1 machine (compared to the 5 web servers in the original design). Even if the software cost of this solution is $20k/CPU, thanks to the reduction in hardware costs, we end up with significant cost savings.
Perhaps more importantly, this approach follows a shared-nothing architecture (meaning each node is entirely self-sufficient) and is linearly scalable. As such, scaling simply requires adding more servers as needed, without code changes or complex provisioning. Moreover, if we know the throughput of a single server, we know exactly how many servers we will need to achieve future or unanticipated requirements. All we need to do is multiply the number of servers by the throughput per server. Remember, because it is linearly scalable, it does not suffer from diminishing returns, and there is no ‘scalability wall’. So if one machine can process 1k orders/sec, two machines can process 2k orders/sec and N machines can process Nk orders/sec.
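Under linear scaling, the capacity plan above collapses to plain division. A trivial sketch (`linear_capacity_plan` is my own illustrative name):

```python
import math

def linear_capacity_plan(target_tps, per_server_tps):
    """Shared-nothing linear scaling: capacity planning is simple
    division, with no contention ceiling and no diminishing returns."""
    return math.ceil(target_tps / per_server_tps)

print(linear_capacity_plan(2_000, 1_000))  # double the load: 2 servers
print(linear_capacity_plan(3_000, 1_000))  # triple the load: 3 servers
```

Contrast this with the Amdahl case, where doubling the load required jumping from 10 servers to 100 and tripling it was impossible at any count.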
In the case of our team above, if they had taken this approach, they would have needed only two servers to double the throughput (with a total solution cost of $220k, compared to $1.31 million!) and three servers to triple it (with a total solution cost of $250k, compared to who knows what). And if they had faced any unanticipated peaks (thanks to Miley), they could have quickly scaled for those as well.
What’s interesting about linear scalability is that even if we assume no per-server throughput increase at all, the cost savings are still remarkable. In our case, let’s assume that the linearly scalable solution still has a throughput of 200 orders/sec per single server. To achieve 2,000 orders/sec, we will need just 10 servers, compared to 100.
The example above illustrates that what initially appeared to be a low-cost solution (free middleware), became extremely expensive as scalability requirements grew.
Measuring the Cost of Scalability: A Cheat Sheet
Following is a short summary of what we should consider when measuring the cost of system scalability (or lack thereof):
- Cost of hardware and software, as a function of:
- How many CPUs (or machines) are required to achieve the desired throughput/concurrent users/latency, given the:
- Throughput per single server
- Level of contention, and therefore, the required number of servers needed to scale as prescribed by Amdahl’s Law. This calculation needs to be performed given different scale requirements (2x, 3x, 5x, 10x, etc.), as it will grow exponentially
- If we cannot achieve on-demand scalability, we also need to consider the cost of the hardware and software required for over-provisioning to ensure we can handle peak loads
- Cost of development, QA and testing
- Initial design and development cost, including learning curve and integration
- Cost of re-designs and re-writes for when we need to scale our application
- Cost of failure – the cost of downtime due to under-provisioning or inability to scale on demand. This should consider direct revenue loss, compensation payments, loss of productivity, damaged reputation, damage to future income and so on.
- Provisioning, deployment and operations costs – including management and monitoring. The more complex the system is, the more difficult it is to identify bottlenecks and root causes.
A Final Note on Comparing Solutions
In the context of scalability and ROI, when we evaluate competing solutions, we need to make sure that we are comparing apples to apples.
- Comparing Apples-to-Apples: It is not enough to measure the license cost. We need to normalize it with the performance and scale factors. For example, if two products cost the same in terms of cost/CPU, but one performs 5 times better, then the cost of that product is, in fact, one fifth of the other
- Total Cost of Ownership (TCO): We cannot look at the software license costs alone. We need to assess the overall cost of the system, including hardware (and other platform costs, such as OS), additional components required (such as load balancers and storage), cost of development and cost of failure. In the final analysis, free products that are not linearly scalable are going to cost much more than commercial products that are.
- End-to-End Measurement: When it comes to scalability, you are only as strong as your weakest link, so assessing the cost of scaling requires a holistic measurement. Before we compare two products we need to understand how they each play a role in achieving end-to-end scalability. Linear scalability requires an end-to-end solution. Solutions that are built from a bundle of tiers and products are likely to be non-linearly scalable, as contention is created by the integration of the tiers and products, the need to ensure consistency, different clustering models and other issues. This means that before we can even measure or compare cost, we first need to compare what it takes to reach linear scalability with each product. It might be that on a simple caching or messaging level, two products provide the same level of scalability. When, however, we need to integrate the messaging system, use a transaction manager, a database or filesystem to ensure reliability, our end-to-end scalability is going to be significantly limited.
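The apples-to-apples normalization in the first point above reduces to one line of arithmetic; the numbers here are illustrative:

```python
def cost_per_unit_throughput(license_per_cpu, relative_performance):
    """Normalize license cost by throughput: two products with the
    same sticker price differ 5x in real cost if one is 5x faster."""
    return license_per_cpu / relative_performance

print(cost_per_unit_throughput(20_000, 1))  # baseline product: 20000.0
print(cost_per_unit_throughput(20_000, 5))  # 5x-faster product: 4000.0
```

The same normalization should then be extended, per the TCO point, to hardware and operational costs, not just licenses.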
This post was co-authored with Geva Perry.