Data processing speeds have a huge business impact on enterprises that require time-sensitive processes and applications. Whether organizations need analytics to optimize business operations, track customer preferences and activities to provide timely, targeted, and personalized campaigns, or comply with regulations, performing at split-second speeds is an important competitive advantage.
This is where data infrastructure can directly impact success. Although key-value databases retrieve data quickly for millions of simultaneous users by utilizing distributed processing and storage, many applications that require complex queries over huge volumes of data can make key-value databases fall flat.
The Need for Online Speed and Scale
More and more transactions are moving online, and queries require faster response times to meet customer expectations and avoid abandoned sessions. In addition, regulations are constantly changing and adding new restrictions and guidelines that require compliance.
An excellent example involves a leading car manufacturer. They were required to calculate C02 emissions at millisecond performance every time a price quote was requested, whether it was by a customer online, in a dealership, or via a partner. Different features for new cars had a direct impact on projected C02 emissions, so calculations had to be performed in real-time while customers selected car options online. Millisecond performance and accurate results were required to avoid fines of €100s of millions annually. An estimated 3,000 requests per second needed to be processed to support the volume from buyers.
Up until the requirement to implement the C02 calculator, this automobile manufacturer leveraged a key-value database that performed adequately while executing data queries. However, based on early trials, computing the amount of C02 involved more complex queries, which resulted in inadequate performance.
A caveat in key-value database design is the core of this issue. Unlike a relational database that has predefined tables and relationships between the tables, a key-value database stores data without a structure or relations. For example, while a relational database could store all the options a user selected in a single table with the customer ID as a key, a key-value database had a separate record for each feature the customer selected, such as wheel rims, engine type, etc. A query to discover all the features a buyer selected would require duplication of the data and multiple data retrievals, which would increase the memory footprint by four to six times.
Because of all the available options for each car, there was an exponential number of combinations for this automobile manufacturer. Searching through all the records for each possible variation was too time consuming and not scalable. There was a lot of data duplication and fewer ways to make connections between records for fast retrievals. One option was to add mainframe capacity, but the related expenses of this option were very high and did not fit the customer’s plan for modernization and eventual migration to the cloud. The best solution was to search for a more efficient and modern Data Architecture.
Extreme Database Performance
The manufacturer decided to implement a data fabric that could accelerate performance. They selected an in-memory computing platform where the data structures supported fast complex queries and the ability to co-locate business logic with the data.
While the previous key-value database had only a primary index for simple key lookups, the replacement solution supported secondary indexes, including collections, textual, nested objects, and compound (multi-column) indexes. This enabled faster advanced queries across multiple dimensions, which was a requirement due to the dozens of parameters that influence CO2 emissions.
The in-memory computing platform also provided an advantage when performing calculations. The key-value database required the data to be modeled based on access patterns where each numeric operation, such as average/sum/min/max/group by/count required a key, while the in-memory computing platform ran these common and custom aggregations natively on the server-side in a distributed manner and in extreme performance.
There was also a problem with accuracy. Complex performance queries on key-value databases do not always provide accurate results. Since key-value stores are typically optimized for high throughput and low latency of single-record reads/writes, ACID (Atomicity, Consistency, Isolation, Durability) properties are guaranteed only at the single-record level. This means that queries that aggregate across two or more multi-record transactions can result in incorrect results.
The selection of a modern data platform solved the problem for the C02 calculator. The implemented solution delivered a 15-19 milliseconds query and analytics response time. The infrastructure footprint was reduced by a factor of 4-6 times, while scale was increased by 20 times. The new database structure was faster and more reliable.
Speed makes a difference, especially during the current pandemic, where more and more transactions are online as a result of working, learning, shopping, and banking remotely. Because most users will abandon a session if wait times are too long, making sure that database performance is up to par is an essential part of any digital solution. Having a modern Data Architecture built for extreme processing can provide the performance boost and scale that companies need. Read more about this case study here.
This article was originally published on Dataversity on August 4, 2020.