Implementing a real-life, enterprise-grade system can be tricky. Although the microservices methodology has evolved over the years to deliver agility, simplicity, and flexibility, it still struggles to deliver low latency and massive throughput while handling large volumes of data.
Furthermore, real-time microservices require the ability to structure services that include both the processing logic and its state, without any dependency on external caching, database, or messaging fabrics.
In this post, I will discuss the most common microservices challenges and how you can overcome them. I’ll also expand on why you should consider a simple and elegant microservices implementation based on GigaSpaces: a solution that gives each microservice incredibly fast access to its data without relying on the slow enterprise backend database. It can also help you avoid expensive remote calls, serialization, and the garbage collection nightmare. In short, it is a solution truly fit for low-latency, event-driven systems.
Let’s begin by learning about the 3 main bottlenecks we face today with real-time microservices:
1. The interaction between different microservices generates plenty of network traffic, serialization overhead, and complex security configuration
The microservices concept suggests that each business function include all of its associated components: database, messaging, web tier, and so on.
In the diagram below you can see an example in which the “Passenger”, “Driver” and “Trip” management systems each expose a dedicated endpoint that uses an exclusively allocated database adapter with a specific database management system:
This structure forms an isolated environment for every microservice that is supposedly easy to maintain and enhance.
Let’s consider a more realistic, real-time example with multiple users and entities, each with different event feeds. In this case, we’ll have to manage a separate database and messaging infrastructure for each microservice, and these should not be shared across different microservices. Where data does need to be shared between microservices, a caching layer or an enterprise service bus (ESB) should be added.
The diagram below illustrates such a system with multiple microservices (“Passenger”, “Billing”, “Driver”, “Payments”, “Trip”, “Notifications”) which interact with each other.
On the upper left, we have an API gateway endpoint that fronts all mobile user access and from which cascading calls are executed to complete the entire flow. The API gateway calls the “Passenger”, “Driver” and “Trip” management systems sequentially: Passenger invokes the payments microservice, Driver invokes the payments and notifications microservices, and Trip invokes the billing microservice. The payments system must complete its transactions quickly, as both Passenger and Driver rely on it. Any delay in the payments system delays the API gateway’s response to the mobile app.
In reality, the system described above will be implemented using the architecture seen below, with interdependencies between the firewall, load balancer, app server, database, message bus, and a cache situated between the app server and the database.
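To make the cost of the cascade concrete, here is a minimal sketch of the call graph described above. The latency figures are hypothetical and purely illustrative, and the sketch assumes that each service also makes its own downstream calls one after another, which the description does not state explicitly.

```python
# Hypothetical per-service latencies in milliseconds (illustrative numbers only).
LATENCY_MS = {
    "Passenger": 5, "Driver": 5, "Trip": 5,
    "Payments": 20, "Notifications": 3, "Billing": 8,
}

# Downstream calls, as described in the text.
CALLS = {
    "Passenger": ["Payments"],
    "Driver": ["Payments", "Notifications"],
    "Trip": ["Billing"],
}


def service_latency(name, latency=LATENCY_MS):
    """Own latency plus every downstream call, made one after another (assumption)."""
    return latency[name] + sum(service_latency(d, latency) for d in CALLS.get(name, []))


def gateway_latency(latency=LATENCY_MS):
    """The gateway calls Passenger, Driver and Trip sequentially."""
    return sum(service_latency(s, latency) for s in ("Passenger", "Driver", "Trip"))


baseline = gateway_latency()
# Slow the payments service down by 100 ms and measure again.
slow = gateway_latency({**LATENCY_MS, "Payments": 120})
print(baseline, slow, slow - baseline)
```

Because payments sits on two of the three call paths, a 100 ms slowdown there shows up twice in the gateway’s response time in this model.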
The cache layer is used for read-mostly scenarios; write operations still rely on the slow enterprise database. Combining the two may cause inconsistency problems, as the cache needs to constantly sync with the database.
In systems fed by millions of events, it becomes far more complicated and expensive to keep the cache in sync with the database layer. The cache should deliver atomic write-behind functionality, supporting conflation and aggregation of distributed in-memory transactions, processed by different cache nodes, into a single database transaction. Unfortunately, you won’t find such advanced functionality in most caching projects.
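The inconsistency risk can be illustrated with a small, single-process sketch. The `ReadMostlyCache` class and its `sync` method are hypothetical names invented for this example, and a plain dictionary stands in for the enterprise database.

```python
class ReadMostlyCache:
    """Cache-aside sketch: reads are served from memory, writes go to the database only."""

    def __init__(self, database):
        self.database = database  # stands in for the slow enterprise database
        self.entries = {}         # in-memory copies of database rows

    def read(self, key):
        if key not in self.entries:           # cache miss: load from the database
            self.entries[key] = self.database[key]
        return self.entries[key]

    def write(self, key, value):
        self.database[key] = value            # the cached copy is NOT updated here

    def sync(self, key):
        self.entries[key] = self.database[key]  # explicit refresh from the database


database = {"driver-42": "available"}
cache = ReadMostlyCache(database)

first = cache.read("driver-42")      # loaded from the database, now cached
cache.write("driver-42", "on-trip")  # the database changes, the cached copy does not
stale = cache.read("driver-42")      # still the old value
cache.sync("driver-42")
fresh = cache.read("driver-42")      # correct again, but only after the sync
print(first, stale, fresh)
```

Between the write and the sync, readers of the cache and readers of the database see different states for the same driver.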
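As a rough illustration of write-behind with conflation, the sketch below buffers updates in memory, keeps only the latest value per key, and hands the whole batch to a single commit call. It is a single-process toy, not the distributed, atomic behavior described above; `WriteBehindBuffer` and its `commit` callback are hypothetical names, not part of any particular product’s API.

```python
class WriteBehindBuffer:
    """Buffers writes in memory and flushes them to the database in one batch."""

    def __init__(self, commit):
        self.commit = commit    # hypothetical callback: persists one batch as one transaction
        self.pending = {}       # key -> latest value; older values are conflated away
        self.updates_seen = 0

    def write(self, key, value):
        self.pending[key] = value   # overwrite any earlier pending value for this key
        self.updates_seen += 1

    def flush(self):
        if not self.pending:
            return 0
        batch, self.pending = self.pending, {}
        self.commit(batch)          # one call to the database for the whole batch
        return len(batch)


committed = []                      # stands in for the database transaction log
buffer = WriteBehindBuffer(committed.append)

for position in range(1000):        # 1,000 position updates for the same trip
    buffer.write("trip-7", position)
buffer.write("trip-8", 0)

rows = buffer.flush()
print(buffer.updates_seen, rows, committed)
```

In this run, 1,001 in-memory updates collapse into one database transaction carrying two rows.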
3. Event-driven scenarios are impossible to implement
The two main scenarios where classic microservices won’t deliver on their promise are event stream processing (ESP) and complex event processing (CEP).
In both cases, you will have an event generator that is pushing events into a backend system. This could be a payment system, a fraud detection system, a market data system, a trading system, social media or some type of IoT system.
In the case of IoT, you could have tens of thousands of physical sensors generating millions of events every second, which need to be processed and analyzed in real time. You may find these sensors in aircraft, trains, taxis, smartphones, cars, and more.
The processing flow may involve parsing, formatting, validating, ranking, aggregating, conflating, counting and persisting, and all of it should happen extremely fast. These systems cannot afford a major lag (even a few seconds may be considered major) between the incoming events and the decision systems that must act in real time.
A few milliseconds later, once the transaction has been completed, the system should share the resulting transaction state with a remote data center for disaster recovery. All this should happen without losing a single bit, in a fully durable and consistent manner. Nowhere in this flow should there be a data source that may end up inconsistent, even for a short duration. Its state must be complete.
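A few of these stages (parsing, validating, counting and aggregating) can be sketched as a simple in-process pipeline. The event format, with `sensor` and `value` fields encoded as JSON, is an assumption made for this example; real feeds will differ, and a production pipeline would be distributed rather than a single loop.

```python
import json
from collections import defaultdict


def parse(raw):
    """Return the event as a Python object, or None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


def is_valid(event):
    """Accept only events with a string sensor id and a numeric reading."""
    return (
        isinstance(event, dict)
        and isinstance(event.get("sensor"), str)
        and isinstance(event.get("value"), (int, float))
    )


def process(raw_events):
    """Parse, validate, count and aggregate a batch of raw events."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    rejected = 0
    for raw in raw_events:
        event = parse(raw)
        if not is_valid(event):
            rejected += 1
            continue
        counts[event["sensor"]] += 1
        totals[event["sensor"]] += event["value"]
    averages = {sensor: totals[sensor] / counts[sensor] for sensor in counts}
    return dict(counts), averages, rejected


feed = [
    '{"sensor": "taxi-1", "value": 10}',
    '{"sensor": "taxi-1", "value": 20}',
    '{"sensor": "train-9", "value": 5}',
    'not json',               # fails parsing
    '{"sensor": "taxi-1"}',   # fails validation: no reading
]
counts, averages, rejected = process(feed)
print(counts, averages, rejected)
```

Every stage here runs in the same process as the state it updates, which is the property the rest of this post argues for.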