There have been many separate discussions about SOA and virtualization, but only a few have addressed how they relate to each other.
I have been doing a lot of thinking lately about virtualization and cloud
computing. The more I look at the foundational requirements for virtualization,
the more I am convinced that there is a close relationship between the two.
…you have physical resources, data, images, application components in
containers, etc. How do you make sure that they maintain the state that you
desire? How do you ensure that this version of your resources acts like a well…
The second comment by Judith brings up another issue that is often ignored, which I refer to as Intra-Application SOA. From an architecture standpoint, there is a huge difference between inter-application SOA, meaning SOA among multiple systems such as SAP and Siebel (for example, using Web Services), and intra-application SOA, which applies within a specific application, such as an Order Management System or an eCommerce application. The latter is composed of multiple service components that need to behave as a single application. The consistency, latency, and performance requirements are fundamentally different between “intra” and “inter”, and so is the level of granularity at which we break up our application services.
This post focuses on Intra-Application SOA and how we can apply virtualization patterns to simplify the way we turn simple objects (e.g., POJOs) into distributed services. First, I’ll start with a passage from the Wikipedia definition of SOA:
SOA separates functions into distinct units (services),
which can be distributed over a network and can be combined and reused to
create business applications. These services communicate with each other by passing
data from one service to another, or by coordinating an activity between two or
more services. (Source)
Taking this definition, one way to look at SOA is as an evolution of RPC. The main difference between SOA and
RPC is that with the latter we maintained a direct relationship between
the client that invoked a business function and the service that delivered that
function, whereas in SOA we break this direct relationship and can therefore map
a certain business function to different services that together deliver that specific business function. This seemingly minor difference opens up an entire
world of interesting opportunities in how we map a business request to
the underlying service(s) that implement that request. Below is a summary of some of the possible relationships:
- Synchronous – In
this case we would like our system to wait until a certain business function is executed, and only then proceed.
- Asynchronous – In
this case we send a request for executing a certain business function but we
don’t wait for its completion. To know the status of the
operation, we normally get a logical
handle (a.k.a. a Future), which enables us to inquire about the status of the request at
a later stage and retrieve the result. Quite often the request
itself triggers a chain of events, which means that the actual service that receives
the request may not be the service that will deliver the result of that request.
- Parallel – In this scenario, the application is served by multiple instances of the same service (a common
scenario in the case of partitioning). We would like to
execute the same operation on all service instances at the same time, and
aggregate the results. This pattern is also referred to as Map/Reduce.
- Content-Based – The
mapping of a certain business request to the service that implements it is based on the content of the request. A common scenario would be to
use this method for routing requests to the service instance that contains the
relevant data required to perform the request. This is also referred to as data
affinity. From a reliability perspective, a request can be made transactional, which means that
even if it was invoked asynchronously we are guaranteed that it will not be
lost if the service fails during the execution of the request. In the event of such a failure, the request will be rolled back and another instance of the service will
pick it up.
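The synchronous and asynchronous patterns above can be sketched in plain Java. This is a minimal illustration, not code from the post: `PriceService` and its `quote` method are hypothetical stand-ins for a business function, and `java.util.concurrent.Future` plays the role of the logical handle described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvocationPatterns {
    // A plain service object (hypothetical example), unaware of how it is invoked.
    static class PriceService {
        double quote(String symbol) {
            return symbol.length() * 10.0; // placeholder business logic
        }
    }

    public static void main(String[] args) throws Exception {
        PriceService service = new PriceService();

        // Synchronous: the caller blocks until the business function completes.
        double syncResult = service.quote("ACME");

        // Asynchronous: the caller immediately receives a Future (a logical
        // handle) and can inquire on the status or result at a later stage.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Double> handle = executor.submit(() -> service.quote("ACME"));
        double asyncResult = handle.get(); // collect the result later

        System.out.println(syncResult == asyncResult); // prints "true"
        executor.shutdown();
    }
}
```

Note that the service implementation is identical in both cases; only the caller's side changes, which is exactly the separation the rest of this post argues for.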
A common problem today is that each of the approaches listed above requires a different transport, and quite often
an explicit change in our service implementation as well. For example, for
synchronous invocation we use something such as RMI; for asynchronous invocation we
use a messaging system, such as JMS; and for parallel execution we
have things like Map/Reduce or Master/Worker.
Things become more complex if we’re
dealing with stateful services, for which we need to add data affinity and
transactional consistency. In such cases we need to be able to
route requests to where the data is, and to make sure that the service invocation uses
the same transactional context as the data service that maintains the state, so that in case of failure both the state and the invocation
will be rolled back to a consistent state.
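The data-affinity idea, i.e. routing a request to the service instance that holds the relevant data, can be sketched with a simple hash-based router. This is an in-memory stand-in under stated assumptions: `Partition` and the modulo-hash routing are hypothetical simplifications of a real partitioned deployment, and the transactional aspects are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentBasedRouter {
    // Each partition holds a subset of the data; a hypothetical in-memory
    // stand-in for a partitioned, stateful service instance.
    static class Partition {
        final Map<String, String> data = new HashMap<>();
    }

    private final List<Partition> partitions = new ArrayList<>();

    ContentBasedRouter(int count) {
        for (int i = 0; i < count; i++) partitions.add(new Partition());
    }

    // Content-based routing: a routing key extracted from the request content
    // determines which instance handles it, so requests land where the data is.
    Partition route(String key) {
        return partitions.get(Math.abs(key.hashCode()) % partitions.size());
    }

    void put(String key, String value) { route(key).data.put(key, value); }
    String get(String key) { return route(key).data.get(key); }

    public static void main(String[] args) {
        ContentBasedRouter router = new ContentBasedRouter(4);
        router.put("order-17", "pending");
        // The read is deterministically routed to the same partition that
        // stored the entry, without the client knowing which one that is.
        System.out.println(router.get("order-17")); // prints "pending"
    }
}
```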
The reason for these limitations is that existing
solutions evolved somewhat backwards, meaning we designed our systems from the
technical-requirement perspective. In other words, we would ask ourselves “how do we perform
parallel execution?” or “how do we execute asynchronous operations?”
and designed our services differently based on the way they were going to be invoked. We did not ask the question: “what does it mean to run
multiple services that serve the same business function?”, or “how can
we abstract the fact that we’re interacting with more than one service instance
from the client that is using it?”, or “how can we enable clients to
choose the method of invocation and the degree of reliability without changing
our service implementation?”
This is where virtualization comes to the rescue.
“[Virtualization is] a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource (such as a server, an operating system, an application, or storage device) appear to function as multiple logical resources; or it can include making multiple physical resources (such as storage devices or servers) appear as a single logical resource.”  (source)
Now let’s look at SOA from a virtualization perspective: we should think of the case in which we have 1, 10, or 100
service instances serving a certain business function as if they were a single
service. The way we choose to interact with these services should be abstracted
from the client that is using them. In the simplest scenario, a client
should be able to invoke a method on a service as if it were a local instance or
a single remote server. The client should be abstracted from the way we route
the request, as well as from the physical location of the service instance on
the network. In the same way, our system should be smart enough to benefit from
the fact that a service is collocated; in such a scenario, the client request
wouldn’t need to go through the network at all. Having said that, there are
cases where we would need to enhance some of the service semantics, for example
when we want to map a single request to multiple services in parallel. In such
cases we will need to introduce a reducer handler that will be responsible for
aggregating the results from all the services.
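The parallel pattern with a reducer handler can be sketched as follows. Again a hedged illustration: `CountService` is a hypothetical service interface, each instance standing in for a partition that owns a slice of the data, and the reducer here is a simple sum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelInvoker {
    // Hypothetical service contract; each instance owns a slice of the data.
    interface CountService { long count(); }

    // Invoke the same operation on all service instances in parallel, then let
    // a reducer handler aggregate the partial results into a single answer.
    static long invokeAll(List<CountService> instances) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(instances.size());
        List<Future<Long>> partials = new ArrayList<>();
        for (CountService s : instances) {
            Callable<Long> task = s::count;   // same operation on every instance
            partials.add(pool.submit(task));
        }
        long total = 0;                       // the "reduce" step
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<CountService> instances = Arrays.asList(() -> 3L, () -> 5L, () -> 7L);
        System.out.println(invokeAll(instances)); // prints 15
    }
}
```

From the caller's point of view this is still a single logical question ("how many?"); the fan-out and aggregation are hidden behind `invokeAll`.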
The most interesting thing about this approach is the simplicity with
which we can now turn a simple POJO into a distributed service based on SOA,
without writing code and with an advanced level of reliability and flexibility. We don’t have to think in advance about how our service is going to be invoked. We don’t even need to change our configuration if the service is collocated or remote. Just think about what that means from a testability perspective: we can write our service in our own IDE, run all our functional tests locally, and then, using the exact same code, run it in a full-fledged distributed deployment.
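One common way to get this kind of transparency in Java is a dynamic proxy that stands between the client and the POJO. The sketch below is an assumption-laden illustration, not any particular product's mechanism: `Greeter` and `GreeterImpl` are invented names, and the proxy dispatches locally as a stand-in for a real remoting or routing layer. The point is that the POJO has no awareness of how it is invoked, and the client cannot tell whether the call stays in-process.

```java
import java.lang.reflect.Proxy;

public class ServiceVirtualizer {
    // A plain POJO contract and implementation (hypothetical names); the
    // implementation knows nothing about transports or distribution.
    public interface Greeter { String greet(String name); }
    public static class GreeterImpl implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    // Wrap the POJO in a dynamic proxy. Here the handler simply forwards the
    // call in-process; a real deployment could route it over the network,
    // queue it, or fan it out, without changing the service or the client.
    @SuppressWarnings("unchecked")
    static <T> T virtualize(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface},
                (proxy, method, args) -> method.invoke(target, args));
    }

    public static void main(String[] args) {
        Greeter greeter = virtualize(Greeter.class, new GreeterImpl());
        System.out.println(greeter.greet("SOA")); // prints "Hello, SOA"
    }
}
```

Because the client only ever sees the `Greeter` interface, the same functional tests can run against the local object in the IDE and against a distributed deployment behind the proxy.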