The next big thing in big data: fast data

June 26, 2014

The big data movement was pretty much driven by the demand for scale in velocity, volume, and variety of data. Those three vectors led to the emergence of a new generation of distributed data management platforms such as Hadoop for...


March 26, 2014

NoSQL databases provide vast storage and high availability, but at the cost of losing transactions, relational integrity, consistency, and read performance. This post presents an architecture that combines an in-memory data grid, acting as a high-performance transactional layer, with a NoSQL database, providing a complete application platform with the high scalability [...]

Class-Based Eviction Strategy: Going Beyond LRU

August 30, 2012

Consider you are building a financial back-office trade processing application. Trades are flowing in from the front-office system. To cope with the velocity and volume of the trades, you want to use in-memory processing. The trades are usually settled no later than 24 hours after trade execution. These trade objects [...]

Making Hadoop Run Faster

August 21, 2012

One of the challenges in processing data is that the speed at which we can input data is quite often much faster than the speed at which we can process it. This problem becomes even more pronounced in the context...

Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions

December 19, 2011

Recently I held a webinar on architecting scalable, near-real-time risk analysis solutions, based on the experience gathered with our Financial Services customers. In the webinar I also had the honor of hosting Mr. Larry Mitchel, a leading expert in the …

Real Time Analytics for Big Data: An Alternative Approach

July 14, 2011

Lately, we've been talking to various clients about real-time analytics, and with convenient timing Todd Hoff wrote up how Facebook's real-time analytics system was designed and implemented (see my previous review in that regard here). They had some assumptions in design...

Real Time analytics for Big Data: Facebook’s New Realtime Analytics System

July 8, 2011

Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job in summarizing this video from Alex Himel, Engineering Manager at Facebook. In this first post, I'd...

Read/write scale without complete re-write

June 3, 2011

Last week I was attending one of our Partner events in Stockholm where I presented the convergence of trends in the data scalability world – specifically the transition from NoSQL to NewSQL and the convergence of trends that brings the...

Bridging the gap between the clouds

August 9, 2008

Dekel Tankel of GigaSpaces spoke recently to a hip cloud crowd regarding the risks associated with moving an application to the cloud or grid environment. Without a GigaSpaces Space-based architecture (what I refer to as a TPC Architecture ) applica...

Deployment Predictability

May 23, 2008

My colleague Uri raised an interesting point in his post. I completely agree with Uri, and would like to give an example.

I have been involved in a project for a mobile operator in the UK during the second half of last year. We built a scale-out SOA activation platform for a new mobile device launch using GigaSpaces. The GigaSpaces platform replaced an existing system.

The original system was built on JBoss as the backend server. A huge increase in activation requests was predicted due to the new device launch. While the system worked fine on JBoss as a platform, there was no way the team could predict how many JBoss instances they would need to run in order to cope with the anticipated load. They started benchmarking and performance testing to figure out where the system's limits were, but they soon found that the process was leading them nowhere. This was mainly because the JBoss instances behaved inconsistently and hit the scalability ceiling with only a few (very few…) nodes. When more instances were added, the overhead of synchronising the JBoss cluster grew rapidly, as Amdahl's Law suggests, so the throughput gain each instance added varied with the cluster size and the load on the other nodes, which kills predictability altogether.
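The effect Amdahl's Law has on a cluster like this is easy to see with a few lines of code. The 10% serial fraction below is purely illustrative (the post doesn't give measured numbers); it stands in for the per-request share of work spent on cluster-wide synchronisation:

```python
# Amdahl's Law: with a serial (non-parallelizable) fraction s of the
# work -- here standing in for cluster synchronisation overhead --
# the speedup achievable with n nodes is bounded:
#     speedup(n) = 1 / (s + (1 - s) / n)
# The 10% figure below is an assumed, illustrative value.

def speedup(n: int, serial_fraction: float) -> float:
    """Ideal speedup of n nodes given a serial fraction of the work."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With 10% of each request spent on synchronisation, the marginal
# gain of each extra node shrinks quickly:
for n in (1, 2, 4, 8, 16, 64):
    print(n, round(speedup(n, 0.10), 2))

# Speedup can never exceed 1 / 0.10 = 10x, however many nodes are
# added -- which is why per-node gains stop being predictable.
```

The takeaway is that as long as every node participates in cluster-wide synchronisation, adding hardware buys less and less, and the marginal gain depends on the current cluster size rather than being a constant you can plan with.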

JBoss is just an example in this case. It's not a JBoss-specific flaw, but rather a consequence of the tier-based approach, which imposes a limiting architecture.

They then came to us to resolve the predictability challenge.

We did an exercise to figure out what the deployment would look like using GigaSpaces, and came up with a linear formula relating the hardware and the number of instances to the load they needed to support. More than that, they knew that if the business predictions turned out to be pessimistic, supporting the extra load would simply mean deploying more spaces... On top of that, their back-office systems did not support HA and would collapse if load increased suddenly, so GigaSpaces also provided HA and throttling for the backend servers. During one overnight test the database was down for about four hours; the system remained fully functional and completed users' requests, and the completed requests then waited for the database to come back up to finish the archiving process. The customer was truly impressed!
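When each partition (space) handles its own slice of the load independently, the sizing exercise really does reduce to a linear formula. The numbers below are hypothetical (the post doesn't publish the actual figures), but the shape of the calculation is the point:

```python
# Hypothetical sketch of the linear sizing exercise described above.
# Assumption: each space instance sustains a stable, measured
# throughput, so capacity planning is simple division plus headroom.
import math

PER_INSTANCE_TPS = 500  # assumed measured throughput per space instance

def instances_needed(expected_tps: float, headroom: float = 0.2) -> int:
    """Space instances required for a target load plus safety headroom."""
    return math.ceil(expected_tps * (1 + headroom) / PER_INSTANCE_TPS)

print(instances_needed(4000))  # 4,000 req/s + 20% headroom -> 10 instances
```

Because throughput per partition stays constant as partitions are added, doubling the predicted load simply doubles the instance count, which is exactly the predictability the tier-based cluster couldn't offer.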

Needless to say, the launch went flawlessly, and there were no issues whatsoever with the GigaSpaces-based system.

So, yes – Uri makes a good point. GigaSpaces' customers can predict, and properly plan ahead for, the deployment needed to support their business.


Let’s play…

May 19, 2008

I have been involved in a nice gaming project over the last couple of weeks. In the first meeting, giving an overview, the prospect presented their challenge: "We store game data in memory as we need very fast response time, and we are reaching our limits ...