Writing your own real-time search engine and Adwords service


Nati Shalom February 23, 2009

There are various ways to implement a search engine. The typical way uses a crawler technique. The crawler continuously scans the information, collects changes and indexes them. The indexing service is a centerpiece in the architecture and enables quick matching of keywords into search results.

In some cases this technique is not applicable. For example, imagine the case of eCommerce applications. When you submit a bid on an auction, you expect to see it pop up in the search results immediately. This calls for a rather different architecture than the one typically used for Internet applications. The key is how fast we can put new data into the index server. Sounds simple, right? Well, most index servers are highly optimized for fast reads, but they tend to be quite heavy on write operations.

This is where it makes sense to put the index server in-memory. Having an in-memory index enables both fast writes and fast searches. There is one small caveat, however: memory is limited in capacity and is not considered reliable, i.e. if the memory fails, the data stored in that memory is gone. This is where In-Memory Data Grids come to the rescue. An In-Memory Data Grid (IMDG) addresses both the capacity and the reliability of memory. Capacity is addressed by breaking the data into multiple partitions; reliability is achieved by having at least one copy of each partition available in another memory instance. This is exactly what Shay Banon did in his Compass project, where he goes into additional detail about how this model works.
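To make the partition-plus-backup idea concrete, here is a minimal sketch in plain Java. It is not the Compass or GigaSpaces implementation; the class name `PartitionedIndex`, the hash-based routing and the `failPrimary` method are all assumptions made up for illustration. Each document is routed to one partition, every write goes to both the primary and the backup copy, and a failed primary is replaced by promoting its backup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: an in-memory inverted index split into partitions,
// each with one backup copy. Real data grids keep the copies on separate machines.
class PartitionedIndex {
    private final List<Map<String, Set<String>>> primaries = new ArrayList<>();
    private final List<Map<String, Set<String>>> backups = new ArrayList<>();

    PartitionedIndex(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Capacity: route each document to exactly one partition by hashing its id.
    int partitionFor(String docId) {
        return Math.floorMod(docId.hashCode(), primaries.size());
    }

    // Reliability: every keyword is written to the primary and mirrored to the backup.
    void index(String docId, String text) {
        int p = partitionFor(docId);
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            primaries.get(p).computeIfAbsent(word, k -> new HashSet<>()).add(docId);
            backups.get(p).computeIfAbsent(word, k -> new HashSet<>()).add(docId);
        }
    }

    // Simulate losing a primary: promote the backup and build a fresh backup from it.
    void failPrimary(int p) {
        Map<String, Set<String>> promoted = backups.get(p);
        primaries.set(p, promoted);
        Map<String, Set<String>> freshBackup = new HashMap<>();
        promoted.forEach((word, ids) -> freshBackup.put(word, new HashSet<>(ids)));
        backups.set(p, freshBackup);
    }

    // A keyword search has to visit every partition and merge the hits.
    Set<String> search(String keyword) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> partition : primaries) {
            hits.addAll(partition.getOrDefault(keyword.toLowerCase(), Collections.emptySet()));
        }
        return hits;
    }
}
```

Because a write only touches in-memory maps, a new bid becomes searchable as soon as `index` returns, which is the property the auction example calls for.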

Quoting Banon from his excellent post: Collocated Indexing and Distributed Search with GigaSpaces

“This type of integration takes collocation of indexing and searching to a new level. Indexing and Search operations are performed in a collocated manner in memory making them extremely fast. Scalability is easily handled by adding more partitions, and high availability is provided by adding backups to each partition”

Shay and I have had various discussions about that model. I personally see great potential in this type of offering. With many batch analytics applications moving to real-time analytics, real-time search will become more common for many applications.

Using the search engine as an advertising channel – Adwords

Once you have your own search running, you’ve created a potential commerce area without even noticing. Different suppliers can compete on specific keywords and on the real estate (location on the page) they’ll get on the search results (first, second, up, down...). Unlike TV commercials, which sell ads in specific time slots, Adwords provides an advertising channel per search click! One common way to take advantage of Adwords is to use Google Adsense, which enables you to put Google ads on your site and get rewarded for the clicks that your site generates. Having said that, the limitation of this model is that it relies on the fact that you’re using Google as your search engine. If you are looking to use real-time search that is tailored to your site, this is not going to work. So if you’ve already come to the conclusion that you need your own custom search engine, you might as well implement your own Adwords to drive more money out of it.

Adwords is a very interesting model: it’s basically a sophisticated bidding system. The algorithm involves matching between the supplier’s phrase and the phrase in the search query. The matching first looks at those bidders with sufficient budget in their account. It then looks at the track record of each supplier and rates the supplier based on the number of click-throughs that they got on their ads. For those who were selected, there is another algorithm that determines the order and specific location in which the ads appear on the screen. One of the most trivial criteria is an exact keyword match, i.e. ads with an exact match appear higher up in the list.
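The steps just described can be sketched roughly as follows. This is only my reading of the description, not the actual Adwords algorithm: the `Ad` fields, the "budget covers at least one click" rule and the substring phrase match are all simplifying assumptions introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a keyword-bidding match; names and rules are assumptions.
class AdMatcher {
    static final class Ad {
        final String supplier;
        final String phrase;       // the phrase the supplier bids on
        final double budget;       // money left in the supplier's account
        final double bidPerClick;  // price the supplier pays per click
        final long clicks;         // track record: click-throughs received so far

        Ad(String supplier, String phrase, double budget, double bidPerClick, long clicks) {
            this.supplier = supplier;
            this.phrase = phrase;
            this.budget = budget;
            this.bidPerClick = bidPerClick;
            this.clicks = clicks;
        }
    }

    static List<Ad> match(String query, List<Ad> ads) {
        final String q = query.toLowerCase().trim();

        // Step 1: keep only ads whose phrase occurs in the query and whose
        // supplier can still afford at least one click.
        List<Ad> eligible = new ArrayList<>();
        for (Ad ad : ads) {
            boolean hasBudget = ad.budget >= ad.bidPerClick;
            boolean phraseMatches = q.contains(ad.phrase.toLowerCase());
            if (hasBudget && phraseMatches) {
                eligible.add(ad);
            }
        }

        // Step 2: decide placement. Exact matches go first; ties are broken
        // by the number of click-throughs, highest first.
        eligible.sort((a, b) -> {
            boolean aExact = a.phrase.equalsIgnoreCase(q);
            boolean bExact = b.phrase.equalsIgnoreCase(q);
            if (aExact != bExact) {
                return aExact ? -1 : 1;
            }
            return Long.compare(b.clicks, a.clicks);
        });
        return eligible;
    }
}
```

For the query "vintage guitar", an ad bidding on exactly "vintage guitar" would be placed above an ad bidding on "guitar", even if the latter has more click-throughs, and an ad whose budget is below its bid would not be shown at all.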
The challenge with implementing your own Adwords is that this matching process needs to happen in real time during the search. So how can this be done?

If we consider Shay’s comment above, it should be fairly clear. If we already have the index in-memory, and search can be executed within the index server itself, why can’t we use the same idea for matching Adwords?
Well, the idea behind Space Based Architecture really came from this type of bidding scenario. We realized that executing the matching algorithm collocated with the data would speed up the matching significantly and improve response time. Below, I tried to sketch a simple diagram that illustrates how this model would work.
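[Diagram: search services collocated with the partitioned index, and Adwords matching services collocated with the partitioned Adwords data]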
The indexed data will be stored in one partitioned cluster, and the Adwords data will be stored in another.
The search services will run collocated with the index service, as Banon suggested. In a similar way, the Adwords matching services will run collocated with the Adwords data.
A search request gets to the search portal. The search portal executes the search query as well as the Adwords matching in parallel.
It uses a map/reduce execution, meaning it is able to aggregate all matching results from each partition.
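The flow above can be approximated with nothing but the JDK. The sketch below is a stand-in, not GigaSpaces code: `SearchPortal`, `scatterGather` and the string-based "partitions" are hypothetical, and a local thread pool plays the role of the remote, collocated services. It shows the two ideas from the steps: the same task is sent to every partition and the partial results are merged (map/reduce), and the search and the Adwords matching run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: each "partition" is just a list of strings held in memory.
class SearchPortal {

    // Map: run the task against every partition. Reduce: merge the partial lists.
    static <P> CompletableFuture<List<String>> scatterGather(
            List<P> partitions, Function<P, List<String>> task, ExecutorService pool) {
        List<CompletableFuture<List<String>>> partials = new ArrayList<>();
        for (P partition : partitions) {
            partials.add(CompletableFuture.supplyAsync(() -> task.apply(partition), pool));
        }
        return CompletableFuture.allOf(partials.toArray(new CompletableFuture[0]))
                .thenApply(done -> {
                    List<String> merged = new ArrayList<>();
                    for (CompletableFuture<List<String>> partial : partials) {
                        merged.addAll(partial.join()); // already complete at this point
                    }
                    return merged;
                });
    }

    // The work done "next to the data": a trivial keyword filter.
    static List<String> matching(List<String> partition, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String entry : partition) {
            if (entry.contains(keyword)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // Returns two lists: search results first, matched ads second.
    static List<List<String>> handle(String keyword,
                                     List<List<String>> indexPartitions,
                                     List<List<String>> adPartitions) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fork both queries before waiting on either of them.
            CompletableFuture<List<String>> results =
                    scatterGather(indexPartitions, p -> matching(p, keyword), pool);
            CompletableFuture<List<String>> ads =
                    scatterGather(adPartitions, p -> matching(p, keyword), pool);
            return Arrays.asList(results.join(), ads.join());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real deployment the task would be shipped to the partition rather than the data being pulled to the portal; the thread pool here only imitates that so the example stays self-contained.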

To enable this type of operation you can use either the Service Virtualization Framework, in case you’d like a strongly typed query service, or the executors framework, in case you’re looking for ad-hoc task execution.
You can use Futures as the return value in order to simplify the process of executing the search and the Adwords matching in parallel. A Future enables you to fork those two tasks and then collect the results using that future handle.
If you would like to process the results as they arrive, you can use AsyncResultFilter for that purpose (a typical use case would be to print results on your page while still waiting for the rest to arrive). For more information and options on using parallel execution to speed up the search and matching process, refer to the executors framework.
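To illustrate the "process results as they arrive" pattern without guessing at the AsyncResultFilter API, here is a rough equivalent using only the JDK’s `ExecutorCompletionService`. The class `StreamingResults` and its method are hypothetical names for this example; the point is simply that each partition’s partial result is handed to a callback as soon as it completes, rather than after the slowest partition has answered.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical JDK-only stand-in for handling partial results incrementally.
class StreamingResults {

    // Runs one task per partition and passes each partial result to onPartial
    // in completion order. Returns the total number of hits seen.
    static int consumeAsCompleted(List<Callable<List<String>>> partitionTasks,
                                  Consumer<List<String>> onPartial) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitionTasks.size()));
        try {
            CompletionService<List<String>> completion = new ExecutorCompletionService<>(pool);
            for (Callable<List<String>> task : partitionTasks) {
                completion.submit(task);
            }
            int total = 0;
            for (int i = 0; i < partitionTasks.size(); i++) {
                // take() blocks until the next task finishes, whichever it is.
                List<String> partial = completion.take().get();
                onPartial.accept(partial); // e.g. render these hits on the page right away
                total += partial.size();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A partition task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The callback runs on the calling thread, so it can update page state without extra synchronization; the order in which partial results arrive is not guaranteed.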

A real-life scenario

One real-life scenario that I came across recently is Rednano. Rednano is a local portal in Singapore that provides localized search capabilities. Interestingly enough, they built their portal on Spring and Hibernate. They also looked for a way to build their new add-on services, such as Adwords, in a scalable manner, and GigaSpaces was pretty much the closest solution they could find to meet their needs.
According to Patrick Ng, CTO of Rednano, they were able to integrate GigaSpaces into their existing Spring architecture in a matter of two days! Patrick provides interesting insight into their selection process in an online presentation given during a Cloud event in Singapore.

Rednano – The Decision For Selecting Partner – Technology

During the writing of these lines, I came across some interesting news: Rednano wins award for breaking new ground. Quoting from the news item:

Local search and directory engine rednano.sg has won a prestigious international award given to companies with effective online search technologies.
The search engine, a collaboration between Singapore Press Holdings and Schibsted ASA, won the Digital Market Award at the Fastforward’09 business and technology conference in Las Vegas earlier this week.
The annual award is given to companies which constantly ‘create new and value-added services for its users’ in the fast-moving technology world.

Congratulations to Rednano on their success!
