Skip to content
GigaSpaces Logo GigaSpaces Logo
  • Products
    • InsightEdge Portfolio
      • Smart Cache
      • Smart ODS
      • Smart Augmented Transactions
    • GigaSpaces Cloud
  • Roles
    • Architects
    • CXOs
    • Product Teams
  • Solutions
    • Industry Solutions
      • Financial Services
      • Insurance
      • Retail and eCommerce
      • Telecommunications
      • Transportations
    • Technical Solutions
      • Operational BI
      • Mainframe & AS/400 Modernization
      • In Memory Data Grid
      • Transactional and Analytical Processing (HTAP)
      • Hybrid Cloud Data Fabric
      • Multi-Tiered Storage
      • Kubernetes Deployment
      • Streaming Analytics for Stateful Apps
  • Customers
  • Company
    • About GigaSpaces
    • Customers
    • Partners
    • Support & Services
      • University
      • Services
      • Support
    • News
    • Contact Us
    • Careers
  • Resources
    • Webinars
    • Blog
    • Demos
    • Solution Briefs & Whitepapers
    • Case Studies
    • Benchmarks
    • ROI Calculators
    • Analyst Reports
    • eBooks
    • Technical Documentation
  • Contact Us
  • Try Free

Real-time Spatial Analytics with InsightEdge Spark: Taxi Price Surge Use Case

Subscribe to our blog!

Subscribe for Updates
Close
Back

Real-time Spatial Analytics with InsightEdge Spark: Taxi Price Surge Use Case

Oleksiy Dyagilev September 19, 2016
5 minutes read

A couple of weeks ago we launched InsightEdge, introducing you to our high performance Spark distribution enterprise-grade OLTP capabilities. In this blog post, we will create a demo application for a taxi price surge use case that runs real-time analytics on a streaming geospatial data.

We take a fundamental supply and demand economic model of price determination in a market. We will then compute price in real-time based on the current supply and demand.To make our demo even more fun, we will create a taxi price surge use case. We will consider the transportation business domain, and taxi companies like Uber or Lyft in particular.In taxi services, the order requests and available drivers represent the supply and demand data correspondingly. It is interesting that this data is bound to geographical location, which introduces additional complexity.

Comparing to business areas like retail, where the product demand is linked to either offline store or a well known list of warehouses, the order requests are geographically distributed.
With services like Uber, the fare rates automatically increase when the taxi demand is higher than drivers around you. The Uber prices are surging to ensure reliability and availability for those who agree to pay a bit more.

Taxi Price Surge Use Case – 3 Key Architectural Questions:

  • How do we handle the events like an ‘Order Request’ event or a ‘Pickup’ event?

  • How do we compute the price accounting the nearby requests? We will need to find an efficient way to execute geospatial queries.

  • How can we scale technology to run business in many cities, states or countries?

How Does this Architecture Addresses Our 3 Key Questions?

With InsightEdge Geospatial API we are able to efficiently find nearby orders and, therefore, minimize the time required to compute the price. The efficiency comes from the ability to index order request location in the datagrid.

Kafka allows to handle a high throughput of incoming raw events. Even if the computation layer starts processing slower(say during the peak hour), all the events will be reliably buffered in Kafka. The seamless and proven integration with Spark makes it a good choice for streaming applications.

InsightEdge Data Grid also plays a role of a serving layer handling any operational/transactional queries from web/mobile apps.

All the components(Kafka and InsightEdge) can scale out almost linearly;
To scale to many cities, we can leverage data locality principle through a full pipeline (Kafka, Spark, Data Grid) partitioning by the city or even with a more granular geographical units of scale. In this case the geospatial search query will be limited to a single Data Grid partition. We leave this enhancement out of the scope of the


Building a Demo Application for a Taxi Price Surge Use Case

To simulate the taxi orders we took a csv dataset with Uber pickups in New York City. The demo application consists of following components:

  • Feeder application, reads the csv file and sends order and pickup events to Kafka
  • InsightEdge processing, a Spark Streaming application that reads from Kafka, computes price and saves to datagrid
  • Web app, reads orders from datagrid and visualizes them on a map

Coding Processing Logic with InsightEdge API

Let’s see how InsightEdge API can be used to calculate the price:

val ordersStream = initKafkaStream(ssc, "orders") // step 1
ordersStream
  .map(message => Json.parse(message).as[OrderEvent]) // step 2
  .transform { rdd =>  // step 3
    val query = "location spatial:within ? AND status = ?"
    val radius = 0.5 * DistanceUtils.KM_TO_DEG
    val queryParamsConstructor = (e: OrderEvent) => Seq(circle(point(e.longitude, e.latitude), radius), NewOrder)
    val projections = Some(Seq("id"))
    rdd.zipWithGridSql[OrderRequest](query, queryParamsConstructor, projections)
  }
  .map { case (e: OrderEvent, nearOrders: Seq[OrderRequest]) => // step 4
    val location = point(e.longitude, e.latitude)
    val nearOrderIds = nearOrders.map(_.id)
    val priceFactor = if (nearOrderIds.length > 3) {
      1.0 + (nearOrderIds.length - 3) * 0.1
    } else {
      1.0
    }
    OrderRequest(e.id, e.time, location, priceFactor, nearOrderIds, NewOrder)
  }
  .saveToGrid() // step 5
  • Step 1: Initialize a stream of Kafka orders topic
  • Step 2: Parse Kafka message that is in Json format (in real app you may want to use formats like Avro)
  • Step 3: For every order we find other non-processed orders within 0.3 km using InsightEdge’s zipWithGridSql() function
  • Step 4: Given near orders, we calculate the price with a simple linear function
  • Step 5: Finally we save the order details including price and near order ids into the data grid with saveToGrid() function

>> The full source of the application is available on github

Taxi Price Surge Use Case Summary

In this blog post we created a demo application that processes the data stream using InsightEdge geospatial features.

An alternative approach for implementing dynamic price surging can use machine learning clustering algorithms to split order requests into clusters and calculate if the demand within a cluster is higher than the supply. This streaming application saves the cluster details in the datagrid. Then, to determine the price we execute a geospatial datagrid query to find which cluster the given location belongs to.

CATEGORIES

  • Fast Data
  • GigaSpaces
  • InsightEdge
  • Machine Learning
  • Spark
Oleksiy Dyagilev

All Posts (2)

YOU MAY ALSO LIKE

May 20, 2012

Bare-Metal PaaS
1 minutes read

March 17, 2008

Excel That Scales at Microsoft…
2 minutes read

December 27, 2017

Insights for a New Year…
4 minutes read
  • Copied to clipboard

PRODUCTS, SOLUTIONS & ROLES

  • Products
  • InsightEdge Portfolio
    • Smart Cache
    • Smart ODS
    • Smart Augmented Transactions
    • Compare InsightEdge Products
  • GigaSpaces Cloud
  • Roles
  • Architects
  • CXOs
  • Product Teams
  • Solutions
  • Industry
    • Financial Services
    • Insurance
    • Retail and eCommerce
    • Telecommunications
    • Transportation
  • Technical
    • Operational BI
    • Mainframe & AS/400 Modernization
    • In Memory Data Grid
    • HTAP
    • Hybrid Cloud Data Fabric
    • Multi-Tiered Storage
    • Kubernetes Deployment
    • Streaming Analytics for Stateful Apps

RESOURCES

  • Resource Hub
  • Webinars
  • Blogs
  • Demos
  • Solution Briefs & Whitepapers
  • Case Studies
  • Benchmarks
  • ROI Calculators
  • Analyst Reports
  • eBooks
  • Technical Documentation
  • Featured Case Studies
  • Mainframe Offload with Groupe PSA
  • Digital Transformation with Avanza Bank
  • High Peak Handling with PriceRunner
  • Optimizing Business Communications with Avaya

COMPANY

  • About
  • Customers
  • Management
  • Board Members
  • Investors
  • News
  • Events
  • Careers
  • Contact Us
  • Book A Demo
  • Try GigaSpaces For Free
  • Partners
  • OEM Partners
  • System Integrators
  • Value Added Resellers
  • Technology Partners
  • Support & Services
  • University
  • Services
  • Support
Copyright © GigaSpaces 2021 All rights reserved | Privacy Policy
LinkedInTwitterFacebookYouTube

Contact Us