The previous blog in our ‘Open Platform’ series focused on the Smart DIH Service digitization layer and its use of Open API to enable low-code delivery and consumption of microservices. As a reminder, Smart DIH is an operational data hub designed to deliver the ‘always fresh – always on’ data that modern applications rely on. It aggregates multiple back-end systems into a low-latency, scalable, high-performance data layer that exposes APIs and events. By decoupling systems of record from digital applications, Smart DIH enables enterprises to drastically shorten the development and deployment cycle for new digital services and to rapidly scale to serve millions of concurrent users, whether their IT infrastructure is cloud, on-prem or hybrid.
This blog zooms in on how Smart DIH uses SQL extensibility to provide composability, flexibility and ease of use to data professionals, empowering them to build on their extensive expertise to develop data-driven services.
Data: The Strategic Asset Driving Digital Transformation
The changing perception of the role data delivery plays in modern enterprises is reshaping expectations about the role and skill set of data professionals. Traditionally, skill sets have been quite specialized: Database Administrators (DBAs) own the database environment and lead all activities that keep it fully functional, from security through to performance and operations. Data integration efforts are led by data engineers who specialize in ETL tools. Then there are the data programmers who develop new data-driven features and applications. SQL is the lingua franca of this cohort.
Organizations increasingly perceive data as a strategic asset that fuels the development of customer-facing and internal applications. Data professionals, however, do not necessarily have the skills to develop the services that make up these applications. As a result, organizations cannot fully benefit from these professionals’ extensive data knowledge and experience to optimize data utilization, and frequently need to rely on specialized programming teams, leading to unnecessary complexity and delays.
By providing data professionals with sophisticated SQL-based tools, Smart DIH empowers them to be hands-on and manage the entire data food chain, including data integration and application integration, both of which are instrumental in developing sophisticated data-driven services.
How then does Smart DIH utilize SQL extensibility as part of its Open Platform Architecture design, expanding the skill set of data professionals and increasing their contribution to business value creation?
Smart DIH consists of three functional layers: a digitization layer, a hosting layer and a data integration layer, all of which implement open platform design principles. The integration layer connects to various data sources and creates data pipelines that deliver data into Smart DIH’s high-performance hosting layer.
Many of the underlying data sources that feed into Smart DIH’s integration layer support SQL. As such, connecting Smart DIH to these data sources is seamless, and can be achieved by using native SQL capabilities or by connecting very simply to the ETL tools already deployed in the organization. Smart DIH supports batch ETL pipelines for capturing full snapshots as well as continuous incremental updates to keep the data in sync and fresh. It offers two methods to accomplish this:
1. The Smart DIH pluggable connector framework provides an incremental loader option which connects to any JDBC-enabled data source, for example an RDBMS such as Microsoft SQL Server. Using a simple SQL statement, the incremental loader pulls data from the source in batches and loads it into Smart DIH. Because Smart DIH is a fully pre-integrated solution, this option brings a number of benefits. These include:
- Standardization of the data journey, regardless of source and choice of connector.
- Real-time performance for event-driven updates of stream-based data pipelines, using frameworks such as Kafka, Flink and, of course, GigaSpaces’ own Space.
- Rapid data load, required in initialization and recovery scenarios, using both Kafka and Space partitions.
- Built-in continuous update mechanisms (CDC, incremental batch), enabling the addition of new tables and sources while keeping the data up to date for operational services.
- Built-in data cleansing capabilities including validation and registration for the purpose of observability and governance.
- Built-in reconciliation mechanisms, to support various recovery and schema change scenarios.
2. Recognizing that there may be cases where third-party ETL tools are used for data ingestion, Smart DIH, through its support of SQL extensibility, enables data professionals to use popular ETL tools such as Talend. These tools support PostgreSQL as the target of the ETL “load” step. The Smart DIH data gateway exposes an endpoint that speaks “pgwire”, the PostgreSQL wire protocol. As a result, ETL tools can treat Smart DIH as a PostgreSQL target for any ETL process they orchestrate.
The above examples show how SQL extensibility in Smart DIH supports native data ingestion via the Smart DIH incremental loader, while also facilitating a ‘use your own SQL tools’ approach through its support of PostgreSQL targets.
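To make the incremental loading idea more concrete, here is a minimal sketch of a watermark-based batch pull driven by a single SQL statement. It is an illustration only and not the Smart DIH loader itself: Python’s built-in sqlite3 module stands in for both the JDBC source and the hosting layer, and the `customers` table, its columns and the `incremental_load` function are hypothetical names chosen for the example.

```python
import sqlite3


def incremental_load(source, target, last_seen_id, batch_size=2):
    """Copy rows with id > last_seen_id from source to target in batches.

    Returns the new high-water mark so the next run can resume from it.
    Hypothetical helper for illustration; not a Smart DIH API.
    """
    while True:
        rows = source.execute(
            "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_seen_id, batch_size),
        ).fetchall()
        if not rows:
            return last_seen_id
        # Upsert the batch so that re-running a batch is harmless.
        target.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )
        target.commit()
        last_seen_id = rows[-1][0]


# In-memory databases stand in for the source RDBMS and the hosting layer.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)
source.commit()

# Initial load: pulls all three rows in batches of two.
watermark = incremental_load(source, target, last_seen_id=0)

# A later run picks up only the rows added since the previous watermark.
source.execute("INSERT INTO customers VALUES (4, 'Barbara')")
source.commit()
watermark = incremental_load(source, target, last_seen_id=watermark)

loaded_count = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

A monotonically increasing key is the simplest possible watermark and only captures newly inserted rows. Picking up changes to existing rows would need something like a last-modified timestamp column or a CDC feed instead.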
Within its data hosting layer, Smart DIH uses SQL to expose its hosted data, through its data gateway service, as if it were a PostgreSQL RDBMS. This makes data accessible to any tool or application that can connect to an RDBMS, including BI tools such as Tableau or Power BI. For example, to build Tableau dashboards on top of Smart DIH, simply choose the PostgreSQL JDBC data source, provide the details of Smart DIH’s data gateway endpoint and you are ready to go. All the Smart DIH types will appear as tables, and from there users can rely on the Tableau drag & drop interface to build their dashboards. In this way, Smart DIH offers data professionals the flexibility to use SQL to interface with existing off-the-shelf apps which cannot be easily customized to consume APIs.
Yet another way in which Smart DIH supports SQL extensibility is by interfacing with SQL development tools such as DBeaver. Because Smart DIH ingests data from numerous databases, it makes it easy for data professionals to continue to use their SQL-based data admin and editing tools to interface with their data sources and maintain full visibility of these sources.
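As a rough illustration of what those endpoint details amount to, the sketch below assembles the two standard PostgreSQL connection strings that most clients accept: a libpq-style URI and a JDBC URL. The host name, database name and user are placeholders, and port 5432 is simply the PostgreSQL default; the actual values for a given Smart DIH data gateway are assumptions here and would come from your own deployment.

```python
from urllib.parse import quote


def postgres_urls(host, port, database, user):
    """Return (libpq URI, JDBC URL) for a PostgreSQL-compatible endpoint.

    Hypothetical helper for illustration; it only formats strings and
    does not open a connection.
    """
    db = quote(database, safe="")
    libpq_uri = f"postgresql://{quote(user, safe='')}@{host}:{port}/{db}"
    jdbc_url = f"jdbc:postgresql://{host}:{port}/{db}"
    return libpq_uri, jdbc_url


# Placeholder endpoint details, not a real Smart DIH gateway address.
libpq_uri, jdbc_url = postgres_urls(
    host="dih-gateway.example.com", port=5432, database="demo", user="analyst"
)
print(libpq_uri)
print(jdbc_url)
```

A BI tool or SQL editor configured with a PostgreSQL driver would typically take the JDBC form, while command-line and scripting clients tend to take the libpq form.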
As noted in this blog, SQL extensibility is a core element of Smart DIH’s open platform design and plays a key role in empowering data professionals to broaden the scope of their roles:
- Data engineers and citizen developers can develop new services using SQL statements without having to rely on specialized development teams
- Organizations can leverage the SQL tools they have already deployed in order to seamlessly create data pipelines into Smart DIH from diverse data sources without having to install any software or configure special access permissions
- Data professionals can easily use the SQL editing and management tools they are already familiar with to maintain a centralized view of multiple data sources
Ultimately, to win in today’s digital reality, companies need to be able to launch online applications quickly. Data professionals play a key role in making data accessible in real time to make this happen, and solutions that make the most of their skill set are crucial to staying ahead. SQL is a natural fit for developers whose expertise is in data rather than advanced programming. To this end, with its open platform design and innovative use of SQL extensibility, Smart DIH empowers data professionals to lead strategic data-driven modernization initiatives in their organizations.