Digital transformation initiatives that drive the modernization of data and application architectures have produced several novel and creative approaches. In articles, analyst reports, and professional discussions, architectural approaches such as ‘data fabric architecture’, ‘digital integration hub’, ‘data integration hub’, ‘data mesh’, and ‘event-driven architecture’ are touted as the way forward for development teams looking to modernize their environments and implement new technologies.
The enthusiasm these new approaches generate is exciting. At the same time, their sheer number creates confusion: IT and data professionals, inundated with information, struggle to determine which framework would most benefit their organization.
This paper focuses on two architectural approaches, enterprise data fabric architecture and, more broadly, data hub design, with the aim of giving data engineering and software application professionals a broader understanding of how the two complement each other.
Data Fabric Architecture: What is it? And why do organizations need it?
A search on the Internet for the term ‘Data Fabric’ generates hundreds of results. Gartner, for example, defines a data fabric as “an emerging data management design for attaining flexible, reusable and augmented data management and integration” and explains that data fabric architecture “utilizes various data management technologies, such as data catalogs, data integration, data virtualization, orchestration and knowledge graph tools”.
IBM states that “Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems”.
The need for Data Fabric concepts is a result of increasing complexity in data environments, specifically where data resides across multi-cloud, hybrid and on-premises environments.
The goal of data fabric implementation is to streamline and simplify the management of diverse data types and data sources, wherever they reside. The aim is to provide organizations with a cohesive data framework so that they can use their data across enterprise systems more efficiently and effectively. Another key goal of enterprise data fabric architecture is enhanced data governance. Data governance is achieved through a set of internal policies and standards that oversee and manage how data is discovered, accessed, used and secured across enterprise systems. Data governance is increasingly important because it is foundational to an organization’s ability to comply with ever-changing national and international privacy and data regulations.
Ultimately, data fabric architecture is not rigidly defined and can be adapted to specific needs and use cases. Organizations should use it as a guiding blueprint to achieve better visibility, control, utility, and governance over data that resides in disparate systems.
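To make the visibility and governance goals more concrete, the following is a minimal sketch of one small piece of a data fabric: a metadata catalog that records where each dataset resides and applies an access policy. All names here (DatasetEntry, Catalog, the roles, locations and classifications) are hypothetical and chosen only for illustration; a real data fabric would combine many more technologies, such as the data catalogs, data virtualization and knowledge graph tools mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Catalog metadata for one dataset; the data itself stays in its source system."""
    name: str
    location: str        # illustrative label, e.g. "on-premises" or "cloud-a"
    classification: str  # illustrative label, e.g. "public" or "restricted"


@dataclass
class Catalog:
    """Hypothetical metadata catalog with a simple role-based access policy."""
    entries: dict = field(default_factory=dict)
    # Assumed governance policy: which roles may access each classification.
    policy: dict = field(default_factory=lambda: {
        "public": {"analyst", "engineer", "steward"},
        "restricted": {"steward"},
    })

    def register(self, entry):
        self.entries[entry.name] = entry

    def discover(self, location=None):
        """List dataset names, optionally filtered by where the data resides."""
        return sorted(
            e.name for e in self.entries.values()
            if location is None or e.location == location
        )

    def can_access(self, role, dataset):
        """Check a role against the policy for the dataset's classification."""
        entry = self.entries[dataset]
        return role in self.policy.get(entry.classification, set())


catalog = Catalog()
catalog.register(DatasetEntry("orders", "on-premises", "public"))
catalog.register(DatasetEntry("customer_pii", "cloud-a", "restricted"))

all_datasets = catalog.discover()
cloud_datasets = catalog.discover(location="cloud-a")
analyst_sees_pii = catalog.can_access("analyst", "customer_pii")
```

In this sketch, discovery works across locations because only metadata is centralized, while the access check shows how a single policy can govern data that lives in different systems.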
Data Hub: What is it and why do organizations need it?
Data hubs are architectures aimed at efficiently connecting data producers and data consumers by sharing data through a ‘hub and spoke’ design. There are different types of data hubs, including data integration hubs and digital integration hubs. While the goal of the data fabric is to streamline and simplify the management of diverse data types and data sources, wherever they reside, data hubs contribute to the data fabric by enabling the efficient sharing of data among stakeholders, for both analytical and operational use cases. Indeed, according to Gartner, “By 2025, 80% of organizations will have deployed multiple data hubs as part of their data fabric to drive mission-critical data and analytics sharing and governance”.
Digital Integration Hub (DIH) is a type of data hub specifically designed for delivering fresh data in real-time to enable transactional and operational workloads. DIH architecture is fast gaining traction with IT, data and software integration teams who want to accelerate the launch of digital applications that rely on real-time data.
Data Integration Hub, a subset of data hub design, is an application architecture designed to rapidly deliver new digital services while ensuring high throughput, low latency, and always-on service availability. This is achieved by decoupling applications from diverse and disparate systems of record and replicating their data into a low-latency, high-performance data layer that delivers fresh data in real-time to digital applications.
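The ‘hub and spoke’ idea can be sketched in a few lines, assuming a simple in-process publish/subscribe model. The DataHub class, the "orders" topic and the two consumer inboxes are hypothetical names used only for illustration; a production hub would add persistence, governance and delivery guarantees.

```python
class DataHub:
    """Hub in a hub-and-spoke layout: producers and consumers only know the hub."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of consumer callbacks

    def subscribe(self, topic, callback):
        """Register a consumer (a spoke) for a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, record):
        """Deliver one record to every consumer subscribed to the topic."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(record)
        return len(callbacks)


hub = DataHub()
analytics_inbox = []   # stands in for an analytical consumer
operations_inbox = []  # stands in for an operational consumer
hub.subscribe("orders", analytics_inbox.append)
hub.subscribe("orders", operations_inbox.append)

delivered = hub.publish("orders", {"order_id": 1, "total": 42.0})
```

The producer publishes once and never references its consumers directly, which is the property that lets one hub serve both analytical and operational use cases.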
Role of data fabric and digital integration hub concepts in modernization initiatives
Since both data fabric and data hub architecture are part of the modern data technology stack, the question for enterprise and data architects is how best to leverage these design concepts to optimize their organizations’ modernization and digital transformation journeys.
As noted, the goal of a data fabric architecture is to consolidate data and use metadata to achieve better visibility, control, utility, and governance over disparate data through the integration of multiple and diverse data pipelines. In this regard, data hubs in general, including Digital Integration Hubs, can be considered a part of a data fabric that focuses primarily on supplying fresh data in real-time to digital applications.
Digital Integration Hub architecture enables enterprises with multiple systems of record (SORs) and large amounts of data in disparate systems, including legacy databases and cloud-based data stores, to optimally leverage their operational data for digital applications across any environment. It achieves this by decoupling digital applications from the systems of record, thereby decreasing the number of API requests against the SORs. This allows IT teams to deliver fresh, accurate data to always-on digital services.
A key component of both architectures is the consolidation of multiple data sources into a unified data layer. The focus of this consolidated data layer, however, differs according to the use cases that an organization needs to address.
The focus of digital integration hub design is to improve the performance of transactional and operational workloads and ensure digital applications can consume always-fresh data in real-time. To this end, a key component of the Digital Integration Hub is a highly performant data layer that replicates operational data into a low-latency, high-performance space and decouples the underlying systems of record from the digital applications. Digital applications are then served continuously from this data layer rather than from the systems of record themselves.
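The decoupling described here can be illustrated with a minimal sketch, assuming an in-memory replica that is kept fresh by change notifications. SystemOfRecord, DataLayer and the "core_banking" example are hypothetical names, and the callback mechanism is only a stand-in for whatever replication approach (for example, change data capture or event streaming) a real implementation would use.

```python
class SystemOfRecord:
    """Stand-in for a back-end system; counts how often it is queried directly."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = dict(rows)
        self._listeners = []
        self.read_count = 0  # direct reads issued by applications

    def read(self, key):
        self.read_count += 1
        return self._rows[key]

    def snapshot(self):
        """Bulk export used once for the initial load of the data layer."""
        return dict(self._rows)

    def on_change(self, listener):
        self._listeners.append(listener)

    def write(self, key, value):
        self._rows[key] = value
        for listener in self._listeners:
            listener(self.name, key, value)  # emit a change event


class DataLayer:
    """In-memory replica that digital applications read from."""

    def __init__(self):
        self._store = {}

    def attach(self, sor):
        # Initial load, then keep the replica fresh from change events.
        for key, value in sor.snapshot().items():
            self._store[(sor.name, key)] = value
        sor.on_change(self._apply)

    def _apply(self, source, key, value):
        self._store[(source, key)] = value

    def get(self, source, key):
        return self._store[(source, key)]


core_banking = SystemOfRecord("core_banking", {"acct-1": 100})
layer = DataLayer()
layer.attach(core_banking)

core_banking.write("acct-1", 250)                # update in the system of record
balance = layer.get("core_banking", "acct-1")    # application reads the replica
direct_reads = core_banking.read_count           # SOR was never queried by the app
```

In this sketch the application sees the updated balance without issuing a single read against the system of record, which is the effect the architecture aims for at much larger scale.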
Organizations face an ever-increasing urgency to deliver fresh data from disparate systems of record in real-time to business applications. While the world has been moving steadily toward a digital reality for many years, the COVID-19 pandemic sharply increased the need for online services. This demand is being further accelerated by broad mobile adoption and organizations’ move to the cloud.
This explosion of digital services has severely stretched organizations’ ability to deliver the ‘always fresh – always on’ data that modern digital applications need. By implementing data hubs in general, and digital integration hubs in particular, organizations can achieve the broader goals of data fabric design while addressing the specific need of operational and transactional workloads for real-time, always-fresh data. To this end, the marriage of data fabric design and data hub design may ultimately lead to the concept of an operational data fabric architecture, with a focus on supporting transactional workloads and accelerating new digital service delivery.