Data is the backbone of many industries – supply chain functions, AI development and operation, smart manufacturing, and customer engagement in retail, to name just a few – and the recent surge in data usage means companies are finding new ways to capitalize on actionable insights. When organizations learn how to properly move and manage data, business innovation thrives and the competitive bar rises.
Innovation starts with building a cohesive Data Strategy. With the global datasphere expected to triple in size by 2025, data usage is exploding across all industries. As organizations generate copious amounts of data, they need ways to compile, process, and analyze their data.
Developed in the mid-2000s, the data fabric emerged as a tool for organizations as the move to the cloud gained momentum. In parallel, since the turn of the century globalization has expanded from outsourcing consumer goods manufacturing to entire industries acquiring companies in other regions and countries. Often, this has meant IT departments must absorb the acquired applications, data, and underlying platforms into the overall set of enterprise assets. Ongoing digital transformation has only heightened the importance of doing this well and quickly.
Data fabrics move and manage large amounts of data, helping to reduce complexity, streamline data functions, and move data from one location or platform to another. A data fabric allows data to be moved and accessed across platforms, data processes, geographic locations, corporate boundaries, and structural approaches, giving easy, real-time access to data under a single management layer.
Managing Complexities with Data Fabric
Data fabrics are already widely used among organizations, and as they embrace the edge to manage data, they need a solution that can continue to move data from the edge to the cloud, and back again.
Devices that operate at the edge are varied and complex, controlling important information for many industry functions. Smart devices operating at the edge can determine the location of a cargo ship, control process flows in a chemical plant, read pressure sensors that determine weight, and capture medical diagnostic imagery that pinpoints potential cancer cell clusters. The data from these devices used to be processed in data centers, but now much of that processing happens in the cloud.
The data collected from devices operating at the edge takes many forms and requires both processing power and a way to manage complexity. Because of the intricacies at the edge, organizations need to determine which pieces of processing are done at which level. Each device runs an application; each application manipulates data; and each manipulation involves data processing and memory management.
A data fabric is key to managing these complexities. To create intelligent applications, organizations look to incorporate data from other devices, the grid, the gateway, and the cloud. Typically, there are multiple stages of data processing: pre-processing on the device, further processing at the gateway to govern downstream devices and optimize what data is sent back to the cloud, and deeper analysis in the cloud itself. As more data is created at the edge, data fabrics will evolve into a specialized edge data fabric.
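As a rough illustration of those stages, the Python sketch below walks a batch of hypothetical sensor readings through on-device smoothing, gateway filtering, and a compact cloud-bound summary. The function names, threshold, and sample values are illustrative assumptions, not a specific product API.

```python
# Illustrative three-stage pipeline: device -> gateway -> cloud.
# All names, thresholds, and sample values are hypothetical.

def preprocess_on_device(raw_readings, window=3):
    """On-device: smooth noisy sensor samples with a simple moving average."""
    smoothed = []
    for i in range(len(raw_readings)):
        chunk = raw_readings[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def filter_at_gateway(readings, threshold=75.0):
    """Gateway: keep only the readings that matter downstream, cutting cloud-bound volume."""
    return [r for r in readings if r >= threshold]

def summarize_for_cloud(flagged):
    """Cloud-bound payload: send a compact summary instead of every raw sample."""
    if not flagged:
        return {"count": 0}
    return {"count": len(flagged), "max": max(flagged), "mean": sum(flagged) / len(flagged)}

if __name__ == "__main__":
    raw = [70.2, 71.0, 74.8, 80.1, 82.3, 69.5]   # hypothetical temperature samples
    smoothed = preprocess_on_device(raw)
    flagged = filter_at_gateway(smoothed)
    print(summarize_for_cloud(flagged))           # only this summary crosses to the cloud
```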
Common Elements of Edge Data Fabric
Edge data fabrics must perform several important functions to handle the growing data requirements of edge devices. The edge is rapidly becoming the new cloud, which makes each of these requirements necessary. With that in mind, organizations must anchor their edge data fabric in the core cloud to ensure the intelligent applications running on each device can run smoothly and their complexities stay managed.
To do this, all data fabrics must do the following:
- Access many different interfaces – including HTTP, MQTT, radio networks, and specialized industrial networks, such as those used in manufacturing and agriculture. Different sets of interfaces predominate in different areas of the edge (for example, HTTP is the standard for enterprise IT, while MQTT is quickly becoming a de facto standard for IoT).
- Run on multiple operating environments – and be POSIX-compliant. Virtually all embedded operating environments are now Linux-based – even those from Microsoft. Even with this common underlying OS, however, there are still differences in file formats, ports used, APIs, and more. Data management and processing must create an OS abstraction layer for data processing and analytics.
- Work with key protocols and APIs – including more recent ones such as REST APIs with JSON data payloads. Several programming and interpretive languages are used in the embedded space, and they often map to generations of developers and designers. For example, developers working with direct hardware interfacing tend to use C/C++/C#, yet shift to Python – and its associated visualization tools – when writing the code that processes the collected data. The ability to use common APIs across programming and integration environments creates a developer-centric abstraction layer for data processing and analytics.
- Provide JDBC/ODBC database connectivity – for legacy applications and quick, seamless connections between databases. This requires not just adoption of new protocols and APIs but steadfast adherence to the commonly used standards for connecting to legacy applications.
- Handle streaming data – through frameworks such as Spark and Kafka, because in many cases the data processing and analytics must run in real time and continuously over a period of time (a minimal sketch follows this list).
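As one hedged sketch of how a couple of these requirements might fit together, the Python example below subscribes to device readings over MQTT, parses the JSON payloads, and forwards only out-of-range readings to a Kafka topic for streaming analytics. The broker addresses, topic names, and pressure threshold are hypothetical, and the code assumes the paho-mqtt (1.x callback API) and kafka-python packages.

```python
# Minimal MQTT-to-Kafka bridge sketch for a gateway.
# Broker addresses, topics, and the threshold are hypothetical;
# assumes paho-mqtt (1.x callback API) and kafka-python are installed.

import json

import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="gateway-kafka:9092",            # hypothetical gateway Kafka broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_message(client, userdata, msg):
    """Parse the JSON sensor payload and forward only out-of-range readings."""
    reading = json.loads(msg.payload)
    if reading.get("pressure_kpa", 0) > 900:            # hypothetical alert threshold
        producer.send("edge.alerts", reading)

client = mqtt.Client()
client.on_message = on_message
client.connect("edge-mqtt-broker", 1883)                # hypothetical MQTT broker at the edge
client.subscribe("sensors/+/pressure")
client.loop_forever()
```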
The Case for Edge Data Fabric
To truly harness all the intelligence and processing being done at the edge, there can no longer be single-location data centralization. The majority of data is going to stay at the edge, and as edge intelligence increases, automated routines will also increase. Machine learning (ML) will step in for manual processes, which will run in an unsupervised fashion at the edge. While organizations may think the perfect solution is to move all the data to the cloud, there are three reasons why this can’t be done:
- Bandwidth: It takes enormous bandwidth to move data to the cloud. Every jump – from 2G to 3G to 4G to LTE to 5G – brings a bandwidth surge, but data growth at the edge consumes it almost immediately; the data will always outpace the new bandwidth, so proportionally less of the work can happen in the cloud (a back-of-the-envelope example follows this list).
- Latency: Automated processes require real-time decisions at the point of action. Shipping the data to the cloud, making the decision there, and sending it back to the point of action would introduce too much latency – even at the speed of 5G.
- Privacy and security: Given the organizational risks, the smartest thing to do is gather and keep only the data you need and throw the rest away. Building a historical baseline with only the data you need is something you can do locally rather than putting everything in the cloud.
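To make the bandwidth argument concrete, here is a back-of-the-envelope calculation in Python using assumed, illustrative numbers for a site full of cameras and its uplink:

```python
# Back-of-the-envelope bandwidth check with assumed, illustrative numbers:
# 200 cameras each producing 4 Mbit/s versus a 100 Mbit/s uplink to the cloud.

cameras = 200
per_camera_mbps = 4.0            # hypothetical video bitrate per camera
uplink_mbps = 100.0              # hypothetical site uplink capacity

generated_mbps = cameras * per_camera_mbps
print(f"Generated at the edge: {generated_mbps:.0f} Mbit/s")
print(f"Uplink capacity:       {uplink_mbps:.0f} Mbit/s")
print(f"Share that can realistically reach the cloud: {uplink_mbps / generated_mbps:.0%}")
```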
To move from the cloud to the edge and back, historical data from the edge will need to flow to the ML algorithm developers for design, tuning, and drift adjustment. Also, key pieces of data from the edge will flow to the core cloud, whereas small data sets will flow from core systems to the edge. This speaks to the fluidity of data and the need to seamlessly connect the edge data fabric to the core cloud data fabric.
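One way to picture that two-way flow is a simple drift check at the edge: compare recent readings against a training-time baseline and ship only a compact summary to the core cloud when the distribution appears to have shifted. The sketch below uses purely illustrative names, values, and a z-score threshold; it is not a prescription for any particular ML stack.

```python
# Illustrative edge-side drift check: keep raw data local, send only a small
# summary to the core cloud when recent readings drift from the training baseline.
# All names, values, and the threshold are hypothetical.

from statistics import mean, pstdev

def drift_summary(recent, baseline_mean, baseline_std, z_threshold=3.0):
    """Return a compact payload for the cloud if the recent mean drifts too far."""
    recent_mean = mean(recent)
    z = abs(recent_mean - baseline_mean) / baseline_std if baseline_std else 0.0
    if z < z_threshold:
        return None                      # no drift suspected; data stays at the edge
    return {
        "recent_mean": recent_mean,
        "recent_std": pstdev(recent),
        "z_score": round(z, 2),
        "samples": len(recent),
    }

# Hypothetical training baseline and a recent batch of edge readings.
payload = drift_summary([12.4, 13.1, 12.9, 13.5], baseline_mean=9.8, baseline_std=0.9)
print(payload or "No drift detected; nothing to upload.")
```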
An edge data fabric that is simply an extension of the cloud data fabric will accelerate adoption and bring the entire IT community from the cloud to the edge. Still, an edge data fabric is not synonymous with a cloud data fabric, because its requirements differ and it must run across so many different platforms. Just as the cloud data fabric isn't based on a single technology, neither is the edge data fabric. It's a matter of leveraging the framework and philosophy of the data fabric and ensuring a clean handoff between edge and cloud – for both data sources and data consumers – in both directions.