Applications provide a way to capture raw data in forms and store it in databases, and automated processes make it possible to extract meaning from that data using application programming interfaces (APIs). The current process-centric mindset assumes that the value resides in the automated processing, yet the limitations and costs of relying on APIs have prompted a re-examination of priorities and new thinking about where the value of data actually resides.
“The data platform market is transforming into a new world where most of the value coming from what we’re building is actually the data itself,” said Brian Platz, co-CEO and founder of Fluree, in a recent DATAVERSITY® interview. Twenty years ago, sharing data externally wasn’t a requirement, Platz said, so APIs were developed as a workaround. Over time, as apps and their data structures changed and evolved, companies have discovered the true cost of building and maintaining APIs.
An enterprise app costs an average of $175,000 to build. A single API costs about $20,000 to build, he said, and about 50 percent more per year to maintain. “So just building ten APIs will end up costing more than the entire application,” he remarked. The cost of the ecosystem needed to maintain those APIs might be five to ten times that amount, he said. For applications with more APIs and a very large user base, the cost of this “workaround” process can climb by a factor of hundreds.
Hackers Target APIs
Along with rising costs, security is another issue with APIs. A recent Akamai report found that attacks against APIs accounted for 75 percent of hacking attempts against the financial services industry, adding both risk and increased security costs.
When data is spread out over multiple locations and systems, as it is today, service applications are needed to pull data together to make sense of it, Platz said. Even a mid-size organization today runs upward of 500 software and service applications, and many of them house duplicated data.
In an attempt to consolidate their data sources, companies are turning to data lakes: “An immense cost, and they create yet another attack surface.” In fact, he said, some of the biggest data breaches in recent years have been in data lakes and data warehouses. Because the security doesn’t sit with the data, the process of moving data from one repository to another bypasses the data security built into the application tier, he said. “If that immense store of data gets accessed, it has no way to defend itself. That’s a big problem.”
Data Governance
Another fundamental flaw in current Data Management technologies is that there is no inherent way to know where data originated, who put it there, when it was put there, and whether it has been tampered with, he said. As an increasing number of organizations rely on data to make critical decisions, machines and humans need to be able to validate that the information they are seeing has integrity. In areas such as healthcare, lives depend on Data Quality.
A New Way to Look at Data
An application-centric approach is focused on a function that is performed by a system, often framed as a “business process,” according to Dave McComb, author of Software Wasteland: How the Application-Centric Mindset is Hobbling our Enterprises. “This is a mindset. We are so immersed in it we don’t see it. We think this is normal. Until we see the problem, we will be blind to the solution.”
By allowing each process and each application to define its own data structures, we have made it nearly impossible for any real sharing to occur, he said. Platz agrees:
“People should be thinking about a ‘data first’ approach.” Companies can continue to devise more complex and costly workarounds to make an application-centric approach work in a data-centric world, or they can adopt a new approach that is data-centric.
Data-Centric Architecture
McComb wrote that perhaps the most profound shift in Data Architecture is the recognition that a data-centric architecture has to take on a great deal of functionality that has historically been given over to applications, such as authentication, authorization, identity management, constraint management, query federation and much more. In a data-centric architecture it becomes the job of the architecture to handle these functions once, at an enterprise level, and not thousands of times inconsistently per application. Instead of finding more workarounds for an application stack, Platz suggests looking at small, lightweight applications that can be quickly built around a data stack.
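To make the “handle it once, at the data tier” idea concrete, here is a minimal sketch in Python. It is an illustration of the principle only, not Fluree’s or McComb’s implementation, and the schema and field names are hypothetical: a single constraint check lives with the data and is reused by every lightweight application instead of being re-implemented in each one.

```python
# A shared, data-tier constraint check that every lightweight app reuses.
# The schema and field names below are hypothetical, for illustration only.
SCHEMA = {
    "person/email": {"type": str, "required": True},
    "person/age":   {"type": int, "required": False},
}

def validate(record: dict) -> list:
    """Return constraint violations; the data layer calls this on every write, from any app."""
    errors = []
    for field, rule in SCHEMA.items():
        if rule["required"] and field not in record:
            errors.append(f"missing required field {field}")
        elif field in record and not isinstance(record[field], rule["type"]):
            errors.append(f"{field} must be {rule['type'].__name__}")
    return errors

print(validate({"person/age": "forty"}))
# ['missing required field person/email', 'person/age must be int']
```

Because the rule set is defined once, a change to it applies consistently to every application that writes to the data stack.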
Fluree
Brian Platz co-founded Fluree in 2014 with partner Flip Filipowski; the two serve as co-CEOs. “There are a lot of great technologies out there that have evolved over time, but they also sit on a legacy that only allows them to do so much,” Platz said. He and Filipowski decided to build the company with a data-centric perspective from the start, so they could approach issues such as rising costs and security vulnerabilities from a different angle.
According to the company’s website, Fluree is an immutable, time-ordered blockchain database powered by an RDF graph database engine. To prevent tampering, each block is an atomic update that is cryptographically signed and linked to the previous block in the chain. Fluree is also a modular data platform that doesn’t treat the database as a “black box,” but rather as a highly contextual web of data expressed in graph relationships. It can be broken up into component parts, each of which can be customized to specific application needs. Rather than having to stand up more servers in order to expand, users can scale components independently.
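The block structure described above can be pictured with a small sketch. The Python below is not Fluree’s implementation; it simply shows the general pattern of an append-only ledger in which every update is hashed (here with SHA3-256) together with the previous block’s hash, so that tampering with any earlier block breaks the chain. A production system such as Fluree also digitally signs each block, which this sketch omits.

```python
import hashlib
import json
import time

def block_hash(prev_hash: str, payload: dict, timestamp: float) -> str:
    """Hash a block's contents together with the previous block's hash (SHA3-256)."""
    body = json.dumps({"prev": prev_hash, "data": payload, "t": timestamp}, sort_keys=True)
    return hashlib.sha3_256(body.encode()).hexdigest()

def append_block(chain: list, payload: dict) -> list:
    """Append an atomic update as a new block linked to the previous one."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    ts = time.time()
    chain.append({"prev": prev_hash, "data": payload, "t": ts,
                  "hash": block_hash(prev_hash, payload, ts)})
    return chain

def verify(chain: list) -> bool:
    """Recompute every hash; changing any earlier block breaks the links that follow it."""
    prev = "genesis"
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"], block["t"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

ledger = []
append_block(ledger, {"assert": {"patient/name": "Ada", "patient/id": 42}})
append_block(ledger, {"retract": {"patient/id": 42}})
print(verify(ledger))  # True; edit any earlier block and this becomes False
```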
Security Lives with the Data
Platz believes data security should sit with the data. “We call this ‘data defending itself.’ Especially considering data is getting exposed to all kinds of different applications, we need to have the security co-resident with the data itself.” In a situation where five applications need to access the same data, for example, security rules have to be rebuilt five times—hopefully in the same way, he said. But when the data is able to defend itself right to the core layer, security rules can be implemented at the edge, at the application, “And that gives people the flexibility to configure those rules how they want.”
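As an illustration of rules living with the data rather than in each application, consider the hedged sketch below. The record fields, roles, and policy keys are invented and do not reflect Fluree’s actual rule syntax; the point is that every application’s query passes through the same data-tier policy check, so the rules are written once instead of five times.

```python
# Illustrative only: field names, roles, and the "_policy" structure are hypothetical.
RECORDS = [
    {"id": 1, "owner": "alice", "diagnosis": "flu",
     "_policy": {"read_roles": ["doctor"], "owner_can_read": True}},
    {"id": 2, "owner": "bob", "diagnosis": "n/a",
     "_policy": {"read_roles": ["doctor", "nurse"], "owner_can_read": True}},
]

def readable(record: dict, user: str, roles: set) -> bool:
    """Evaluate the policy stored with the record itself, not in application code."""
    policy = record["_policy"]
    if policy.get("owner_can_read") and record["owner"] == user:
        return True
    return bool(roles & set(policy.get("read_roles", [])))

def query(user: str, roles: set) -> list:
    """Every application goes through the same data-tier filter, so rules are defined once."""
    return [r for r in RECORDS if readable(r, user, roles)]

print(query("alice", {"patient"}))  # alice sees only her own record
print(query("carol", {"doctor"}))   # a doctor role sees both records
```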
Time Travel
Fluree uses a tamper-proof blockchain (SHA3-256) for data integrity, providing complete data traceability throughout history and a provable audit trail of data provenance, with digital signatures cryptographically tied to every change. An immutable ledger allows users to freely move forward and back in time with their queries. “We call this ‘time travel.’”
Because databases are changing by the millisecond, the ability to lock in a moment in time is critical to achieving consistent and repeatable query results across multiple data sources. “People need reliability and consistency, and you can only do that if you can lock in moments in time,” he said. Governance-as-code provides control over Master Data Management and a single source of truth.
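A simple way to picture time travel over an immutable ledger is to replay the log of updates only up to a chosen moment. The sketch below is illustrative only, with invented ledger entries and field names rather than Fluree’s query syntax, but it shows why a query pinned to a point in time returns the same answer no matter how much new data arrives afterward.

```python
# Hypothetical ledger of timestamped updates; names and values are illustrative only.
LEDGER = [
    (100, {"order/status": "placed"}),
    (200, {"order/status": "shipped"}),
    (300, {"order/status": "delivered"}),
]

def state_at(ledger, as_of):
    """Replay the immutable log up to `as_of` to reproduce exactly what was true then."""
    state = {}
    for t, facts in ledger:
        if t > as_of:
            break  # everything after the locked-in moment is ignored
        state.update(facts)
    return state

print(state_at(LEDGER, 250))  # {'order/status': 'shipped'}
print(state_at(LEDGER, 250))  # same answer every time: repeatable, consistent queries
```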
The Future is About Control
Platz sees a change in mindset coming, where companies realize they no longer need to send data to SaaS providers or other locations to get what they need from it. “If my data never leaves my purview, never leaves my site, I have complete control over it and how it interconnects with other data and other systems.”
Serverless capability, built on industry-standard schemas and semantic standards, will make it possible to move the application directly to the data. “I think we’ll look back at what we’re doing now and say, ‘Man, that was crazy! Why did we ever do that?’”