Click to learn more about author William Peterson.
Existing generations of Data Management are point products or tools, each solving a specific issue. The next generation treats data itself as a manageable resource, conceptually similar to how networking devices or compute are already treated. Terms like Data Fabric, while helpful, can be confusing because there are a number of competing definitions and uses. To me, the simplest take is that there is a next phase of Data Management technology that organizations will use to deploy a Data Fabric, one that is specific to their organization and runs across their infrastructure. The next generation of Data Management will effectively handle the diversity of data types, data access, and ecosystem tools needed to manage data as an enterprise resource, regardless of the underlying infrastructure and location.
The opportunity is to optimize the entire data lifecycle, from ingestion to processing, to enable applications that simultaneously require Real-time Analytics, Machine Learning, and AI. Organizations need complete flexibility in how they use the underlying infrastructure (on-premises, Cloud, or containerized infrastructure) and in their deployment patterns (Hybrid or Multi-Cloud).
What are the critical capabilities for the next phase of Data Management? Next-generation Data Management must address how data is stored, accessed, distributed, and secured.
- How data is stored:
  - Linear scalability without limits
  - Architected for scale, performance, and consistency to simplify development and management
  - Data and metadata are distributed to eliminate bottlenecks and points of failure
- How data is accessed:
  - Mixed data access from multiple protocols to support broad access and eliminate data duplication and version issues
  - Distributed multi-tenancy to support a wide range of applications and users without compromising on security or performance
  - Global namespace to provide data visibility regardless of the actual physical location
  - Integrated data streaming to support real-time and AI workloads
- How data is distributed:
  - Distributed location support so data can be located on-premises, in a Cloud object store, or at the edge, and optimized for cost, capacity, or compliance
  - Location awareness so that management and job execution can be automated and optimized for performance, cost, and compliance
  - Multi-master replication to support distributed operations with transactional integrity
- How data is secured:
  - Ability to serve as a long-term system of record
  - Data security and governance within the Data Management layer and not a function of the access method or type
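To make two of the capabilities above more concrete, here is a minimal Python sketch of how a global namespace and location-aware placement might fit together. It is an illustration under stated assumptions, not a description of any particular product: the `Location` and `GlobalNamespace` classes, the site names, the regions, and the cost figures are all hypothetical. The idea is that applications address data by a logical path, while the Data Management layer decides where the data physically lives based on a compliance rule and cost.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    """A hypothetical physical site where data can live."""
    name: str           # illustrative site name, e.g. "onprem-dc1"
    region: str         # jurisdiction used for the compliance check
    cost_per_gb: float  # illustrative storage cost, not a real price


@dataclass
class GlobalNamespace:
    """Maps logical paths to physical locations so callers never name a site."""
    locations: dict[str, Location] = field(default_factory=dict)
    placement: dict[str, str] = field(default_factory=dict)  # path -> site name

    def add_location(self, loc: Location) -> None:
        self.locations[loc.name] = loc

    def place(self, path: str, allowed_regions: set[str]) -> Location:
        """Choose the cheapest site whose region satisfies the compliance rule."""
        candidates = [
            loc for loc in self.locations.values()
            if loc.region in allowed_regions
        ]
        if not candidates:
            raise ValueError(f"no compliant location for {path}")
        chosen = min(candidates, key=lambda loc: loc.cost_per_gb)
        self.placement[path] = chosen.name
        return chosen

    def resolve(self, path: str) -> Location:
        """Return the site that currently holds the data behind a logical path."""
        return self.locations[self.placement[path]]


# Usage: three hypothetical sites (on-premises, Cloud object store, edge).
ns = GlobalNamespace()
ns.add_location(Location("onprem-dc1", "eu", 0.10))
ns.add_location(Location("cloud-object-store", "us", 0.02))
ns.add_location(Location("edge-site-7", "eu", 0.25))

# This data set must stay in the EU, so the cheaper US store is excluded.
ns.place("/sales/orders", {"eu"})
print(ns.resolve("/sales/orders").name)  # onprem-dc1
```

The application only ever sees `/sales/orders`. Moving the data to another compliant site would change the placement table, not the path the application uses, which is the property the global namespace item describes.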
What’s Next?
The next entries in this blog series will go into detail on the critical capabilities required for the next phase of Data Management and on the opportunity to help answer pressing questions faced by an enterprise, including:
- Where does the data live?
- What do I know about the data I have?
- How do I get data to the apps?
- How do I securely provide data and use it?
- Am I able to use new tools and techniques easily?