In 2025, mitigating risks from both cybercriminals and AI use will be a top mandate for most CIOs. Ransomware in particular continues to vex enterprises, and unstructured data is a vast, largely unprotected asset. AI has moved from experimental to mainstream, with all the major tech companies and cloud providers investing heavily in turnkey GenAI and AI solutions for enterprise customers. Chief experience officers want to leverage AI, but not at the cost of damaging customer relationships, reputation, and market share through irresponsible use. IT professionals responsible for data and infrastructure need to be prepared as employees start sending company data to AI. The following predictions focus on the urgency of getting AI data governance right – from systems and policies to IT skills.
Systematic data ingestion for AI will be the first data storage mandate
AI mania is overwhelming, but so far, enterprise participation has been led largely by employees who use GenAI tools to assist with daily tasks such as writing, research, and basic analysis. AI model training has primarily been the responsibility of specialists, and storage IT has had little involvement with AI. That will change swiftly in the coming year. Business leaders know that if they fall behind in the AI gold rush, they may lose market share, customers, and relevance. Corporate data will be used with AI for retrieval-augmented generation (RAG) and inferencing, which will constitute 90% of AI investment over time. As that happens, everyone who touches data and infrastructure will need to step up. Storage IT will need to create systems that let users search across corporate data stores, curate the right data, check it for sensitive content, and move it to AI with audit reporting. Storage managers will need to clarify the requirements for supporting their business and IT counterparts.
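A minimal Python sketch of the search, curate, check, and move workflow described above, assuming the data stores are mounted file shares containing plain-text files. The function names, the keyword filter, the two regex patterns, the staging directory, and the JSON-lines audit log are all hypothetical illustrations rather than any product's API. Real sensitive-data detection needs far broader coverage than two patterns.

```python
import hashlib
import json
import re
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical, illustrative patterns only; real detection needs many more.
SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive(text: str) -> list:
    """Return the names of sensitive patterns that occur in the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


def stage_for_ai(sources: list, keyword: str, staging: Path, audit_log: Path) -> list:
    """Search text files under each source for a keyword (curation), block files
    with sensitive matches, copy the rest to a staging area for AI ingestion,
    and append one JSON audit record per matching file."""
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    with audit_log.open("a", encoding="utf-8") as log:
        for root in sources:
            for path in sorted(root.rglob("*.txt")):
                data = path.read_bytes()
                text = data.decode("utf-8", errors="ignore")
                if keyword.lower() not in text.lower():
                    continue  # not relevant to this dataset
                digest = hashlib.sha256(data).hexdigest()
                findings = find_sensitive(text)
                if findings:
                    action = "blocked"
                else:
                    action = "staged"
                    # Prefix with a content hash so same-named files don't collide.
                    dest = staging / f"{digest[:12]}_{path.name}"
                    shutil.copy2(path, dest)
                    staged.append(dest)
                log.write(json.dumps({
                    "time": datetime.now(timezone.utc).isoformat(),
                    "source": str(path),
                    "sha256": digest,
                    "action": action,
                    "findings": findings,
                }) + "\n")
    return staged
```

Each run records blocked files as well as staged ones, so the audit log shows both what reached AI and what was withheld, and why.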
Unstructured data governance processes for AI will mature
Protecting corporate data from leakage and misuse and preventing unwanted or erroneous AI results are top of mind for executives today. A lack of agreed-upon standards, guidelines, and regulations in North America makes the task more difficult. IT leaders can start by using data management technology to gain visibility into all their unstructured data across storage. This visibility is the starting point for understanding this growing volume of data so that it can be governed and managed properly for AI. Data classification is another key step in AI data governance: it involves enriching file metadata with tags that identify sensitive data that must not be used in AI programs. Metadata enrichment can also help researchers and data scientists quickly curate datasets for their projects by searching for keywords that identify file contents. With automated data classification, IT can create workflows that continually send protected datasets to secure locations and, separately, send AI-ready datasets to object storage, where AI tools ingest them. Automated data workflow orchestration tools will be important for managing these tasks efficiently across petabyte-scale data estates. AI-ready unstructured data management solutions will also provide a means to monitor workflows in progress and audit outcomes for risk.
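The classification and routing flow can be sketched as follows, assuming file contents are already available as text. The tag names, regex rules, destination labels (`secure-archive`, `ai-object-store`, `hold-for-review`), and helper functions are hypothetical. A production system would keep tags in a persistent metadata index rather than in memory.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tagging rules: tag name -> pattern applied to file contents.
TAG_RULES = {
    "pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like numbers
    "confidential": re.compile(r"\bconfidential\b", re.I),
    "genomics": re.compile(r"\b(genome|sequencing)\b", re.I),
    "imaging": re.compile(r"\b(mri|ct scan)\b", re.I),
}
# Tags that mark data as off-limits for AI programs.
SENSITIVE_TAGS = {"pii", "confidential"}


@dataclass
class FileRecord:
    path: str
    content: str
    tags: set = field(default_factory=set)


def enrich(record: FileRecord) -> FileRecord:
    """Add a metadata tag for every rule that matches the file's contents."""
    record.tags |= {tag for tag, rule in TAG_RULES.items() if rule.search(record.content)}
    return record


def curate(index: list, tag: str) -> list:
    """Keyword-style curation: paths carrying a tag, minus anything sensitive."""
    return [r.path for r in index if tag in r.tags and not (r.tags & SENSITIVE_TAGS)]


def route(record: FileRecord) -> str:
    """Choose a destination for a classified file."""
    if record.tags & SENSITIVE_TAGS:
        return "secure-archive"   # protected data goes to a secure location
    if record.tags:
        return "ai-object-store"  # positively classified, AI-ready data
    return "hold-for-review"      # unclassified data is not sent to AI by default
```

For example, `curate(index, "genomics")` returns only genomics files that carry no sensitive tag. Sending untagged files to review rather than straight to object storage is a deliberately conservative choice in this sketch: only data that has been positively classified moves toward AI.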
Role of storage administrator evolves to embrace security and AI data governance
Pressing demands on both the data security and AI fronts are changing the roles of storage IT professionals. Managing storage itself has become easier, as storage technologies are now more automated, self-healing, and cloud-based. At the same time, cybersecurity, data privacy, storage, and AI increasingly overlap and depend on one another. Storage pros will need to make data easily accessible to AI and properly classified, while working across functions to create data governance programs that combat ransomware and prevent the misuse of corporate data in AI. Storage teams will need to know where sensitive data lurks and have tools to build auditable data workflows that prevent sensitive data leakage.
Ransomware defense of unstructured data becomes more urgent
Traditionally, data protection has focused on mission-critical data because that is the data that requires the fastest restores. Yet the landscape has changed: unstructured data has grown to encompass 90% of all data generated in the last 10 years. The large attack surface of petabytes of unstructured data, coupled with its widespread use and rapid growth, makes it highly vulnerable to ransomware. Cybercriminals can use unstructured data as a Trojan horse to infect the enterprise. Cost-effectively protecting unstructured data from ransomware will become a critical defense tactic, starting with moving cold, inactive data to immutable object storage, where it cannot be modified.
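As a rough sketch of that first step, the snippet below finds files not modified in a given number of days and builds the arguments for writing one of them to Amazon S3 with Object Lock, one example of immutable object storage. It assumes that modification time is an acceptable proxy for inactivity (access times are often not updated on mounted file systems), that the target bucket was created with Object Lock enabled, and that the caller supplies the file body. The thresholds, bucket name, and helper names are hypothetical.

```python
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def find_cold_files(root: Path, days_inactive: int, now: Optional[float] = None) -> list:
    """Return files under root not modified within the last days_inactive days.

    Uses modification time as a proxy for inactivity, because access times
    are frequently not maintained (e.g. noatime or relatime mounts)."""
    now = time.time() if now is None else now
    cutoff = now - days_inactive * SECONDS_PER_DAY
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff)


def object_lock_put_args(bucket: str, key: str, retain_days: int) -> dict:
    """Build keyword arguments for boto3's S3 put_object that store the object
    under Object Lock in COMPLIANCE mode until the retention date passes.
    The caller adds Body=...; the bucket must have Object Lock enabled."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until,
    }
```

With boto3, the returned dictionary could be passed as `s3.put_object(Body=data, **args)`. In COMPLIANCE mode, no user, including the AWS account root user, can overwrite or delete the locked object version before the retention date, which is what makes this tier useful against ransomware.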
Unstructured data management solutions broaden to serve AI data governance and monitoring needs
My company’s 2024 State of Unstructured Data Management report found that IT leaders rank AI data governance and security as the top future capability they want in unstructured data management solutions. AI data governance covers protecting data from breaches or misuse, maintaining compliance with industry regulations, managing data biases, and ensuring that AI does not produce false, misleading, or libelous results. Monitoring and alerting for capacity issues or anomalies, last year’s top pick, remains a high priority, along with analytics and reporting. IT and storage directors will look for unstructured data management solutions that offer automated capabilities to protect, segment, and audit the use of sensitive and internal data in AI – a use case that is bound to expand as AI matures.