Data Retention Policies Must Evolve to Address Emerging Technologies and Data Growth

By Fredrik Forslund

The emergence of new technologies, including AI, IoT, and blockchain, along with the widespread embrace of digital transformation, has driven a dramatic increase in data. The growing reliance on analytics for decision-making adds to the demand, because meaningful insights require large volumes of data.

AI and generative AI (GenAI) tools and systems contribute to the massive growth in corporate data, and they in turn require large amounts of current, high-quality data to “feed” large language models (LLMs).

Some companies are just beginning to dabble in GenAI, adopting it to automate business processes and create a more efficient workplace, but many have already deployed it. According to Gartner’s 2024 Budget Priorities for CFOs report, 81% of respondents plan to spend more on GenAI this year.

What does this all mean? The emergence of AI and the widespread trends toward digital transformation and data-driven decision-making have created the perfect storm – too much data combined with outdated data retention and data lifecycle policies that haven’t kept pace.

Reassessing Outdated Data Lifecycle and Retention Strategies

Enterprises must reassess, rethink, and modernize their data retention policies, not only to secure corporate data but also to maintain compliance with the expanding ecosystem of global data privacy regulations.

With data sources spread across multiple assets, systems, and destinations, the complexity is unprecedented. As a result, organizations must reevaluate their data governance framework across the entire data lifecycle, including data classification (e.g., automatically flagging data and how long it should be retained), data protection, data hygiene, and data destruction.

Once the drivers for the increase in data within the organization are identified, enterprises should take the following steps before diving into updating data retention policies:

  • Perform a data audit: Companies must first identify all sources where data is generated, stored, or processed. Determining where all data is stored is harder than it sounds. It may be stored in databases, file servers, cloud storage, employees’ devices, and third-party applications, among others. Many of today’s organizations take a hybrid approach that combines on-premises, cloud, and endpoint storage to meet their needs. For example, sensitive or critical data may be stored on-premises for enhanced security, while less sensitive or archival data may be stored in the cloud. Storing data on employee endpoints may be more convenient, but that data is vulnerable to loss or theft if devices are lost, stolen, or compromised. Implementing endpoint security measures such as encryption, data backup, and remote data sanitization is critical to protecting corporate data. Finally, it is important to document the findings of the audit, including any vulnerabilities or risks identified. This documentation will serve as a basis for developing remediation plans and improving data storage practices.
  • Classify data: Once the audit is complete, data should be classified based on its sensitivity and importance to the organization and sorted into “buckets” such as personally identifiable information (PII), financial data, intellectual property, or sensitive business information. This step should include determining how old the data is, separating the “good” data from the redundant, obsolete, and trivial (ROT) data, and flagging any data that is no longer needed for immediate destruction via data sanitization.
  • Mitigate risks from cloned data: The duplication of data from on-prem to the cloud is a common problem for businesses. Not only does this increase the volume of data stored (and the cost of storing it), it also increases the risk of data theft in a breach. The first step in managing the clone issue is to identify which data has been duplicated and where it is located. Investigating where the copies reside can provide a better understanding of how the data was cloned and who may have unauthorized access to it. Cloned data may also include sensitive information and/or PII. Safeguards such as encryption can protect data whether it is in transit or “at rest.” Finally, cloned data that is no longer needed should be securely erased from all storage locations and backups.
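
As a rough illustration of the first step in managing cloned data, the Python sketch below groups files by a SHA-256 hash of their contents, so that exact copies held in different locations show up together. It is a minimal sketch under stated assumptions, not a complete discovery tool: the function names file_digest and find_duplicates are hypothetical, the storage locations are assumed to be reachable as local or mounted directories, and only byte-for-byte identical files are detected (renamed copies are caught, edited or re-encoded copies are not).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(roots: list[Path]) -> dict[str, list[Path]]:
    """Group files under the given roots by content hash.

    Only groups containing more than one file are returned, i.e. sets of
    byte-for-byte identical copies. The roots are hypothetical stand-ins
    for on-prem shares and mounted cloud storage.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}
```

Each group in the result lists every location holding the same content, which gives a starting point for deciding which copy to keep and which copies to sanitize.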

Data Retention Policies for the AI Era

As organizations embrace AI, they must also bring their data lifecycle and retention policies into line with it. Once the audit, classification, and investigation phases are complete, companies will be better positioned to update retention policies to reflect today’s IT environment. One of the key aspects of a policy fit for 2024 is scalability to accommodate the increasing volumes of data required for training and deploying LLMs. To reiterate, for better AI outcomes (and ROI), data must be accurate, current, and complete.

Today’s data retention programs also depend on sound lifecycle management and governance processes that prioritize clear policies and procedures, defining retention periods, data sanitization processes, and accountability and reporting measures.
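
To make the idea of explicit retention periods concrete, the sketch below expresses a retention schedule as a small Python table and flags records that are due for sanitization, either because they were marked as ROT during classification or because they have outlived the period for their category. Everything in it is an assumption for illustration: the category names, the Record fields, the due_for_sanitization function, and above all the retention periods, which are placeholders and not legal or regulatory guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days; real values depend on the
# regulations and business requirements that apply to each organization.
RETENTION_DAYS = {
    "pii": 365,
    "financial": 7 * 365,
    "intellectual_property": 10 * 365,
    "business_sensitive": 3 * 365,
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str         # classification bucket assigned during the audit
    created: date
    is_rot: bool = False  # redundant, obsolete, or trivial


def due_for_sanitization(records: list[Record], today: date) -> list[Record]:
    """Return records flagged as ROT or older than their retention period.

    Records in a category with no configured period are not flagged
    automatically; they are left for manual review.
    """
    due = []
    for record in records:
        limit = RETENTION_DAYS.get(record.category)
        expired = limit is not None and today - record.created > timedelta(days=limit)
        if record.is_rot or expired:
            due.append(record)
    return due
```

Keeping the schedule in one table makes the policy easy to review and report on, and the returned list can double as an audit trail of what was queued for sanitization and why.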

Finally, it can’t be said enough that with more data comes more risk related to data privacy, compliance, and security. Organizations must implement robust measures to protect retained data from unauthorized access, breaches, and misuse. In addition to employing encryption, access controls, and anonymization techniques, conducting regular security audits can help safeguard sensitive information and maintain compliance with the growing number of global privacy regulations, many of them modeled on the EU’s comprehensive General Data Protection Regulation (GDPR).

Reassessing any policy takes time and legwork. Undertaking the painstaking process of establishing new data lifecycle processes and data retention policies will enable companies to build a solid foundation for leveraging data more effectively in the AI age while mitigating risks and ensuring compliance.