By Mark Cassetta.
Structured data, such as credit card information and Social Security numbers, is easily tracked and analyzed. Unstructured data, on the other hand, is much more difficult to identify and use. It includes information buried deep within the documents users create and consume every day, such as business plans, presentations, reports and emails. Unstructured data often contains details about employees, customers, business practices and trends, sometimes written in prose or as informal notes. Unsurprisingly, the volume of unstructured data keeps growing rapidly within most businesses; analysts at Gartner estimate that more than 80 percent of enterprise data is unstructured. And while this data is full of value, businesses generate so much of it that many don't know where it resides or what type of information all those files and folders contain.
One of the biggest obstacles organizations face in unlocking the value of unstructured data is the lack of a well-defined information handling strategy. Without one, they have no reliable way to identify their valuable data as employees create, use and share it in their day-to-day work.
According to Forrester, many firms are still struggling under the weight of their data. The two biggest challenges? First, integrating new technologies that can source data for analysis into existing business processes; second, having the basic ability to source, gather, manage and govern data as it grows.
With all this data being generated, structure is critical. Organizations must develop and deploy reliable processes to understand what unstructured data they have, where it is stored and how sensitive it is. They also need tools that help ensure the data is categorized consistently, both to ease future analysis and to ensure it is adequately safeguarded.
By putting structure around data, we can reduce the noise and delete unimportant information, whether for governance, security or business analysis.
But How?
Step 1: Start small. Don't let the massive amount of data across your organization overwhelm you. Think of a single question related to your business: something you would like to learn more about, or a trend you would like to predict. Choose an important aspect of your business that generates a good amount of data. Pick something employees are familiar with and a type of data that is already managed consistently across the business. Then start to categorize it within your normal workflows.
Take into consideration all the places data can exist or reside within your business. Does the data contain information that needs to remain within the walls of your organization? Does it contain personal information subject to one of the many emerging privacy regulations? Is this data IP your organization wants to share with suppliers but not competitors? Do you want to use this data to enhance an artificial intelligence (AI) strategy your organization is about to embark on?
As an example, consider tracking a formula or recipe that is core to your business. Say you are a paint manufacturer. Each type of paint has its own formula or code that appears in all sorts of documents, from product plans to sales receipts. The formulas might be included in emails, Word documents and spreadsheets. Typically, these formulas are written out in a consistent format across your organization, even when they are mentioned in the middle of a paragraph buried in a Word document. That consistency makes this type of data straightforward to begin tracking as it moves through your business workflows.
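Because the codes follow a consistent format, even a simple pattern match can find them wherever they appear. The following Python sketch assumes a hypothetical code format (`PF-`, four digits, a hyphen and a variant letter, such as `PF-2041-B`); the pattern and the `find_formula_codes` helper are illustrative, not part of any real product.

```python
import re

# Hypothetical formula-code format: "PF-", four digits, "-", one uppercase letter.
# Replace this pattern with whatever convention your organization actually uses.
FORMULA_CODE = re.compile(r"\bPF-\d{4}-[A-Z]\b")


def find_formula_codes(text: str) -> list[str]:
    """Return every formula code mentioned in a block of text, in order of appearance."""
    return FORMULA_CODE.findall(text)


if __name__ == "__main__":
    memo = (
        "Sales asked whether PF-2041-B can ship in Q3. "
        "The cheaper base in PF-0173-A might work too."
    )
    print(find_formula_codes(memo))  # ['PF-2041-B', 'PF-0173-A']
```

Real formula codes will have their own format, so treat the regular expression as a placeholder to adapt.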
Work with executives and business leaders to draft policies for how this data should be handled. How sensitive is it? Is it general business information? Or is it confidential? Can it be shared outside of the organization? Or only with a select few employees? Where should it be stored?
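Once those answers are agreed on, they can be written down as a simple policy table that tools can check automatically. This is a minimal sketch assuming four hypothetical sensitivity labels and hypothetical group names; `Label`, `HandlingPolicy`, `POLICIES` and `may_share` are illustrative names, not part of any product.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # Hypothetical sensitivity levels; use the ones your executives agree on.
    PUBLIC = "public"
    GENERAL = "general"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class HandlingPolicy:
    share_externally: bool          # may the data leave the organization at all?
    allowed_groups: frozenset[str]  # internal groups allowed to see it; empty means everyone
    storage: str                    # where the data should be stored


# Hypothetical policy table drafted with business leaders.
POLICIES = {
    Label.PUBLIC: HandlingPolicy(True, frozenset(), "any location"),
    Label.GENERAL: HandlingPolicy(False, frozenset(), "corporate file share"),
    Label.CONFIDENTIAL: HandlingPolicy(False, frozenset({"r&d", "manufacturing"}), "R&D document library"),
    Label.RESTRICTED: HandlingPolicy(False, frozenset({"r&d-leads"}), "encrypted R&D vault"),
}


def may_share(label: Label, recipient_is_internal: bool, recipient_groups: set[str]) -> bool:
    """Check a proposed share against the handling policy for the data's label."""
    policy = POLICIES[label]
    if not recipient_is_internal:
        return policy.share_externally
    # Internal recipient: allowed if the policy has no group restriction
    # or the recipient belongs to at least one allowed group.
    return not policy.allowed_groups or bool(policy.allowed_groups & recipient_groups)
```

For example, `may_share(Label.CONFIDENTIAL, recipient_is_internal=False, recipient_groups=set())` returns `False`, because under this hypothetical table confidential data may not leave the organization.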
Step 2: Be realistic about how you educate employees on handling data. If you accept that managing and protecting data can be a competitive differentiator, recognize that quarterly or annual training on data handling and protection sets unrealistic expectations for making employees part of the solution. To educate people, you need to train them at the moment they interact with the data. We are bombarded with data every day, so expecting someone to remember how to handle it while fighting fires sets you and your team up for failure.
Do an internal audit to find where every instance of this type of information resides, and add metadata and labels that will help you extract value in future analytics projects. Implement rules and best practices for what to do when the data is in motion, for example when it is emailed, uploaded or shared.
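For files stored as plain text, a first pass at the audit can be automated. This minimal sketch walks a folder, looks for the same hypothetical `PF-####-X` formula-code format used in the paint example, and produces a JSON inventory of where codes appear along with a proposed label. The code format, the `audit` function and the "any code means confidential" rule are all assumptions. Word, Excel and PowerPoint files would first need their text extracted with a document parser, which this sketch does not do.

```python
import json
import re
import sys
from pathlib import Path

# Hypothetical formula-code format; replace with your organization's convention.
FORMULA_CODE = re.compile(r"\bPF-\d{4}-[A-Z]\b")
# This sketch reads plain-text formats only; .docx, .xlsx and .pptx files
# would need a document parser to extract their text first.
TEXT_SUFFIXES = {".txt", ".md", ".csv", ".eml"}


def audit(root: Path) -> list[dict]:
    """Find every plain-text file under root that mentions a formula code and
    record where it lives, which codes it contains and a proposed label."""
    inventory = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        codes = sorted(set(FORMULA_CODE.findall(text)))
        if codes:
            inventory.append({
                "path": str(path.relative_to(root)),
                "formula_codes": codes,
                # Hypothetical labeling rule: any formula code makes the file confidential.
                "label": "confidential",
            })
    return inventory


if __name__ == "__main__":
    start = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(json.dumps(audit(start), indent=2))
```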
If your employees use cloud-based collaboration tools to work on documents together, determine who is responsible for applying categories and labels, and how. For example, a document one person creates may be shared with someone else who adds a lot of sensitive information. Digital categorization tools can help the document's creator set rules for what kinds of data can and can't be added.
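One way a categorization tool might enforce the creator's rules is to check every new contribution before accepting it. This sketch models that idea with a hypothetical `SharedDocument` class: the owner chooses which kinds of data are blocked, and edits containing them are rejected and logged. The detector names and patterns, including the formula-code format, are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical content detectors the document owner can choose to block.
DETECTORS = {
    "formula_code": re.compile(r"\bPF-\d{4}-[A-Z]\b"),  # hypothetical format
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # shape of a US Social Security number
}


@dataclass
class SharedDocument:
    owner: str
    blocked: set[str]  # detector names the owner disallows
    paragraphs: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def add_paragraph(self, author: str, text: str) -> bool:
        """Append text unless it contains a kind of data the owner has blocked."""
        hits = [name for name in sorted(self.blocked) if DETECTORS[name].search(text)]
        if hits:
            self.log.append(f"rejected edit by {author}: contains {', '.join(hits)}")
            return False
        self.paragraphs.append(text)
        return True


if __name__ == "__main__":
    doc = SharedDocument(owner="alice", blocked={"formula_code"})
    doc.add_paragraph("bob", "Draft launch timeline attached.")
    doc.add_paragraph("bob", "Base mix is PF-2041-B.")
    print(doc.log)  # ['rejected edit by bob: contains formula_code']
```

Rejecting the edit outright is only one design choice; a real tool might instead raise the document's label or ask the contributor to confirm.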
The Role of the IT Pro
The IT department has a significant role to play in helping an organization bring structure to its unstructured data. It used to be that IT professionals mainly needed to know how to code and build apps to help manage IT systems. Now they must also understand the role data plays in the business.
In short, they must be able to think beyond code, beyond the technologies themselves, to build a methodology around unstructured data to mine its value. Today IT needs to collaborate more with business leaders to help give insight into data.
In addition, IT pros need to understand how artificial intelligence and machine learning come into play in relation to unstructured data. Many traditional data-detection tools rely on simple regular expressions that fail to take into account the full context of the data they encounter. For example, such tools can easily identify a credit card number in a Word document, but they are likely to miss a description of an upcoming doctor's appointment in an email exchange.
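To make that limitation concrete, here is a minimal Python sketch of a typical pattern-based detector: a regular expression for 13- to 16-digit card numbers, filtered with the Luhn checksum that payment card numbers use. Run against a short email, it finds the card number but has nothing to say about the medical appointment, because no pattern describes that kind of information. The `find_card_numbers` helper and the sample email are hypothetical.

```python
import re

# A typical pattern-based detector: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    email = (
        "Please charge 4111 1111 1111 1111 for the deposit. "
        "Also, I'll be out Thursday afternoon for my follow-up with the cardiologist."
    )
    print(find_card_numbers(email))  # ['4111111111111111']; the appointment goes unnoticed
```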
IT pros can use data categorization tools that let users tag emails, presentations and documents with metadata, so that artificial intelligence and machine learning tools can access that information and put it to use. Applied this way, machine learning can surface facts, trends and causal relationships that previously required an analyst to spot.
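Labels that users apply become training data. As a rough illustration only, this sketch trains a tiny multinomial naive Bayes classifier on a handful of hypothetical labeled snippets and uses it to suggest a label for new text. Naive Bayes is chosen here only because it fits in a few lines; no specific algorithm is implied, and real tools typically rely on far larger training sets and richer models. The label names and example snippets are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesLabeler:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, trained on
    documents that users have already labeled."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()
        self.word_counts: defaultdict = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesLabeler":
        for doc, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior for this label
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical snippets that users have already tagged with labels.
EXAMPLE_DOCS = [
    "patient appointment with doctor on thursday",
    "doctor visit and medical leave request",
    "quarterly sales report for the northeast region",
    "sales forecast and pricing for retail partners",
]
EXAMPLE_LABELS = ["personal-health", "personal-health", "general", "general"]

if __name__ == "__main__":
    labeler = NaiveBayesLabeler().fit(EXAMPLE_DOCS, EXAMPLE_LABELS)
    print(labeler.predict("reminder about my doctor appointment"))  # personal-health
```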
Keep Expanding!
Once your team becomes proficient at your new information handling processes and you begin to see how putting some structure around your unstructured data can really pay off, scale into other areas of your business. You'll probably begin to see patterns and have ideas for new trends to look for, and the value of other types of unstructured data worth following up on will become apparent.
Really, we are just at the beginning of a new era of data exploration. Think creatively. It’s an exciting time.