By Andrew Sohn
The Internet of Things (IoT) generates countless interactions between devices every day, in all kinds of ways. But the significance of these interactions lies neither in the Internet nor in the things themselves. What IoT is really about is data.
Over its lifecycle, an IoT device collects data, performs some local processing on it, transmits data, possibly receives data, stores data, and eventually disposes of it. Along the way, the data must be protected from unauthorized access and must comply with regulations. And each device's data must be combined with that of other devices.
This is challenging even in the simplest technical environment. Some companies struggle just to consolidate data from Salesforce and Workday into a single Data Warehouse with high-quality data. Multiply that by a few million to appreciate the challenge of managing IoT data.
The growth of IoT translates into larger data volumes, more types of data, and a dramatic increase in data velocity. IoT is therefore becoming, and will continue to be, an increasingly significant source of what is commonly called Big Data. In fact, EMC estimates that by 2020, IoT and other sources will generate about 44 trillion gigabytes (44 zettabytes) of data annually.
But collecting this data does not, by itself, produce any value. The collected data must be combined with other data, put in context, and analyzed to create insights, action plans, or self-learning feedback loops.
This is why proper Information Governance (IG) is critical to IoT projects: IG practices are what allow the data in the IoT supply chain to create business value. Consider that weather data is of little value until it is correlated with location and other geographical data, and location-tracking data is of little value unless it is correlated with information about subjects and intent.
There are many maturing technologies and practices to help govern data once it is in the Data Lake. However, IDC predicts that, by 2018, up to 40 percent of IoT data will be stored, processed, analyzed, and acted upon before it ever reaches the Data Lake. Therefore, it is not sufficient to manage IoT data only within the Data Lake. Data at collection sites, edge nodes, and other points within the IoT ecosystem must also be governed and protected. This is a gap that many organizations have not yet addressed.
The adage “garbage in, garbage out” applies here, no matter how hard we try to massage the data. Over the last few years, I have worked with several companies to help them manage data in the Data Lake and make it usable for its consumers, who are often Data Scientists and Analysts. Most of the challenges stem from the poor usability and quality of the source data.
The more time and energy we put into managing data at the point of collection and throughout the supply chain, the more benefit we will gain. Governance needs to begin well before data reaches the Data Lake.
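Governing data at the point of collection can start with simple quality checks on the device or edge node, before anything is transmitted. The Python below is a minimal sketch under stated assumptions: it imagines a hypothetical temperature sensor whose readings arrive as dictionaries, and the field names, plausibility bounds, and the functions `rejection_reason` and `validate_batch` are illustrative inventions, not part of any particular IoT platform.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical plausibility bounds for an ambient temperature sensor (assumption).
MIN_TEMP_C = -40.0
MAX_TEMP_C = 85.0
# Hypothetical schema: every reading must carry these fields (assumption).
REQUIRED_FIELDS = ("device_id", "timestamp", "temperature_c")


@dataclass
class BatchResult:
    accepted: list = field(default_factory=list)
    # Rejected readings are kept as (reading, reason) pairs so they can be audited.
    rejected: list = field(default_factory=list)


def rejection_reason(reading: dict) -> Optional[str]:
    """Return why a reading fails validation, or None if it passes."""
    for name in REQUIRED_FIELDS:
        if reading.get(name) is None:
            return f"missing field: {name}"
    try:
        datetime.fromisoformat(reading["timestamp"])
    except (TypeError, ValueError):
        return "unparseable timestamp"
    value = reading["temperature_c"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "non-numeric temperature"
    if not MIN_TEMP_C <= value <= MAX_TEMP_C:
        return "temperature out of plausible range"
    return None


def validate_batch(readings: list) -> BatchResult:
    """Split a batch into accepted readings and (reading, reason) rejections."""
    result = BatchResult()
    for reading in readings:
        reason = rejection_reason(reading)
        if reason is None:
            result.accepted.append(reading)
        else:
            result.rejected.append((reading, reason))
    return result


if __name__ == "__main__":
    batch = [
        {"device_id": "sensor-1", "timestamp": "2024-05-01T12:00:00", "temperature_c": 21.5},
        {"device_id": "sensor-2", "timestamp": "not-a-time", "temperature_c": 20.0},
        {"device_id": "sensor-3", "timestamp": "2024-05-01T12:00:05", "temperature_c": 999.0},
    ]
    result = validate_batch(batch)
    # prints: 1 ['unparseable timestamp', 'temperature out of plausible range']
    print(len(result.accepted), [reason for _, reason in result.rejected])
```

Rejected readings are kept with a reason instead of being silently discarded, so that data-quality problems at the source can be reported and traced back to specific devices. In a real deployment, the schema and bounds would presumably come from whatever data standards the organization has agreed on, rather than being hard-coded as they are here.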