
Big Data Governance: Harnessing the Horses


Click to learn more about author Ian Rowlands.

A friend sent me a note the other day celebrating the year of the Rooster. One website (I make no claim for its authority and authenticity!) quotes Chinese astrologer Laura Lau as saying “Rooster years are rarely ever boring, so it’ll be a year with a lot of action … The Rooster favors those who put in the hard work and stick to a plan.” So, no change for Data Managers then.

Actually, as I look around this year, for Data Managers it’s not so much a matter of a plan and hard work, and more a question of how to handle the action. In fact, it feels like that circus act where the glamorous rider is trying to ride two horses at once. Only the two horses are very different … a carthorse and a thoroughbred.

The carthorse, of course, is “traditional” data: the data that operational systems generate and process and, often, feed into data warehouses. The thoroughbred is Big Data, feeding new-style analytics and the subject of all kinds of fascinating new technologies. The challenge of riding the two horses at once is that the one is already in harness, and the other is running wild – or at least is trying to.

In an interesting blog post in 2010, Winston Chen (now founder and CEO of “Voice Dream”, a company doing great work producing “voice-based mobile apps for people who prefer to listen”) traced Data Governance back to the 1960s. A more general view is that the formal discipline emerged with the new millennium. Whichever view you take, governance of traditional data is well established.

By contrast, Big Data Governance is, at best, an emerging discipline. That’s fair enough, of course. After all, the Big Data phenomenon hasn’t been with us for very long. However, there is a bit more to it than that. The technology is still exploding. Businesses are still defining use cases. So what would Governance mean? And once the processes are understood, where does the necessary supporting data come from? Fortunately, it turns out that many of the basic processes required are much the same. Policy, Role, Incident and Issue Management are consistent. Data lineage is still required – though it becomes more complex. There are new requirements, however. At a purely technical level, governance of taxonomy and tags becomes important – but there are bigger concerns.

Moving Big Data solutions from the ideation, experimentation and validation stages into productive implementation is a process that needs management. Self-Service Data Preparation and Analytics demand careful governance of accessibility to data.

What this leads to is the need for Data Governance of three kinds. Harnessing the carthorse and the thoroughbred together requires three sets of capabilities – traditional Data Governance, Big Data Governance, and – as you productize Big Data solutions – a fully integrated enterprise solution. That’s the challenge for the year of the Rooster. Make sure all the bases are covered.
