“To be successful, a solution to both Metadata Management and Data Governance should be integrated,” said Christian Bremeau, CEO and President of Meta Integration Technology. “Metadata Management starts by supporting virtually any tools that your IT is using.” Bremeau presented at the DATAVERSITY® Enterprise Data World 2016 Conference with his colleague, John Friedrich, Vice President of Partnerships, also from Meta Integration Technology.
Bremeau began by presenting a slide showing the attributes of a successful solution and then expanded on those attributes:
- Integrated Solution
Bremeau compared the process of cobbling together a system out of disparate parts to building a car from the ground up. He described the mindset of buying a Data Lineage tool and assuming, “‘I’m going to make it work and integrate it with a Business Glossary here, and integrate it with another tool, and another tool, and another tool.’ It’s like trying to build your own car,” he said, by buying brakes at one store, buying an engine somewhere else, and trying to put together your own solution. “Metadata Management and Data Governance have to work hand-in-hand.”
- Support all End-Users
“Business users and technical users have to work together on the same system, so while it’s very important to target the business users,” he said, the colleagues who can provide the best definitions for your glossary may not be the best choice “to give you a data model or to produce the ETL behind that.” Both groups still need to be able to work together.
- Architectural Evolution
“We all worked many years in the Enterprise Data Warehouse and that’s a classic architecture. Today we have a different world – data lakes, with a lot of data stores,” said Bremeau, and there are many options to choose from. “It’s not really one replacing the other. It’s very important that they both work together very well.”
- Support for Multiple Tools
“The last part is to make sure you can support any of your data stores, data integration, BI tools, today and tomorrow, because things keep changing,” he said. “We see a lot of people on traditional BI and suddenly, they’re using Tableau, and it doesn’t mean rewriting everything in Tableau, [but] you still need to trace lineage” to your business reports as well as Tableau. The entire system should be supported.
The Big Picture: Metadata Management for Data Governance
Bremeau illustrated what a successfully integrated – but simplified – big picture model would look like, using a classic Enterprise Architecture for Data Warehousing. “If there is one thing to learn and master in any Metadata Management and Data Governance solution, it’s known as ‘the big picture.’”
“There are a couple of data stores on one side, could be files, and then I have some ETL tools that are bringing everything into the Data Warehouse,” with the BI tools on the other side.
“Now this is very simplified because most of the customers that we deal with do a lot of staging areas, before. And they’re not using one ETL but three or four different types of ETLs, and some hand-written SQL scripts, and you have to deal with all that. That’s the reality, if you truly want to know the lineage of what’s going on in the enterprise.”
And the other side of the picture doesn’t necessarily go straight to a BI tool, he said, “You might have some data marts and multiple [other] BI tools around.”
Bremeau said he expects any Metadata Management software today to be able to connect to live databases, data integration servers, and BI servers as well. “My advice, in general, is always to start from the end – from the business [intention] side – and that’s what people hate to do.” He prefers to start with the business users because “that’s basically going to get them excited, if you can start from their Business Intelligence reports,” he said. “If you’re buried inside your ETL, and work for weeks, you will still have nothing to show” to your business users.
“At the center of this, you’re going to go to your Data Warehouse and bring everything in,” which, Bremeau said, is not as simple as it sounds. When the data comes in, no matter what products or tools are used, “there is already a lot of data movement and SQL parsing going on, because the reality is you might have a great set of tables, but on the right of it, you’re going to have a lot of views, and these views have a ton of SQL that we parse.”
Inside the database the ETL may write into some places and invoke procedures that will update some tables – “I’ve not yet touched the ETL server and the BI server – just pointing at this, and I already have some complex lineage of what’s going on,” he said. Then you need to take care of “what is known as Metadata harvesting,” which is not a simple task, and at the end you connect the dots with a step called ‘Metadata stitching.’
He said that the people he works with on Metadata Management are often surprised at how many things they discover that are wrong in their systems that they hadn’t seen before. “Just the act of Metadata stitching to build this Enterprise Architecture is going to tell you if you are connected right.”
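The idea of stitching harvested metadata together, and discovering broken connections along the way, can be illustrated with a minimal Python sketch. It is not how any particular product works. The tool names, object names, and the `stitch` function are all hypothetical, and objects are matched purely by name, which is a simplifying assumption.

```python
# Hypothetical harvested metadata: what each tool reads and writes.
HARVESTED = {
    "etl.load_sales": {"reads": ["crm.orders"], "writes": ["dw.sales_fact"]},
    "db.v_sales_summary": {"reads": ["dw.sales_fact"], "writes": ["dw.v_sales_summary"]},
    # The report points at a misspelled view name, the kind of surprise stitching exposes.
    "bi.revenue_report": {"reads": ["dw.v_sales_sumary"], "writes": ["report.revenue"]},
}

# Objects that are true origins and are not expected to have a producer.
KNOWN_SOURCES = {"crm.orders"}


def stitch(harvested, known_sources):
    """Link tools by matching object names; return lineage edges and unresolved reads."""
    produced = {obj for meta in harvested.values() for obj in meta["writes"]}
    edges = [
        (src, dst, tool)
        for tool, meta in harvested.items()
        for src in meta["reads"]
        for dst in meta["writes"]
    ]
    # A read is unresolved when nothing writes it and it is not a known source.
    unresolved = sorted(
        {src for src, _, _ in edges if src not in produced and src not in known_sources}
    )
    return edges, unresolved


edges, unresolved = stitch(HARVESTED, KNOWN_SOURCES)
print(unresolved)  # ['dw.v_sales_sumary']
```

In this toy example the report reads a view that nothing produces, so the stitching step flags it as not "connected right."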
“You need to not only be able to trace lineage, but also go back in time,” Bremeau said, because a colleague may come to you and say “Somebody made a really bad decision in Q4, 2005. How did that happen?” And you will be able to show them that the decision came from a specific configuration at that time and that’s why that number was so wrong.
“You also need to be able to go into the future, because these people are going to say they’re going to change the Data Warehouse, and after you change the Data Warehouse, you want to understand what’s going to be broken on the left and the right.”
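Tracing lineage and analyzing the impact of a change are both walks over the same stitched graph, just in opposite directions. The following Python sketch assumes a small hypothetical set of lineage edges (`EDGES`) and a hypothetical helper, `reachable`. It covers only the traversal, not the historical versioning Bremeau also described.

```python
from collections import defaultdict, deque

# Hypothetical stitched lineage: (source object, target object).
EDGES = [
    ("crm.orders", "staging.orders"),
    ("staging.orders", "dw.sales_fact"),
    ("dw.sales_fact", "dw.v_sales_summary"),
    ("dw.v_sales_summary", "report.revenue"),
    ("dw.sales_fact", "mart.finance"),
]


def reachable(start, edges, reverse=False):
    """Breadth-first walk: downstream by default, upstream when reverse=True."""
    graph = defaultdict(list)
    for src, dst in edges:
        if reverse:
            graph[dst].append(src)
        else:
            graph[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Lineage: where does the revenue report come from?
lineage = reachable("report.revenue", EDGES, reverse=True)
# Impact: what could break if the sales fact table changes?
impact = reachable("dw.sales_fact", EDGES)
print(sorted(lineage))
print(sorted(impact))
```

Here the lineage of `report.revenue` reaches back to `crm.orders`, while a change to `dw.sales_fact` touches the summary view, the revenue report, and the finance mart.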
He again stressed that lineage and impact analysis should be presented to business and technical users in different ways. “Business users have no clue what a Data Warehouse is, or ETL, and they shouldn’t, actually.” It’s important to be able to tell them what a certain field is in a report and where it comes from, but save the technical details for the BI and ETL people. “So your software needs to provide both the detail, the technical lineage, and the simplified business lineage.”
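One way to picture the difference between technical and business lineage is to collapse every technical hop between two business-visible fields into a single link. The Python sketch below does that under stated assumptions: the field names, the `BUSINESS_LABELS` mapping, and the `business_lineage` function are hypothetical illustrations, not a description of any vendor's implementation.

```python
from collections import defaultdict

# Hypothetical technical lineage at field level.
TECH_EDGES = [
    ("crm.orders.amount", "staging.orders.amount"),
    ("staging.orders.amount", "dw.sales_fact.amount"),
    ("dw.sales_fact.amount", "report.revenue.total"),
]

# Only these fields mean something to business users (assumed labels).
BUSINESS_LABELS = {
    "crm.orders.amount": "Order Amount (CRM)",
    "report.revenue.total": "Total Revenue (Revenue Report)",
}


def business_lineage(edges, labels):
    """Collapse technical hops so only business-visible fields remain linked."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    links = set()
    for start in labels:
        stack, seen = list(graph[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in labels:
                # Stop at the next business-visible field and record one link.
                links.add((labels[start], labels[node]))
            else:
                # Technical object: keep walking through it without showing it.
                stack.extend(graph[node])
    return sorted(links)


simplified = business_lineage(TECH_EDGES, BUSINESS_LABELS)
print(simplified)  # [('Order Amount (CRM)', 'Total Revenue (Revenue Report)')]
```

The staging and warehouse steps stay available in `TECH_EDGES` for the ETL and BI teams, while the business view shows a single link from the source field to the report field.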
Three “Killer” Metadata Management and Data Governance Apps
Data Modeler/Data Documenter
Bremeau then discussed what he called “Killer Applications” designed “to make your life easy.” These three apps are designed to connect the dots between various parts of the Enterprise Architecture. “One that is really popular these days is known as a data documenter,” he said. Although it’s not referred to as a Data Modeler, that’s what it is. “You have a Business Glossary, and your mindset is all about re-use, so you’re re-using the same definitions that you’ve already preserved in your Business Glossary.”
With “a very nice model of a Data Warehouse, we can – as a single drag-and-drop – reverse engineer that entire Data Warehouse into a well-defined Business Glossary,” a process that can even be done on a tablet.
“These data documenter tools are really like Data Modeling tools, on one side you’ve got the Business Glossary you create on the fly, you re-use on the fly, and at the bottom, you’ve got a live system and it maintains itself automatically. Because you have no choice, you just deployed a new application, guess what, they changed the database underneath.”
Data Mapper
“Maintaining semantic mapping by hand is a nightmare. What you want is a set of tools to do that automatically,” he said. A lot of companies work with subject matter experts who want to write mapping requirements or have self-service data, but “They don’t use any ETL – they’re not expert in ETL – they don’t want to know anything about it.” So you need graphical tools that can do data mapping at high scale.
“Active Data Governance is when you proactively have a set of tools to bridge the gap from your requirements, from your Data Governance, from your glossary, your mappings, your models – to generate something that’s going to save a lot of time for the ETL people and the BI people.”
BI Report Documenter
A lot of companies produce BI reports, “but you have to find a way to document those reports,” he said. There are tools and technologies today where you can see what the report is in your Metadata Management solution, and by simple drag-and-drop you can connect to your Business Glossary and re-use the definition. “We do even better for tools like Tableau. We automatically find everything in your reports that Tableau is using and we generate an ‘extract’ tab, which is a business glossary” for this report, he said. So if someone in finance is printing a report, they have “a nice Business Glossary right inside the report and these are not just copies – these are live links,” he said, so if somebody has just changed a definition in the glossary, for example, that Business Glossary is automatically updated in the BI tools.
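The distinction between copies and live links can be shown in a few lines of Python. In this sketch, which rests on assumptions rather than on any real BI tool's API, a report stores only glossary term identifiers and looks the definitions up when its glossary tab is rendered. The `Glossary` and `Report` classes and the term names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Glossary:
    """A hypothetical central Business Glossary keyed by term identifier."""
    terms: dict = field(default_factory=dict)

    def define(self, term_id, definition):
        self.terms[term_id] = definition


@dataclass
class Report:
    """A hypothetical report that stores term ids (links), not definition text (copies)."""
    name: str
    term_ids: list = field(default_factory=list)

    def glossary_tab(self, glossary):
        # Definitions are resolved at render time, so they always reflect the glossary.
        return {term_id: glossary.terms[term_id] for term_id in self.term_ids}


glossary = Glossary()
glossary.define("net_revenue", "Gross revenue minus returns.")
report = Report(name="Quarterly Finance", term_ids=["net_revenue"])

before = report.glossary_tab(glossary)
glossary.define("net_revenue", "Gross revenue minus returns and discounts.")
after = report.glossary_tab(glossary)

print(before["net_revenue"])
print(after["net_revenue"])
```

Because the report holds a reference instead of a copy, the updated definition appears the next time the tab is rendered, with no change to the report itself.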
“Whatever the technology does, the mission of a Metadata Management solution is to go to the absolute source of wherever it’s coming from to the end on the other side.” It also has to account for the fact that “there are sometimes five or six technologies and transformations that are going in between.” According to Bremeau, it’s all about working together:
“The message is about doing Metadata Management and Data Governance in an integrated way with the biggest toolset you can have between your Business Glossary at the top and your IT on the other side. These tools have to go both ways, not only do you need to understand and validate what’s going on in IT lineage, but active Data Governance is the generation of BI, the generation of ETL, and the generation of anything you can on the other side.”