“Context: Better understanding something by taking into account the things around it,” said Jeff Jonas during his Wednesday afternoon keynote at the 23rd annual Enterprise Data World (EDW) 2019 Conference (co-presented with DAMA International) from March 17-22, 2019 at the Sheraton Boston Hotel in Boston, MA. Context is the key that allows data professionals, and our business colleagues, to understand the world around us.
Whether that world is working to discover the flow of a specific data element within a data lineage system, creating an effective multi-cloud architecture plan, writing a machine learning algorithm that can answer the question “does Rogaine cause weight gain in mice?”, mapping out policies with your Data Governance Steering Committee, or mixing five different puzzles together for your kids (who didn’t know it at the time) to better explain the complexity of Data Quality and entity resolution to the EDW audience, context is key.
Look at the title of this article: How do gorillas, puzzles, courtrooms, and hats all combine – along with 55 bags of chocolate, 700+ ice cream bars, 1,000 attendees from 30+ countries, 30 separate topic tracks, 190+ sessions, 50 sponsors/exhibitors, and a whole bunch of laughs – to create a successful data conference? It takes context, along with the many metaphors discussed throughout the conference, to make the complex and ever-developing world of Data Management more comprehensible. Though coffee certainly helps too.
An apt metaphor, which Laura Sebastian-Coleman used during her Wednesday afternoon keynote, is one of a ship. While discussing editing the newest iteration of the Data Management Body of Knowledge (DMBOK2), she compared a couple of different metaphors. “Sports metaphors don’t really get at the complexity of Data Management,” she remarked. But, that of a ship and sailors do. Data Management (and editing “The Bok,” to use the proper vernacular) takes multiple people doing various functions across a broad range of skill sets.
We need ship builders, loggers, sailors, a captain and first mate, doctors, cooks, marines, and everyone else – from the planners of the ship to those who finally take the ship apart and retire it when it’s done – to make the entire system work. “Data Management professionals are all on the ship together and complexity can be reduced if we all know the mission and work collectively.”
This metaphor could be expanded to the business professionals that we work with on a daily basis as well. Without a concerted effort, that ship may well just sink or not catch the wind or get invaded by pirates. The same is true of Metadata Management initiatives, developing a data catalog, implementing AI into your BI platform, learning to be an agile data modeler, and selling the necessity of Data Stewardship to a C-level executive in your company.
We employ both context and metaphors to demystify our data lakes, data warehouses, data swamps, data vaults, and data lifecycles, though we also “get lost in metaphors,” warned Barry Doyle during his Monday evening lightning talk. As we look back at EDW 2019 and forward into the next year, let’s discuss some of the most useful (or at least amusing) metaphors that can help us gain context in a world of increasing complexity – and try not to get lost in that “dark and stormy night” of bad data and even worse metaphors.
The 800-Pound Gorilla
“Data Science is not about Data Science anymore,” commented Michael Stonebraker during his Tuesday opening keynote. “It’s about data prep.” He was discussing Big Data and the problem that data scientists, and for that matter all data professionals, are dealing with now in terms of data volumes.
The three Vs of Big Data are a well-known acronym today, and some people have added a fourth, fifth, or even sixth V to the equation. But Stonebraker kept it at Volume, Velocity, and Variety – that is enough to explain where we are now with data. “We will discuss this in 37 minutes and 27 seconds,” he joked while watching his timer, and then proceeded to discuss the key issue many organizations are dealing with around Big Data.
Data scientists spend 90 percent of their time finding and cleaning data, then 90 percent of that leftover 10 percent checking the cleaning, explained Stonebraker. So, in essence, they spend only about 1 percent of their time actually working with that clean data to analyze the results. “You can’t analyze dirty data,” he exclaimed. So, what is that 800-pound gorilla in the boardroom, cubicle, data center, BI platform, and myriad spreadsheets filling our organizations? It’s not Volume. Data professionals have adequately dealt with that V. It’s not Velocity. While real-time analytics is still working to better deal with Big Velocity, it’s not the biggest gorilla, it’s more of an annoying monkey that keeps stealing your bananas. It therefore must be Variety (and thus complexity) of data.
He discussed a company that has 75 procurement databases and the issue with trying to get the best price for staples: It’s virtually impossible to have those 75 databases, along with all the rest of an organization’s vast Data Architecture, adequately integrated around the world. What’s coming next? “Machine learning will become omnipresent,” he said. It is necessary to deal with so much complexity, to provide proper context (aka good data at scale), and to “get the best price for a box of staples.”
Context is key and good quality data allows that context to actually give reliable results.
Puzzles of Puzzles – So Many Pieces
To provide a more in-depth experience, the conference has long built in multiple days (two at the beginning and one at the end) filled with half-day workshops, tutorials, and intensive seminars. These offerings provide deep dives into particular topics, such as April Reeve’s Architecting for Enterprise Data Management and Governance, Prashanth H. Southekal’s Introduction to Machine Learning, Frank Cerwin’s Lessons from the Front Line of MDM Crusades, Theodore S. Hills’ Data Modeling for NoSQL and SQL, Donna Burbank’s Data Strategy Bootcamp, Karen Lopez’s Data Modeling for Data Protection: Security, Privacy, and Compliance, and so many more.
Sessions like these help take all of those pieces in the DMBOK and put them into a clearer perspective, so that doing ‘one after another after another’ failed Data Governance initiative is no longer necessary; it will work this time.
As Anthony Algmin said during the conference orientation: “The more we do with data, the more we realize we are changing behavior. It’s not about the data – it’s about the people.” And that “people piece” is one of the biggest puzzles of them all.
During his keynote, Jeff Jonas highlighted this same theme clearly, with much delight from the audience. He discussed how context accumulation is “the incremental process of integrating new observations with previous observations.” So, to present the difficulty in dealing with the quality of data, human behavior, and “an ever-growing pile of puzzle pieces of varying sizes, shapes, and colors,” he did an experiment on his family – likely much to their chagrin, though perhaps they are used to it by now.
He took five different puzzles and added pieces from each puzzle into one box, then had them start putting the puzzle together.
After 47 minutes they started to figure out the ruse, then after 2 hours and 18 minutes they proclaimed, “We think you threw in a few random pieces.” He was using this example to discuss context as it relates to entity resolution, in terms of finding cheaters at a Las Vegas casino (among other examples). But, it highlights a bigger premise here: Data Management is a complex amalgam of discovery vs. confusion, ingenuity vs. tradition, cost vs. reward, and a limitless range of other factors.
Jeff’s family likes to put puzzles together, they do it often, but he just changed the entire equation by adding five puzzles of different sizes. The same is true for Data Governance initiatives, building better machine learning algorithms to deal with Big Variety, leveraging the DMBOK effectively within an enterprise, and deciding if you need a third ice cream bar during the Wednesday afternoon ice cream social (likely that answer was yes).
Courtrooms and Hats
No conference would be complete without fancy hats and lots of laughs – especially when used judiciously to prove an important point. First of all, EDW 2019 had some really cool hats.
Mark R. Horseman wins the Fanciest Hat Award during his lightning talk titled Data Governance in Pure Imagination, where he sang his elevator speech example to the tune of a Willy Wonka & the Chocolate Factory song. We didn’t know he had such a great singing voice – maybe next year he will sing his entire presentation?
The best DATAVERSITY Hat Fashionista Award goes to Elsa Kristin Gudbergsdottir from Össur in Iceland. She actually requested to wear our giveaway hat with her conference photo. Thank you, Elsa! We are glad everyone loved the hats.
The best Powdered Wigs Award (which we didn’t know existed until this year), goes to Thomas Redman, Len Silverston, and Danette McGilvray during their session: You Are Liable for Not Establishing a “Common Vocabulary” in Your Organization!
They employed a courtroom-style format, where the defendant (Laura Sebastian-Coleman) was accused of not establishing a valid “common vocabulary” in her organization. The jury was the audience, and everyone had a rousing good time with the entire session, but it also connects with our greater themes: context and discovery. A fictitious company (VVF) had a data breach where 500,000 people had their data exposed. The primary reason (according to the plaintiff’s attorney) that VVF failed to act properly and let affected people know about the data breach was that it did not have its data defined consistently across the enterprise.
They had some systems where the full names of individuals were in a single field, others in multiple fields – and that is only one of many examples. “It was difficult to get a comprehensive list of impacted parties,” claimed the plaintiff’s attorney, Mr. Silverston. VVF lacked the ability, due to bad Data Quality stemming from having no common vocabulary, to track critical identification numbers and other information. They lacked context and had far too much complexity to deal with reliably.
Awards and Conclusions
During the Thursday closing keynote, MC John Ladley asked the panel of Danette McGilvray, Asha Saxena, and Anthony Algmin a number of questions – both his own and from the audience. Towards the end of the discussion, he brought up the topic of data literacy and selling the value of data to executives (and, by extension, the entire business).
Anthony pointed out that the real point is to start off with: “How can I help you?” It’s not about selling the idea of Data Management and the necessity of data, but rather demonstrating what data can do to help a given executive, business user, or any other stakeholder. Asha then followed up with: “Data is so critical, it shouldn’t be work.” As data professionals, it’s our job to decrease risk and increase value, so that actually “dealing with data” from a user perspective is as easy as click –> knowledge –> action.
We need to make sure that context is available, so that everyone consistently understands what is going on. And, tying all that into changing behavior, Danette really closed everything off best when she said: “When you break bread with people, everything changes. It’s a human activity.” It’s about changing behavior.
And thus, more than anything, Enterprise Data World is about breaking bread with our compatriots, along with learning new skills and ideas to take back with us, working on keeping our given ships afloat, wearing fancy hats, contemplating gorillas and puzzles, eating lots of chocolate, and appreciating the joy of data metaphors.
In closing, we’d like to congratulate all the 2019 DAMA-I Excellence Award winners:
- Stephanie Bruno: For the work she and her team have done at the Elizabeth Glazer Pediatric AIDS Foundation to transform their data warehouse using Data Governance principles to insure Data Quality and consistency.
- Stephanie Contre: And her team at the Analytics Centre of Excellence in the City of Edmonton, for their work in providing data to the citizens of Edmonton as useable Information for more strategic, community-based decision-making.
- Martha Dember: For her work in Data Governance and teaching many of our current data experts.
- DAMA Japan: For their achievement in translating the DMBOK2 to Japanese which took over 12,000 hours and is now in print.
Also, congratulations to the new CDMPs: In total, there were 96 associate level, 9 practitioner level, and 2 master level credentials awarded during the conference.
We’d also like to thank the staff at the Sheraton Hotel Boston for helping everything to run successfully, as well as a special thanks to Hannah Sanford at Hannah Sanford Studios for her insightful “graphic facilitations” completed during many of the presentations and displayed around the conference hallways. And a big thank you goes to John Hydo and his team from Tomorrow’s Event Productions for helping everything flow so smoothly, and to the esteemed conference photographer Jeff Kempe at Jeff Kempe Photography for taking such fantastic photos throughout the conference.
See you next year for the Enterprise Data World in San Diego, CA from March 22-27, 2020.