by Angela Guess
Meta S. Brown recently wrote in Forbes, “The term ‘Big Data’ is relatively new, but the challenge it represents, realizing value from voluminous, complex and growing electronic data resources, has existed for decades. So has a meaningful, open standard for data analysis that was specifically developed for dealing with massive datasets; it’s called CRISP-DM. In a 2001 essay, ‘3D Data Management: Controlling Data Volume, Velocity, and Variety,’ Doug Laney laid out his now famous ‘Three Vs’ of Big Data. His clients were overwhelmed by the volume (sheer quantity), velocity (ongoing addition of new cases) and variety (diverse formats) of data at hand. Yet even before Laney’s article was published, an international consortium of over 200 organizations had already banded together (with funding from the European Union) to define and publish an open standard, CRISP-DM, for analysis of massive datasets.”
Brown goes on, “Although industry surveys indicate that it’s the most widely used analytics process standard, CRISP-DM is not particularly famous. It’s known by many hands-on analysts, but not the wider business community. Not every segment of the analytics community uses, or is even aware of, the standard. That’s a shame, because good process maximizes the chance of producing information that executives can actually use. What’s in this process standard? CRISP-DM lays out six major phases of the analytics process, with steps to be taken in each of them. It’s not a linear process; the phases represent an ongoing cycle of action and analysis, and there’s often a lot of back and forth within and between phases.”
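For readers unfamiliar with the standard, the six phases Brown refers to are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The minimal Python sketch below illustrates the cyclic, non-linear flow she describes; the phase names come from the CRISP-DM 1.0 guide, but run_cycle and the review callback are hypothetical illustrations of the “back and forth” between phases, not part of any standard API.

# Minimal sketch of the CRISP-DM cycle. Phase names are from the
# CRISP-DM 1.0 standard; the loop-back logic is illustrative only
# (a real project decides when to revisit a phase based on findings).

CRISP_DM_PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

def run_cycle(needs_rework):
    """Walk the phases in order. needs_rework is a hypothetical
    callback that returns the name of a phase to return to,
    or None to proceed to the next phase."""
    i = 0
    while i < len(CRISP_DM_PHASES):
        phase = CRISP_DM_PHASES[i]
        print(f"Phase: {phase}")
        back_to = needs_rework(phase)
        if back_to is not None:
            # Loop back, e.g. Evaluation -> Business Understanding
            i = CRISP_DM_PHASES.index(back_to)
        else:
            i += 1

# Example run: after the first pass through Evaluation, revisit
# Business Understanding once, then continue through to Deployment.
visited = set()

def review(phase):
    if phase == "Evaluation" and phase not in visited:
        visited.add(phase)
        return "Business Understanding"
    return None

run_cycle(review)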
Photo credit: Flickr/KamiPhuc