by Angela Guess
David Loshin recently wrote in Search Data Management, “Many companies are struggling to manage the massive amounts of data they collect. Whereas in the past they may have used a data warehouse platform, such conventional architectures can fall short for dealing with data originating from numerous internal and external sources and often varying in structure and types of content. But new technologies have emerged to offer help — most prominently, Hadoop, a distributed processing framework designed to address the volume and complexity of big data environments involving a mix of structured, unstructured and semi-structured data.”
Loshin goes on, “Part of Hadoop’s allure is that it consists of a variety of open source software components and associated tools for capturing, processing, managing and analyzing data. But, as addressed in a previous article in this series, in order to help users take advantage of the framework, many vendors offer commercial Hadoop distributions that provide performance and functionality enhancements over the base Apache open source technology and bundle the software with maintenance and support services. As the next step, let’s take a look at how a Hadoop distribution could benefit your organization.”
He continues, “Hadoop runs in clusters of commodity servers and typically is used to support data analysis and not for online transaction processing applications. Several increasingly common analytics use cases map nicely to its distributed data processing and parallel computation model.”
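To make the parallel computation model Loshin references a bit more concrete, here is a minimal sketch of the canonical MapReduce word-count job in Java, closely modeled on the standard Apache Hadoop tutorial example. The class names (WordCount, TokenizerMapper, IntSumReducer) and the assumption that input and output paths arrive as command-line arguments are illustrative choices, not anything prescribed by the article.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel across the cluster, one task per input split,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: after the shuffle groups pairs by word, sums the counts
  // for each word and writes the total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The map phase scales out across commodity nodes holding the data, and the reduce phase aggregates the shuffled results in parallel, which is why batch analytics workloads like this map so naturally onto Hadoop's model rather than onto online transaction processing.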
photo credit: Hadoop