Can Automation Save Big Data?

By Amar Arsikere.

“What would you think if I sang out of tune? Would you stand up and walk out on me?”

The Beatles song everybody knows could also pass as the wilting rallying cry for Big Data, whose “tune” does seem out of sorts these days as more people question the framework’s future.

When Hadoop first came to town, it met the multi-machine processing needs of web giants like Google and Yahoo!. Hadoop allowed them to harness the infinite creative power of data and cheap computing. Today, Hadoop and other Big Data technologies power Yahoo!, Google, Facebook, Twitter and many of the other big names in tech we all know and love, or hate, or prefer not to say. Many other organizations (and their generous venture capitalists) shared in the dream of mimicking Yahoo!’s success with Hadoop, which soon became the compute and storage platform of choice. The problem? These companies weren’t Yahoo!, Google, or Facebook, with their armies of extremely talented engineers. For these companies, like most companies, Big Data has simply proved to be too hard.

Even Cloudera revealed during a recent quarterly earnings call that it was spending too much money and effort on getting its customers deployed. This is a universal problem, not just for enterprises that let it all ride on Hadoop but for the future of Big Data. Projects that do deploy to production take months and are extremely inefficient, and properly supporting a Big Data project typically takes two data engineers for every data scientist. But such talent is infamously scarce: the latest Harvey Nash / KPMG CIO Survey showed Big Data and Analytics remain “the number one skill in short supply” for the fourth year in a row. At the same time, stored data is not only growing in volume but also becoming increasingly complex, with more variety, sources, environments and users of data being added every day. And while Cloud-based Big Data solutions on platforms like Azure, Google Cloud Platform and AWS provide some relief to the complexity issue and require fewer specialists to get up and running, they still require too much engineering expertise.

But while most of the core Big Data platforms will probably never break out of the “for developers only” mold, there is a growing group of vendors building on top of these technologies that can be called friends. Especially promising among these are the vendors on the vanguard of Big Data automation, which hide the underlying complexity of Big Data and make it accessible to more organizations. Generally, the idea is to create software that uses sophisticated algorithms to automate much of the manual coding and configuration required to make Big Data work, replacing labor- and time-intensive processes with automated ones. Depending on where the automation is applied, it can mean faster development, faster processing, faster analysis, and in general, faster time to value for the use of Big Data. This in turn translates into shorter development time, greater operational efficiency, higher scalability, and faster, more data-driven innovation. By eliminating the complexity that has kept it from becoming mainstream, automation can take Big Data from the preserve of an elite few to something available to the vast majority of organizations.

This next generation of value-added providers is helping enterprises overcome Hadoop’s complexity, as well as the complexity of Hadoop follow-on technologies like Spark and upcoming “serverless” distributed Cloud solutions. All of these next-generation Big Data enablement providers are building on top of multiple Big Data “operating systems” and allow for faster and more successful production implementation of Big Data projects.

For example, companies like Infoworks.io manage the development and operations of the entire data engineering process by automating away the complexity of creating and managing data pipelines from ingestion to consumption. Waterline Data applies automation to data cataloging while Unravel Data combines automation with troubleshooting and optimization of Big Data systems. These are just a handful of the many exciting new companies bringing automation, machine learning, artificial intelligence and general innovation to the Big Data fold. While each focuses on different pieces of a very large puzzle, together they are pushing Big Data out of the mud, rendering it easier, faster, more cost-effective, and ultimately more useful for more organizations.

As we look to the future, the Big Data universe will continue to attract venture capital funding focused on innovation that closes the complexity gap. And while the Big Data ecosystem continues to evolve and expand, expect to hear more about the new vanguard of companies that make Big Data accessible to the vast majority of businesses, as Big Data gets by with a little help from its friends.
