
3 Best Practices for Utilizing Hadoop


by Angela Guess

Scott Fleming and Paul Barsch of Teradata recently wrote in Forbes, “It makes sense to get excited about the possibilities afforded by Apache Hadoop YARN-based applications such as Spark, Storm, Presto and others to provide substantial business value. However, the actual tasks of managing and maintaining the environment should not get short shrift. Without considering best practices to ensure big data system performance and stability, business users will slowly lose faith and trust in Hadoop as a difference maker for the enterprise. With a goal of increasing big data application adoption, the Hadoop environment must run optimally to meet end-user expectations. These three best practices can help you improve operations.”

They continue, “Workload management is important in a Hadoop environment. Why? Because as your big data systems are more widely used for production, the needs of business teams will invariably drive competition among different components for system resources. Administrators can use YARN’s workload management capabilities to decide which users get what systems resources and when to meet service levels. When settings are properly identified and adjusted, jobs can schedule to gain maximum utilization of cluster resources. This not only keeps the Hadoop cluster’s footprint to an appropriate size, it also increases the adaptability to match resources to changing business requirements.”
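As a concrete illustration of "deciding which users get what resources," here is a minimal sketch, assuming the cluster uses YARN's Capacity Scheduler in its percentage-based mode. It generates `capacity-scheduler.xml` entries for three hypothetical queues (`etl`, `analytics`, `adhoc`). The queue names and percentages are invented for illustration, not recommendations from the authors. The script checks two rules the scheduler enforces: sibling queue capacities must sum to 100, and a queue's `maximum-capacity` cannot be lower than its guaranteed `capacity`.

```python
import xml.etree.ElementTree as ET

# Hypothetical queues and shares (percent of cluster). Names and numbers are
# illustrative assumptions only.
QUEUES = {
    "etl":       {"capacity": 50, "maximum-capacity": 80},
    "analytics": {"capacity": 30, "maximum-capacity": 60},
    "adhoc":     {"capacity": 20, "maximum-capacity": 40},
}


def capacity_scheduler_properties(queues):
    """Return (name, value) pairs for child queues directly under root."""
    total = sum(q["capacity"] for q in queues.values())
    # In percentage mode, sibling queue capacities must sum to 100.
    if total != 100:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    props = [("yarn.scheduler.capacity.root.queues", ",".join(queues))]
    for name, q in queues.items():
        # A queue may borrow idle capacity up to maximum-capacity, so the
        # ceiling cannot sit below the guaranteed share.
        if q["maximum-capacity"] < q["capacity"]:
            raise ValueError(f"{name}: maximum-capacity below capacity")
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props.append((f"{prefix}.capacity", str(q["capacity"])))
        props.append((f"{prefix}.maximum-capacity", str(q["maximum-capacity"])))
    return props


def to_hadoop_xml(props):
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(capacity_scheduler_properties(QUEUES)))
```

The guaranteed `capacity` protects each team's service levels. The higher `maximum-capacity` lets a busy queue borrow idle resources, which is one way to pursue the "maximum utilization" the authors describe. After editing the file, an administrator would typically apply the change with `yarn rmadmin -refreshQueues` rather than restarting the ResourceManager.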

Read the full article at Forbes.

