By Angela Guess
According to a new press release from the company, “Databricks, the company founded by the team that created Apache® Spark™, today announced that Apache Spark 2.0 is generally available on its just-in-time data platform, making it the first vendor to offer Apache Spark 2.0 support. With major contributions from Databricks and the Spark community, this is the first major release of open source Spark since Spark 1.6 in 2015. Databricks customers can now immediately benefit from Spark 2.0’s three core attributes: easier, faster, and smarter. ‘Since the release of Spark 1.0, we’ve spent countless hours listening to members of the Spark community and Databricks users to learn from a mix of praises and complaints. Spark 2.0 builds on what the community has learned, doubling down on what users love and improving on what users lament,’ said Databricks’ Chief Architect and Cofounder, Reynold Xin.”
The release continues, “Among other major improvements as outlined in the Databricks blog post, the most notable features of Apache Spark 2.0 are: (1) Speed: Gaining huge performance in orders of 5 to 10 times faster than Spark 1.6 for some Spark operators due to Tungsten’s Phase 2 whole-stage-code generation and Catalyst’s code optimization; (2) Simplicity: Unifying developer APIs across Spark’s libraries such as DataFrames and Datasets; (3) Structured Streaming: Laying the foundation for continuous applications by providing high-level declarative streaming APIs based on DataFrames and Datasets built atop Spark SQL engine that works on real-time data; (4) Machine Learning Model Persistence: Saving and loading pipelines and models across all programming languages supported by Spark.”
Read more at Marketwired.