by Angela Guess
Laurent Bride recently wrote in the Talend blog, “There are many Data Processing Engines/Frameworks out there; some are fully open source, like Apache Spark, Apache Flink, and Apache Apex, while others are packaged and available as a service, such as Google Dataflow. Most Apache open source projects combine streaming and batch data processing, and provide various levels of APIs to help programmatically develop pipelines or data flows. Google is helping to lead this charge with an abstraction layer that allows Dataflow SDK-defined pipelines to run on different runtime environments.”
Bride goes on, “A little over a year ago, Google open sourced its Dataflow SDK, which provides a programming model used to express data processing pipelines (Input/source -> Transformation/Enrichment -> output/target) very easily. What is great about this SDK is the level of abstraction it provides, so you can think of your pipeline as a simple flow without worrying too much about the underlying complexity of the distributed and parallel data processing steps required to execute your flow.”
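To make the abstraction Bride describes concrete, here is a minimal sketch of the source -> transformation -> output idea in plain Python. This is not the actual Dataflow SDK (or Apache Beam) API; the `Pipeline` class and its `apply`/`run` methods are hypothetical names invented for illustration, standing in for the machinery a real engine would distribute and parallelize behind the scenes.

```python
# Conceptual sketch only: a toy, single-process stand-in for the
# pipeline abstraction described above. The class and method names
# are hypothetical, not the Dataflow SDK / Apache Beam API.

class Pipeline:
    def __init__(self, source):
        # Input/source: any iterable of records.
        self.source = source
        self.transforms = []

    def apply(self, fn):
        # Queue a transformation step; a real engine would plan how to
        # run these steps in parallel across a cluster.
        self.transforms.append(fn)
        return self

    def run(self):
        # Output/target: here we just materialize the results locally.
        records = list(self.source)
        for fn in self.transforms:
            records = [fn(r) for r in records]
        return records

# Input/source -> Transformation/Enrichment -> output/target
result = (
    Pipeline(["alice", "bob"])
    .apply(str.upper)            # transformation
    .apply(lambda s: s + "!")    # enrichment
    .run()
)
print(result)  # ['ALICE!', 'BOB!']
```

The point of the sketch is the developer's view: the pipeline reads as a simple linear flow, while the execution details stay hidden behind the abstraction, which is what the SDK's model offers at scale.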
Bride continues, “Talend has a long history with the Apache Software Foundation (and already has committers on key Enterprise Apache projects such as Apache ActiveMQ, Camel, CXF, Karaf, Syncope or Falcon) and has been focusing a lot on developer productivity. Given this, as Google announced its proposal for Dataflow to become an Apache Software Foundation incubator project, it became very natural for Talend to join with them to help accelerate development along with a few other companies that share similar interests and core values.”
Photo credit: Google