Event Streaming is one of the most important software technologies in the current computing era as it enables systems to process huge volumes of data in blazingly high speed and in real-time. It is indeed the “glue” that can connect data to flow through disparate systems and pipelines that are typical in cloud environments. Leveraging on the Pub/Sub pattern for the message flow, and designed with the cloud in mind, Apache Pulsar has emerged as a powerful distributed messaging and event streaming platform in recent years. With its flexible and decoupled messaging style, it can integrate and work well with many other modern-day libraries and frameworks.
In this session, we will build a modern, efficient streaming data pipeline using Apache Pulsar and Apache Cassandra. Apache Pulsar will handle the data ingest. The external data that comes in will be used for further processing by Pulsar Functions that will in turn reference the tables in Cassandra as the data lookup sources. The results of the transformed data will then be egressed and sent to an Astra DB sink. We will also examine to see how we can further optimize the entire processing pipeline.