With 2019 nearly over, do you know where data streaming is headed? We recently spoke to hundreds of organizations about this very question, wanting to understand how they’re dealing with their data streaming issues – from storage and analysis to governance and more. Their answers reveal new trends, directions, and hints of what organizations might achieve with their data streaming in the months and years to come.
1. New Data Stream Users Come from Off the Beaten Path
Forget the usual suspects. Industry interest in data streaming has gone beyond the companies that work with high-volume, rapidly changing data – those involved in real-time bidding, mobile monetization, and other ad-tech concerns, as well as those working with IoT sensors.
Today, as interest in Data Science and advanced analytics grows, and as SaaS, web, and mobile applications continue to flourish, the number and range of companies using data streaming have expanded.
These days, what mid- or large-sized company isn’t planning or already involved in a data streaming project, motivated by a desire to analyze customer information, extract insights from e-commerce purchases, track real-time changes in the stock market, optimize content via clickstream records, and more?
2. Storage and Compute Go Their Separate Ways
Is the enterprise data warehouse a thing of the past? Is the trend today toward dividing the database into distinct components? Maybe so. Today’s data teams are increasingly leaning toward distributed computing and solutions that enable them to take advantage of inexpensive storage that’s independent of compute resources, including cloud object storage offered by Amazon Web Services, Microsoft, or Google. This model is often better suited for a world where even a 10-person startup could generate petabyte-scale data, necessitating the ability to store high volumes of data without racking up bills for database storage.
Streaming data and big data are often synonymous – streaming sources such as IoT sensors, logs, and clickstreams are the usual culprits when it comes to generating massive volumes of data at high velocity. Hence, expect streaming architectures to become increasingly decoupled and built on cloud data lakes.
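To make the decoupling concrete, here’s a minimal sketch in Python – with a hypothetical bucket name and a stand-in event source – of streaming raw events into cloud object storage (S3 via boto3), where any compute engine can later query them independently of how they were written:

```python
import json
import time
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-lake"  # hypothetical bucket name

def event_stream():
    # Stand-in for a real streaming source such as a Kafka consumer.
    yield {"user": "u1", "action": "click", "ts": 1571312000}
    yield {"user": "u2", "action": "purchase", "ts": 1571312060}

def flush(events):
    # Write a batch of raw events to a time-partitioned key as JSON lines.
    key = time.strftime("events/dt=%Y-%m-%d/hour=%H/") + uuid.uuid4().hex + ".jsonl"
    body = "\n".join(json.dumps(e) for e in events)
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))

buffer = []
for event in event_stream():
    buffer.append(event)
    if len(buffer) >= 1000:  # flush in batches to keep objects a sensible size
        flush(buffer)
        buffer.clear()
if buffer:  # flush whatever remains
    flush(buffer)
```

The storage side scales with the stream, while compute is provisioned – and paid for – only when someone actually runs a query.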
3. A Schema-free Approach is the Shortest Path to More Agile Data Science
Data analysts of yesteryear, using traditional approaches, would probably not uncover the insights that today’s data scientists – using machine learning and neural networks – can. A data scientist facing a new dataset cannot always predict the queries she will run, which creates a critical need to store data in its original form, without foisting a schema on it or altering it as it is ingested into storage.
Data streaming and Data Science go together when it comes to forecasting and predictive decision-making based on extensive analysis of event-based information. It would be no surprise if more organizations enhance the agility and effectiveness of their data science through a schema-free approach.
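Here’s a minimal sketch of what schema-on-read looks like in practice, using a local JSON-lines file (the file name and fields are illustrative) in place of a data lake: events are appended exactly as they arrive, and a tabular view is projected only when a question is asked.

```python
import json

import pandas as pd

# Ingest side: append events verbatim; no schema is enforced, so new
# fields can appear at any time without breaking the pipeline.
with open("raw_events.jsonl", "a") as f:
    f.write(json.dumps({"user": "u1", "action": "click", "ts": 1571312000}) + "\n")
    f.write(json.dumps({"user": "u2", "action": "buy", "cart": {"items": 3}}) + "\n")

# Query side: the schema is decided only at analysis time.
with open("raw_events.jsonl") as f:
    records = [json.loads(line) for line in f]
df = pd.json_normalize(records)   # flattens whichever fields actually exist
print(df[df["action"] == "click"])
```

Because nothing was discarded at ingestion, tomorrow’s entirely different question can be answered from the same raw events.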
4. Data Management and Data Governance are Now Critical Paths to Success
Streaming data is semi-structured at best, and understanding what data you actually have is not as simple as looking at table headers. The complexity of streaming data means significantly more emphasis needs to be placed on Metadata Management, Data Governance, and ETL testing than with traditional relational databases.
The need to manage semi-structured and frequently changing data from multiple streaming sources poses new challenges that require distinct technological solutions in order to manage metadata and improve visibility into streaming data as it is ingested, processed, and structured for further analysis.
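As a toy illustration of the kind of visibility that’s needed, the sketch below (all names are illustrative) records which fields and types actually appear in each event as it is ingested – the streaming equivalent of the table headers you’d otherwise inspect:

```python
from collections import defaultdict

field_catalog = defaultdict(set)  # field path -> set of observed types

def register(event: dict, prefix: str = "") -> None:
    """Walk a (possibly nested) event and record every field it carries."""
    for key, value in event.items():
        path = prefix + key
        field_catalog[path].add(type(value).__name__)
        if isinstance(value, dict):
            register(value, prefix=path + ".")

register({"user": "u1", "geo": {"lat": 32.1, "lon": 34.8}})
register({"user": "u2", "geo": {"lat": "32.1"}})  # same field, drifting type

for field, types in sorted(field_catalog.items()):
    print(field, "->", types)  # surfaces schema drift before it breaks queries
```

Real metadata catalogs do far more, but even this much makes schema drift visible as it happens rather than when a downstream query fails.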
5. Data Manipulation without Direct ROI is Unsustainable
With organizations investing more and more in big data projects, the question has to be asked: how does one gain a clear ROI? Companies have come to realize that pouring limitless resources into “data plumbing” is no longer viable. In fact, organizations need to move beyond data engineering, which focuses on the practical work of data collection and preparation, to extracting insights.
Thus, there’s an increasing clamor for tools that empower data analysts and data scientists to elicit business insights rather than simply write ETL jobs – tools that span storage, stream processing, and orchestration, as well as transformation, analysis, and data visualization.
6. More Users are Beating a Path to the Data Streaming Door
The data streaming world used to be populated by a very few – the big data engineers and data scientists who were the only ones knowledgeable about the extremely complex tools and processes required to work with data streams.
The BI world, on the other hand, was the province of business analysts – experts in running SQL queries and working with relational databases. However, as business processes increasingly depend on data streaming, business users – not just the techies – expect to be able to work with and manipulate such data the same way they do other datasets. Organizations need to make these skills accessible to all relevant business users, not just the Data Science staff.
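A minimal sketch of that end state, with SQLite standing in for whatever query engine an organization actually uses: a pipeline keeps a table current with streaming events, and a business user queries it with ordinary SQL, no stream-processing expertise required.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user TEXT, action TEXT, ts INTEGER)")

# Pipeline side: events are appended as they arrive from the stream.
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "click", 1571312000), ("u2", "purchase", 1571312060)],
)

# Analyst side: plain SQL, exactly as with any other dataset.
for row in con.execute("SELECT action, COUNT(*) FROM events GROUP BY action"):
    print(row)
```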
7. Changing Paths: From Batch to Streaming Architectures
Today’s businesses want answers to their queries – and they want them now. They’re not going to accept limited resources and technology as excuses. That’s why batch processing – in which the answer to an analytical question arrives within a 12- to 24-hour time frame – is on its way out.
Understand that many insights, such as those in IoT analytics, are short-lived and require immediate attention. To satisfy today’s impatient business consumers, data teams must deliver insights in or near real time. That’s only possible by supplanting batch processing with streaming architectures.
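The contrast is easy to see in miniature. In the sketch below (the event source is illustrative), the batch version can answer only after the whole window has landed, while the streaming version updates its answer on every event:

```python
from collections import Counter

def batch_count(events_for_the_day):
    # The answer arrives only after the daily load finishes, hours after the fact.
    return Counter(e["action"] for e in events_for_the_day)

def streaming_count(event_stream):
    # The answer is refreshed continuously, event by event.
    counts = Counter()
    for event in event_stream:
        counts[event["action"]] += 1
        yield dict(counts)  # latest counts, available immediately

events = [{"action": "click"}, {"action": "buy"}, {"action": "click"}]
print(batch_count(events))            # one answer, after everything arrived
for snapshot in streaming_count(iter(events)):
    print(snapshot)                   # a running answer as each event lands
```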