Data pipelines are like insurance. You only know they exist when something goes wrong. ETL processes are constantly toiling away behind the scenes, doing heavy lifting to connect the sources of data from the real world with the warehouses and lakes that make the data useful. Products like DBT and AirTran demonstrate the repeatability and […]
Choosing Tools for Data Pipeline Test Automation (Part 2)
In part one of this blog post, we described the many challenges facing developers of data pipeline testing tools (complex technologies, a large variety of data structures and formats, and the need to support diverse CI/CD pipelines), along with more than 15 distinct categories of test tools that pipeline developers need. Part two delves […]
Choosing Tools for Data Pipeline Test Automation (Part 1)
Those who want to design universal data pipeline and ETL testing tools face a tough challenge because of the vastness and variety of technologies: Each data pipeline platform embodies a unique philosophy, architectural design, and set of operations. Some platforms are centered around batch processing, while others are centered around real-time streaming. While the nuances […]
Data Observability: What It Is and Why It Matters
As a process, data observability is used by businesses working with massive amounts of data. Many large, modern organizations try to monitor their data using a variety of applications and tools. Unfortunately, few businesses develop the visibility necessary for a realistic overview. Data observability provides that overview, to eliminate data flow problems as quickly as […]
Best Practices in Data Pipeline Test Automation
Data integration processes benefit from automated testing just like any other software. Yet finding a data pipeline project with a suitable set of automated tests is rare. Even when a project has many tests, they are often unstructured, do not communicate their purpose, and are hard to run. A characteristic of data pipeline development is the frequent […]
Data Pipelines: An Overview
Just as vendors rely on U.S. mail or UPS to get their goods to customers, workers count on data pipelines to deliver the information they need to gain business insights and make decisions. This network of data channels, operating in the background, distributes processed data across computer systems, an essential framework and function for any data-driven […]
Why Data Quality Problems Plague Most Organizations (and What to Do About It)
For business leaders to make informed decisions, they need high-quality data. Unfortunately, most organizations – across all industries – have Data Quality problems that are directly impacting their company’s performance. Case in point: In a recent survey conducted by my company, practitioners were asked about the issues that plague their work, how much they trust their organization’s […]
DataOps Highlights the Need for Automated ETL Testing (Part 2)
By Wayne Yaddow. DataOps, which focuses on automated tools throughout the ETL development cycle, responds to a huge challenge for data integration and ETL projects in general. ETL projects are increasingly based on agile processes and automated testing. ETL (i.e., extract, transform, load) projects are often devoid of automated testing. The […]
DataOps Highlights the Need for Automated ETL Testing (Part 1)
By Wayne Yaddow. DataOps, which focuses on automated tools throughout the ETL development cycle, responds to a huge challenge for data integration and ETL projects in general. ETL projects are increasingly based on agile processes and automated testing. ETL (i.e., extract, transform, load) projects are often devoid of automated testing. The […]
Guide to Digital Transformation: Data-first Architecture
By John Ottman. The goal of digital transformation remains the same as ever – to become more data-driven. We have learned how to gain a competitive advantage by capturing business events in data. Events are data snapshots of complex activity sourced from the web, customer systems, ERP transactions, social media, […]