Suppose you’re in charge of maintaining a large set of data pipelines from cloud storage or streaming data into a data warehouse. How can you ensure that your data meets expectations after every transformation? That’s where data quality testing comes in. Data testing uses a set of rules to check if the data conforms to […]
Data Observability vs. Monitoring vs. Testing
Companies are spending a lot of money on data and analytics capabilities, creating more and more data products for people inside and outside the company. These products rely on a tangle of data pipelines, each a choreography of software executions transporting data from one place to another. As these pipelines become more complex, it’s important […]
Observability: Traceability for Distributed Systems
Have you ever waited for that one expensive parcel that shows “shipped,” but you have no clue where it is? The tracking history stopped updating five days ago, and you have almost lost hope. But wait, 11 days later, you have it at your doorstep. You wished the traceability could have been better to relieve […]
Data Pipelines: An Overview
Just as vendors rely on U.S. mail or UPS to get their goods to customers, workers count on data pipelines to deliver the information they need to gain business insights and make decisions. This network of data channels, operating in the background, distributes processed data across computer systems, an essential framework and function for any data-driven […]
Are Data Warehouses Still Relevant?
Over the past few years, enterprise data architectures have evolved significantly to accommodate the changing data requirements of modern businesses. The emergence of advanced data storage technologies, such as cloud computing, data hubs, and data lakes, makes us question the role of traditional data warehouses in modern data architecture. Data warehouses were first introduced in the […]
Data Lineage and Data Quality: How They Intersect
The intersection of data lineage and Data Quality helps provide more accurate and useful information. Data Quality represents the accuracy of data. Internet businesses need good Data Quality to operate efficiently. Unfortunately, there can be obstacles in gathering, storing, and maintaining high-quality data. The use of data lineage can help eliminate those Data Quality obstacles by providing […]
How Pre-Built Connectors Can Save You Time in Data Integration
A hallmark of human invention, the wheel completely changed our lives. Everything from the simple cart to the electric vehicle can be attributed to it. What does it have to do with data integration? Nothing really, but it’s a good analogy to see how things have evolved in the realm of […]
The Data Engineer’s Roadmap
Data engineering is a fascinating and fulfilling career: you are at the helm of every business operation that requires data, and as long as users generate data, businesses will need data engineers. In other words, job security is strong. But with such great power comes great responsibility. The journey to becoming a successful data engineer […]
Data Management Is Dead – Data Empowerment Has Emerged
In my last DATAVERSITY article, “The Machine Economy Is Here – The Digital Transformation Era Is Over,” I discussed the end of digital transformation, the arrival of the machine economy, and the emergence of data empowerment. In this article, I follow up by laying out the problems with traditional Data Management and why data empowerment is now […]
What Is Data Observability?
Data observability is the practice of monitoring and analyzing the health of an organization’s data and data systems. Essentially, it gives you a 360° overview of what’s happening with your data at any given point in time. This practice is beneficial, as it provides all stakeholders with an in-depth insight into how their data is collected, […]