Data integration processes benefit from automated testing just like any other software. Yet finding a data pipeline project with a suitable set of automated tests is rare. Even when a project has many tests, they are often unstructured, do not communicate their purpose, and are hard to run. A characteristic of data pipeline development is the frequent […]
How to Work with Unstructured Data in Python
All our online actions generate data. Even if we don't write posts, comment, or upload other content, we leave traces simply as silent observers. The result is predictable: according to Statista, the amount of data generated globally is expected to surpass 180 zettabytes in 2025. On the one hand, having many resources to make […]
2023: Mitigating Data Debt by Knowing or by Guessing?
One of the newer data buzzwords is "data debt." Actually, the term is approximately 10 years old, and it has gained popularity since agile practitioners realized that postponing work creates not only technical debt but also data debt. Will we, in 2023, be better at not creating so much data debt, and will it be […]
How to Select the Right Database
In today's data-driven world, technologies change very rapidly, and databases are no exception. The current database market offers hundreds of databases, all varying in data model, usage, performance, concurrency, scalability, security, and the level of vendor support provided. Choosing a database is a different class of challenge. Selecting the right […]
What to Expect from Open-Source Data Infrastructure in 2023
Open-source technologies will become even more prominent within enterprise data architectures over the coming year, driven by their stark budgetary advantages combined with some of the newest enterprise-friendly capabilities added to several solutions. Here are three predictions for the open-source data infrastructure space in 2023: 1. Economic headwinds will make open-source data technologies even more attractive to […]
Will 2023 Be the Year of the Polyglot Persistence?
Meet polyglot persistence. It's not a new term, but one that's catching fire in what many call the unsexiest part of Data Management: Database Management. It also ties into the physical part of Data Management, storage management, which data teams often overlook. Storage management and database management/administration indeed used to be a separate […]
What to Expect in 2023: AI and Graph Technology
2023 will bring exciting advances in AI and graph technology. One of the most compelling innovations will be the ability to turn quantum programs into graphs and vice versa. Natural language understanding will become part of AI models. Adoption of standards-based semantic layers will spike as they enable data selection through business terms. Graph […]
The Open Data Stack Distilled into Four Core Tools
In this article, we explore the core open-source tools any company needs to become data-driven. We'll cover integration, transformation, orchestration, analytics, and machine learning tools as a starter guide to the latest open data stack. Let's start with the modern data stack. Have you heard of it or where the […]
Is Your Database Built for Streaming Data?
When it comes to data sources, developers of analytics applications face new and increasingly complex challenges, such as higher demand from event data and streaming sources. Here in the early stages of this "stream revolution," developers are building modern analytics applications that use continuously delivered real-time data. Yet while streams are clearly the […]
It’s All About Relations!
The new ISO/IEC 39075 Graph Query Language (GQL) standard is expected to hit the data streets in late 2023 (?). Then what? If graph databases are standardized soon, what will happen to SQL? SQL databases will very likely stay around for a long time, not simply because legacy SQL has tremendous inertia, but because relational database paradigms […]