Enterprises running large SAP HANA instances in the cloud are seeing a new challenge appear as their databases continue to grow. Since SAP HANA has a simplified data layout and structure compared to a more complex legacy database, it was assumed this would result in less data sprawl and duplication. But does the data stay […]
Modeling Modern Knowledge Graphs
In the buzzing world of data architectures, one term seems to unite some previously contending buzzy paradigms. That term is “knowledge graphs.” In this post, we will dive into the scope of knowledge graphs, which is maturing as we speak. First, let us look back. “Knowledge graph” is not a new term; see for yourself […]
Spark vs. Flink: Key Differences and How to Choose
Apache Spark is an open-source, distributed computing system that provides a fast and scalable framework for big data processing and analytics. The Spark architecture is designed to handle data processing tasks across large clusters of computers, offering fault tolerance, parallel processing, and in-memory data storage capabilities. Spark supports various programming languages, such as Python (via […]
Semantic Technology and Integration 101: What It Is and Why It Matters
New technologies like ChatGPT are all the rage, as they aim to answer questions and provide information that makes our lives easier. Yet the validity of the results generated has come under scrutiny and, as a result, much emphasis has been placed on how organizations can get relevant and trustworthy data into the hands of […]
Leveraging Data Stream Processing to Improve Real-Time Data Analysis
Data stream processing is rapidly emerging as a critical technology for modernizing enterprise applications and improving real-time data analysis for data-driven applications. As businesses become more reliant on real-time data analysis, data stream processing enables them to analyze and process large amounts of data in real time, providing timely insights and enabling informed decision-making. Traditionally, […]
Is Low-Code and No-Code Development the Solution to Your Productivity Dilemma?
Enterprises’ growing need for quality Data Management and productivity tools has led to an explosion of interest in emerging technologies, such as low-code and no-code platforms, to accelerate their digital transformation objectives. Businesses encumbered by legacy infrastructure must make significant investments and decide whether to buy their technology or build it internally. With both options, fast assembly […]
Save Money with Storage-as-a-Service During Uncertain Economic Times
Amid the economic uncertainty that has gripped 2023, IT leaders are scrambling to find ways to reduce costs. Enterprise storage is one of those areas that the IT team can look at for substantial cost savings without sacrificing availability, reliability, cyber resilience, or application performance. Taking a more strategic approach to your enterprise storage infrastructure will […]
Best Practices in Data Pipeline Test Automation
Data integration processes benefit from automated testing just like any other software. Yet finding a data pipeline project with a suitable set of automated tests is rare. Even when a project has many tests, they are often unstructured, do not communicate their purpose, and are hard to run. A characteristic of data pipeline development is the frequent […]
How to Work with Unstructured Data in Python
All our online actions generate data. Even if we don’t write posts, comment, or upload other content, we leave our traces by being silent observers. This leads to predictable results – according to Statista, the amount of data generated globally is expected to surpass 180 zettabytes in 2025. On the one hand, having many resources to make […]
2023: Mitigating Data Debt by Knowing or by Guessing?
One of the newer data buzzwords is “data debt.” Actually, it is approximately 10 years old, and it has been popular ever since agile practitioners realized that postponing things creates not only technical debt but also data debt. Will we, in 2023, be better at not creating so much data debt, and will it be […]