Business insights are only as good as the accuracy of the data on which they are built. According to Gartner, data quality is important to organizations “in part because poor data quality costs organizations at least $12.9 million a year on average.” So, we believe it stands to reason that providing access to the […]
SQL and the Relational Model: Enduring Standards in the Age of AI
In 1970, Ted Codd introduced the relational data model, which proposed representing data as tuples, grouped into relations, to allow for declarative methods to specify data. SQL was developed at IBM as a way to query relational databases. It is a declarative programming language, expressing what data is to be retrieved, as opposed to imperative programming languages […]
The RDBMS Split Process: A Practical Guide to Streamlining the Transition to Data Warehouses
In the first part of this series, we explored how harmonizing relational database management systems (RDBMS) with data warehouses (DWH) can drive scalability, efficiency, and advanced analytics. We discussed the importance of aligning these systems strategically to balance their unique strengths while avoiding unnecessary complexity. In this installment, we tackle a challenge many organizations face: […]
Modern OLAP: From Static Beginnings to a Big Data Renaissance
Online analytical processing (OLAP) enables users to interactively extract insights from complex datasets by querying and analyzing data in a multidimensional way. By organizing data into dimensions and measures, OLAP allows intuitive, immediate slicing, dicing, and pivoting to answer critical business questions. OLAP has come a long way since its inception. The […]
MySQL Replication: Unlocking Performance and Flexibility with Advanced Techniques
In database management, replication has long been a cornerstone of data reliability, redundancy, and performance. For those familiar with MySQL, replication may seem straightforward – simply read the binary log and apply it to a replica server, right? While this basic understanding is correct, improving replication performance is far more complex, particularly when dealing with […]
Enhancing Generative AI with Vector Databases: Practical Applications in the Travel Industry
As AI continues to drive innovations in customer experiences, the need for better data management systems has become more evident. One such system, vector databases, is gaining traction as a key enabler of generative AI in industries like travel. These databases are specifically designed to store and process high-dimensional data in the form of vectors, […]
Data Modeling in Machine Learning Pipelines: Best Practices Using SQL and NoSQL Databases
Data is undoubtedly one of the most significant components of a machine learning (ML) workflow, which makes data management one of the most important factors in sustaining ML pipelines. An appropriate data model allows the data to be accessible at all times, operate at peak efficiency, and be adjusted to […]
Mind the Gap: Architecting Santa’s List – The Naughty-Nice Database
You never know what’s going to happen when you click on a LinkedIn job posting button. I’m always on the lookout for interesting and impactful projects, and one in particular caught my attention: “Far North Enterprises, a global fabrication and distribution establishment, is looking to modernize a very old data environment.” I clicked the button […]
Bridging the Gap: Harmonizing RDBMS with Data Warehousing for Scalability
As businesses grow, so does the complexity of managing and analyzing data. Traditionally, relational database management systems (RDBMS) have been the backbone of data storage, offering robust and reliable transactional capabilities. However, as data volumes increase, traditional RDBMS solutions start to hit their limits, causing performance issues that affect overall operations. The need to scale […]
The Post-Modern Data Stack: Unleash the Power of Foundational Models
Imagine you are assigned to extract sales insights from your data. Along with troves of corporate financials and market trends, you are given access to hours of audio and video recordings of actual sales representatives speaking with customers. How do you process this in Spark? Or, consider another scenario where you work […]