Hybrid databases have evolved over the last decade, with a focus on cloud environments. In 2014, Gartner coined the term “Hybrid Transaction/Analytical Processing” (HTAP), which it defines as: “An emerging application architecture that ‘breaks the wall’ between transaction processing and analytics. It enables more informed and ‘in business real time’ decision making.”
The new HTAP architectures are designed to run transactional and analytical processes simultaneously, giving businesses more consistency in automated online decision-making, and software designers greater flexibility when creating and updating applications.
A quick review of the “old” hybrid database model is worthwhile, because it helps explain how the “new” hybrid databases work within the cloud. Early hybrid databases combined in-memory and on-disk data storage, taking advantage of the benefits of each technology.
A hybrid database merged both on-disk database features and in-memory database features to form a single unified engine. This combination provided high-performance data processing in the main memory and offered huge storage capacities on the physical disk. The benefits of a hybrid environment over in-memory and disk-resident databases were considerable, but faded as data storage became inexpensive.
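As a rough analogy for one engine spanning two storage tiers, the sketch below uses Python's built-in `sqlite3` module: the main database lives in RAM and a file-backed database is attached to the same connection, so a single SQL statement reads both. SQLite is not a hybrid database in the sense described above; the `archive.db` file, the `orders` tables, and the hot/cold split are hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile

# Toy analogy only: SQLite stands in for a hybrid engine here.
disk_path = os.path.join(tempfile.mkdtemp(), "archive.db")  # hypothetical on-disk file

conn = sqlite3.connect(":memory:")                          # in-memory tier ("hot" data)
conn.execute("ATTACH DATABASE ? AS archive", (disk_path,))  # on-disk tier ("cold" data)

conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, amount REAL)")

# Recent orders stay in RAM; older ones are kept on disk.
conn.executemany("INSERT INTO main.orders VALUES (?, ?)", [(3, 30.0), (4, 40.0)])
conn.executemany("INSERT INTO archive.orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.commit()

# One query reads both tiers through the same connection.
rows = conn.execute(
    "SELECT id, amount FROM archive.orders "
    "UNION ALL SELECT id, amount FROM main.orders ORDER BY id"
).fetchall()
print(rows)  # [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
```

A real hybrid engine decides placement itself and moves data between tiers transparently; here the split is manual and exists only to show that the application sees a single database.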
In-memory databases are typically much faster than on-disk databases. Because the data is held directly in RAM, response times are fast and latency is low. RAM itself is neither relational nor non-relational, but many in-memory data stores use non-relational models and are commonly associated with NoSQL systems.
On-disk databases, by contrast, offer immense storage capacity at low cost. Their performance, however, is comparatively slow, because they are designed primarily for storage and data retrieval and must read from and write to disk. Additionally, the storage design frequently consumes a large share of the CPU’s resources to optimize disk access patterns. On-disk storage has traditionally been associated with relational (SQL) database systems.
Integrating Relational and NoSQL Databases
To function competitively, a modern business’s hybrid database must be designed for the cloud, with an architecture that supports both public and private cloud use. Combining relational and NoSQL databases supports this goal and provides high availability, scalability, and reliability. Both NoSQL and relational databases have their own strengths and weaknesses, and combining the two can maximize their advantages while minimizing their shortcomings.
In a relational database, data is stored as relations (organized in “tables”) and is queried with SQL or another structured query language. Relational databases are good at online analytical processing (OLAP) and at providing strong, consistent online transaction processing (OLTP).
A NoSQL database, on the other hand, does not store data in tables; it uses a variety of more flexible, nonrelational models (key-value, graph, document, etc.). NoSQL makes it easier for complex, distributed systems to access both unstructured and structured data. Many NoSQL databases work well for OLTP, offering low latency across a wide range of data access patterns. NoSQL search databases, meanwhile, have been designed for analytics.
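To make the difference between the two models concrete, here is a minimal sketch of one order stored both ways. The relational version uses `sqlite3` with two tables joined by a key; the document/key-value version stores the whole order as one JSON value under a single key, with a plain Python dict standing in for a NoSQL store. The table names, the `order:1` key format, and the fields are hypothetical.

```python
import json
import sqlite3

# Relational model: an order and its line items live in two tables linked by a key.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
db.execute(
    "CREATE TABLE items (order_id INTEGER REFERENCES orders(order_id), sku TEXT, qty INTEGER)"
)
db.execute("INSERT INTO orders VALUES (1, 'alice')")
db.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "A-100", 2), (1, "B-200", 1)])

# Reassembling the order requires a join across the two tables.
relational = db.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN items i ON o.order_id = i.order_id ORDER BY i.sku"
).fetchall()

# Document / key-value model: the whole order is one self-contained value
# stored under a single key (a dict stands in for the NoSQL store).
kv_store = {}
kv_store["order:1"] = json.dumps({
    "customer": "alice",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
})
document = json.loads(kv_store["order:1"])  # one lookup, no join

print(relational)         # [('alice', 'A-100', 2), ('alice', 'B-200', 1)]
print(document["items"])  # [{'sku': 'A-100', 'qty': 2}, {'sku': 'B-200', 'qty': 1}]
```

The document form suits access patterns that always fetch an order as a whole; the relational form makes it easier to query items across many orders.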
Scalability and Performance
Relational databases traditionally scale vertically: as the amount of data increases, more storage capacity and processing power are added to the single computer doing the work. Vertical scaling is clumsy and expensive, and it is ultimately limited by the largest machine available.
NoSQL databases, on the other hand, scale horizontally: as the amount of data increases, the system expands by adding more servers for computing power and data storage. This is generally a less expensive solution than vertical scaling.
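Horizontal scaling depends on spreading data across servers so that each one holds only part of it. A common way to do this is hash-based sharding, sketched below under simple assumptions: the server names are hypothetical, and keys are placed with a SHA-256 hash taken modulo the number of servers.

```python
import hashlib

# Hypothetical server names; in a real cluster these would be network nodes.
SERVERS = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key: str, servers: list[str]) -> str:
    """Route a key to one server by hashing it (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


# Each server stores only the keys routed to it.
placement = {}
for i in range(9):
    key = f"user:{i}"
    placement.setdefault(shard_for(key, SERVERS), []).append(key)

for server in SERVERS:
    print(server, placement.get(server, []))
```

Because the hash is deterministic, any client can compute where a key lives without asking a central coordinator. Plain modulo placement does have a drawback: adding a server remaps most keys, which is why many distributed databases use consistent hashing or range partitioning instead.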
The new hybrid platforms provide scalable transaction processing without needing to keep the whole database in memory, which allows them to keep using relational tables. As a result, organizations can offer immediate decision-making capabilities using real-time analytics while processing large volumes of data.
ACID (Atomicity, Consistency, Isolation, and Durability)
Many NoSQL databases relax ACID guarantees (though some certainly provide them), primarily because coordinating transactions across horizontally scaled servers is costly. Instead, they follow BASE (Basically Available, Soft state, Eventually consistent) principles, which are much more flexible than the design of relational databases. NoSQL is designed primarily for working with large amounts of data. Relational databases, by contrast, are designed to comply with ACID properties and can therefore contribute this guarantee to a hybrid database.
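The "eventually consistent" part of BASE can be illustrated with two toy replicas that accept writes independently and later reconcile them. This sketch assumes a last-writer-wins policy with caller-supplied timestamps, breaking ties by replica name; the `Replica` class and the `cart:42` key are hypothetical, and real NoSQL systems use a range of conflict-resolution strategies (vector clocks, CRDTs, and others), not only this one.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica storing key -> (timestamp, origin replica, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Writes are accepted locally with no cross-replica coordination,
        # so the replica stays available but its state is "soft".
        self.data[key] = (ts, self.name, value)

    def merge_from(self, other):
        # Anti-entropy sync: last-writer-wins, with ties broken by replica
        # name so that every replica picks the same winner.
        for key, (ts, origin, value) in other.data.items():
            mine = self.data.get(key)
            if mine is None or (ts, origin) > (mine[0], mine[1]):
                self.data[key] = (ts, origin, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]


a, b = Replica("a"), Replica("b")
a.write("cart:42", ["book"], ts=1)
b.write("cart:42", ["book", "pen"], ts=2)

before = (a.read("cart:42"), b.read("cart:42"))
print(before)  # (['book'], ['book', 'pen'])  replicas temporarily disagree

a.merge_from(b)
b.merge_from(a)
after = (a.read("cart:42"), b.read("cart:42"))
print(after)   # (['book', 'pen'], ['book', 'pen'])  replicas have converged
```

Between the writes and the merge, a reader could see either value; that window is exactly what an ACID system rules out and what a BASE system accepts in exchange for availability.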
Typically, an OLTP system uses row-oriented (relational, or SQL) data stores to ensure the consistency and isolation that the ACID properties require. To maintain data integrity, OLTP databases must be ACID-compliant.
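The difference between row-oriented and column-oriented storage comes down to which values are kept together. The sketch below models the same hypothetical `orders` data both ways with plain Python lists and dicts; those structures only approximate the grouping, whereas a real engine lays each row or column out contiguously in memory or on disk.

```python
# Hypothetical "orders" data. Python lists and dicts only model how values
# are grouped; a real engine stores each group contiguously.

# Row-oriented layout: all fields of one record are stored together (suits OLTP).
rows = [
    {"id": 1, "customer": "alice", "amount": 25.0},
    {"id": 2, "customer": "bob", "amount": 40.0},
    {"id": 3, "customer": "alice", "amount": 10.0},
]

# Column-oriented layout of the same data: all values of one field are stored
# together (suits OLAP scans and aggregates).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# OLTP-style access: read one whole record, which sits in a single row.
order_2 = next(row for row in rows if row["id"] == 2)

# OLAP-style access: aggregate one field by scanning a single column,
# without touching the "id" or "customer" values at all.
total = sum(columns["amount"])

print(order_2)  # {'id': 2, 'customer': 'bob', 'amount': 40.0}
print(total)    # 75.0
```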
In OLTP, a transaction is a sequence of steps coordinated to form a single unit of work. A transaction succeeds only if every step in the sequence succeeds; if any single step fails, the whole transaction fails and any completed steps are rolled back. This is what ensures a customer’s money doesn’t disappear when it is being transferred to another account: if the money cannot be credited to the recipient’s account, the entire transaction fails and the debit from the sender’s account is undone.
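The money-transfer example can be sketched with Python's built-in `sqlite3` module, which supports ACID transactions. The `accounts` table, the account names, and the `transfer` helper are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes, so the debit and the credit are applied together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one unit of work: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            debit = conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            credit = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            if debit.rowcount != 1 or credit.rowcount != 1:
                raise ValueError("unknown account")  # forces a rollback
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))    # True {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "carol", 50), balances(conn))  # False {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "bob", 500), balances(conn))   # False {'alice': 70, 'bob': 30}
```

In the second call the debit from Alice succeeds, but there is no recipient to credit, so the whole transaction is rolled back. In the third call the `CHECK` constraint rejects the overdraft. In both cases the balances are left exactly as they were.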
Flexibility
Relational databases are designed around static, predefined schemas, while NoSQL databases use dynamic schemas focused on flexibility. Changing the schema of a SQL database can be difficult and disruptive, especially for large tables in production. NoSQL, on the other hand, can easily accommodate changes in its structure, which is one reason NoSQL databases are so popular in Agile environments. Relational databases are built primarily for structured data, while NoSQL databases can handle unstructured, semi-structured, and structured data.
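The contrast between a predefined schema and a flexible one shows up as soon as a new attribute appears. In this sketch, `sqlite3` rejects a row with an undeclared column until an explicit `ALTER TABLE` adds it, while a list of dicts (standing in for a document store) accepts records with differing fields. The `customers` table, the `loyalty_tier` and `tags` fields, and the sample records are hypothetical.

```python
import sqlite3

# Relational: the schema is declared up front, so a new attribute needs a migration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")

rejected = None
try:
    db.execute("INSERT INTO customers (id, name, loyalty_tier) VALUES (2, 'bob', 'gold')")
except sqlite3.OperationalError as exc:
    rejected = str(exc)  # SQLite reports that loyalty_tier is not a column

db.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")  # explicit schema change
db.execute("INSERT INTO customers (id, name, loyalty_tier) VALUES (2, 'bob', 'gold')")
table_rows = db.execute("SELECT id, name, loyalty_tier FROM customers ORDER BY id").fetchall()

# Document style: each record carries its own fields, so records may differ and a
# new field needs no migration (a list of dicts stands in for a document store).
documents = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob", "loyalty_tier": "gold"},
    {"id": 3, "name": "carol", "tags": ["beta-tester"]},
]
all_fields = sorted(set().union(*documents))

print(rejected)
print(table_rows)  # [(1, 'alice', None), (2, 'bob', 'gold')]
print(all_fields)  # ['id', 'loyalty_tier', 'name', 'tags']
```

The trade-off works in both directions. The relational table guarantees that every row has the same shape, whereas with documents the application has to handle records that lack a field.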
The flexibility and speed offered by hybrid databases give businesses the ability to execute online analytical processing and online transaction processing in parallel. This is Hybrid Transaction/Analytical Processing (HTAP). HTAP gives developers greater flexibility when updating existing software or creating new software, and modern hybrid databases are remarkably well-suited to real-time, data-driven apps.
The New Hybrid Database
With storage now inexpensive, many organizations are simply no longer worried about it. As a result, new hybrid databases do more than combine storage technologies; they focus instead on deploying applications within hybrid cloud environments. As businesses move to the cloud, many of them are adopting hybrid cloud databases. These are still “hybrids” in terms of data storage, but, more importantly, they are also hybrids in the architectural sense, because they combine public and private clouds.
However, simply moving to a hybrid cloud in the hope of reaping all the rewards is not enough. Today’s organizations need a modern hybrid database: one that is always on, always available, always consistent, and ready to provide a seamless customer experience.
To provide these seamless experiences, most businesses need to rethink their infrastructures and create architectural solutions that deliver the required performance and scalability. Generally, this is done by using an in-memory computing (IMC) platform to power HTAP, which currently appears to be the most cost-effective strategy for achieving the real-time scalability and performance these applications require.
This design runs analytical processing against the same in-memory data store used for transaction processing. Because data no longer has to be copied from operational databases into a separate store for analysis, the latency of that transfer disappears, and situational awareness and real-time analytics become a reality.
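The core HTAP idea, that analytics read the same live data the transactions write with no copy step in between, can be sketched with an in-memory `sqlite3` database standing in for an IMC platform. This is only an assumption-laden analogy: SQLite is a single-node row store with none of the concurrency or scale-out features of a real HTAP system, and the `sales` table and helper functions are hypothetical.

```python
import sqlite3

# One in-memory store serves both workloads (SQLite stands in for an IMC platform).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def record_sale(region, amount):
    """Transactional write (OLTP): one small, atomic insert."""
    with store:
        store.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount))


def revenue_by_region():
    """Analytical read (OLAP): aggregate over the same live table, with no ETL copy."""
    return store.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


record_sale("east", 120.0)
record_sale("west", 80.0)
print(revenue_by_region())  # [('east', 120.0), ('west', 80.0)]

record_sale("east", 30.0)   # a newly committed transaction...
print(revenue_by_region())  # ...is visible to analytics immediately: [('east', 150.0), ('west', 80.0)]
```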
To get the most out of complex hybrid architectures, organizations need Data Management strategies that are compatible with them. With teams spread across the world and users shifting between web, mobile, and desktop apps, organizations need to ensure that their database layer delivers consistent experiences. Otherwise, productivity can grind to a halt as customers and employees alike grow increasingly frustrated. Leaders responsible for developing Data Management strategies should:
- Organize projects by mapping them out with costs, skill requirements, deployment options, etc.
- While working in public clouds, avoid unnecessary costs for tools and features not currently needed.
- Investigate open-source-based offerings. This can reduce costs substantially if the necessary operational and development skills are available.
- Update standards and training and reexamine support organizations.
By developing and executing a hybrid Data Management strategy, organizations can maximize the advantages of the cloud. It is an investment in time and planning that will ensure customers have a positive online experience.