Generally speaking, databases are storage systems with built-in software features for managing the data they hold. A database management system (DBMS) is the software that lets users and applications define, store, retrieve, and manage that data. Designing and implementing a good database is a serious challenge. As data volumes grow and requirements keep changing, new types of databases with specific uses are needed.
Most traditional platforms use a relational (SQL) storage system. In in-memory systems, data is held in main memory (RAM) and can be lost if the computer loses power. Long-term data is saved in persistent “storage,” such as disk, and is retained regardless of the computer’s on/off status.
With the dramatic growth of data use, data access can become a bottleneck that limits the performance of computer systems. This is particularly true when data storage must also handle transactional workloads, which has presented some interesting problems. In particular, scaling many concurrent “write” transactions is difficult.
Ricardo Jimenez-Peris, founder and CEO of LeanXcale — and a researcher focused on scalable databases for over 25 years — came up with a solution. LeanXcale has created an SQL database that is ultra-scalable and supports full ACID (atomic, consistent, isolated, durable) transactions. Describing its beginnings during a recent DATAVERSITY® interview, Jimenez-Peris said:
“I was a researcher for many years, focused on the problem of how to scale transactional management data. I tried everything, and eventually I saw that I couldn’t go any further. Then one day, I was helping a friend who was working on storage and couldn’t figure out why it couldn’t be scaled. The conversation that I had with my friend provided an insight, and I decided to trash everything that I had done in the past. I just took a fresh, white page, and started from scratch.”
He then spent about nine months, “just like a pregnancy,” he said, working on the problem. He wanted to scale up transactional management, and he tried to scale out each of the ACID properties independently. The problem had been around since the early years of SQL databases, and no one had yet found a good solution. “When I finally found a solution,” he commented, “another friend, with business experience, said, ‘This is really very good; you should create a startup.’ And that was how the whole thing started.”
Transactional Databases
The basic purpose of a transactional database is to record transactions. For example, a customer’s purchase, a hotel reservation, or the clicks on a web page are each recorded as transactions. Each transaction is normally assigned its own unique transaction identity number, or “trans ID,” along with a list of items that make up the transaction.
The items purchased during the transaction are one example of items on a transaction list. Transactional databases may include additional tables containing more information associated with the transactions (item descriptions, the salesperson’s ID, the geographic area, and more).
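The layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample values are assumptions made for the example, not a schema taken from any particular product.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        trans_id       INTEGER PRIMARY KEY,  -- the unique "trans ID"
        salesperson_id INTEGER,
        region         TEXT
    );
    CREATE TABLE items (
        item_id     INTEGER PRIMARY KEY,
        description TEXT
    );
    -- One row per item on a transaction's list.
    CREATE TABLE transaction_items (
        trans_id INTEGER REFERENCES transactions(trans_id),
        item_id  INTEGER REFERENCES items(item_id),
        quantity INTEGER
    );
""")

conn.execute("INSERT INTO transactions VALUES (1001, 7, 'EMEA')")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "notebook"), (2, "pencil")])
conn.executemany("INSERT INTO transaction_items VALUES (?, ?, ?)",
                 [(1001, 1, 2), (1001, 2, 10)])
conn.commit()


def items_for(trans_id):
    """Return (description, quantity) pairs for one transaction."""
    rows = conn.execute(
        """SELECT i.description, ti.quantity
           FROM transaction_items AS ti
           JOIN items AS i ON i.item_id = ti.item_id
           WHERE ti.trans_id = ?
           ORDER BY i.item_id""",
        (trans_id,),
    )
    return rows.fetchall()


purchase = items_for(1001)
```

Keeping item descriptions in their own table means each transaction row stays small, and the extra detail is joined in only when a query asks for it.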
Transactional databases are designed to be ACID compliant, Jimenez-Peris said. A transactional database is a database management system that can cancel, or roll back, a transaction or operation that does not complete properly. This ensures that each set of changes sent to the database either succeeds or fails as a single unit. This feature acts as a screening process, providing a high degree of data integrity in the database.
Incomplete transactions are screened out. This feature has been available for several decades and is supported by most relational database systems. This combination, however, presents scalability problems. Jimenez-Peris remarked that:
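A minimal sketch of this all-or-nothing behavior, using Python's built-in sqlite3 module, might look like the following. The accounts table, the names, and the transfer function are hypothetical and exist only for illustration. The credit is deliberately applied before the debit so that a failed debit shows the earlier change being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint rejects negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst as one unit; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every change if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer("alice", "bob", 30)       # both updates are committed
failed = transfer("alice", "bob", 500)  # debit violates CHECK; credit is undone
final = balances()
```

After the second call, neither account has changed: the credit to the destination account was rolled back together with the rejected debit.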
“So there has been an issue that has been there for a few decades: scaling out operational databases. People didn’t know how to do it because they didn’t know how to scale out transactional management, ACID properties. This is something I’ve devoted my entire scientific career to solving. We can now scale out linearly to hundreds or even thousands of [nodes]. Basically, we have removed the bottleneck that has been always there.”
Storage Engines
A storage engine (or storage system) is the component of a database responsible for storing, managing, and retrieving data in memory and on persistent storage. The database, as a whole, can answer complex queries. Storage engines, on the other hand, view the data from a more restricted perspective, offering a simple API that exposes operations such as GET, PUT, and DELETE. The database can be described as an application built atop the storage engine, offering a schema, a query language, indexes, and transactions.
Some databases support pluggable storage engines, meaning the same database can be used with multiple storage engines. Additionally, there are storage engines that were developed independently of the databases using them (such as RocksDB and WiredTiger), and which power databases using completely different query languages. Jimenez-Peris described his team’s efforts in developing a new storage engine:
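The split between the two layers can be illustrated with a toy example. Both classes below, StorageEngine and TinyTable, are hypothetical names invented for this sketch. They do not reflect the API of KiVi, RocksDB, WiredTiger, or any other real engine, and the "database" layer adds only a schema check, leaving out query languages, indexes, and transactions.

```python
import json
from typing import Dict, List, Optional


class StorageEngine:
    """Hypothetical in-memory engine that only knows PUT, GET, and DELETE."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class TinyTable:
    """Hypothetical database layer: enforces a schema on top of the engine."""

    def __init__(self, engine: StorageEngine, name: str,
                 columns: List[str]) -> None:
        self.engine = engine
        self.name = name
        self.columns = columns

    def _key(self, pk: int) -> bytes:
        # Rows are stored under "<table>/<primary key>".
        return f"{self.name}/{pk}".encode()

    def insert(self, pk: int, row: dict) -> None:
        if set(row) != set(self.columns):
            raise ValueError("row does not match the table schema")
        self.engine.put(self._key(pk), json.dumps(row).encode())

    def select(self, pk: int) -> Optional[dict]:
        raw = self.engine.get(self._key(pk))
        return None if raw is None else json.loads(raw)

    def remove(self, pk: int) -> None:
        self.engine.delete(self._key(pk))


engine = StorageEngine()
hotels = TinyTable(engine, "reservations", ["guest", "nights"])
hotels.insert(1, {"guest": "Ana", "nights": 3})
found = hotels.select(1)
hotels.remove(1)
missing = hotels.select(1)
```

The engine never sees column names or table structure, only opaque keys and byte strings, which is why the same engine can sit underneath databases with very different query languages.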
“We started working on our own storage engine. It’s called KiVi, and I decided not to go to market until we had that up and running. So, we built our own storage architecture and it’s drastically different from anything you’re familiar with. There’s the internal architecture we’re putting in — it’s the data system that we use — and we have invented a new concurrency control. This storage engine took a lot of time and energy. We spent three years making it a reality, but now we have it, and it allows us to be a leader in the market.”
KiVi is a radically new distributed storage engine design that has been created to run efficiently using current many-core and multi-core NUMA architectures. It was designed to handle both range queries and random updates efficiently, thanks to its novel data structures.
KiVi was created, in part, to overcome the problems of using HBase. It uses a row-columnar storage model, which provides columnar acceleration to analytical queries.
The storage engine can move any part of a table across different servers without impacting the incoming load’s quality of service. This ability provides true elasticity, and allows the number of nodes within a cluster to be adjusted easily, while minimizing the number of servers used.
Jimenez-Peris added:
“So basically, if you look at what we have built, it’s something that combines an analytic warehouse, a key-value data store, and a data lake with a full ACID transaction management and we can manage everything at scale.”
Database Scaling
Scalability describes the ability of a computer system to handle increasing amounts of data. A scalable system can take on a larger workload by adding resources as data volumes change, without requiring downtime.
Both scalability and elasticity can help in improving availability and performance as demands change, particularly when the changes are unpredictable. Elasticity is about how easily a database can adapt to changing workloads by providing resources in an on-demand manner. Elasticity allows the available resources to match the system’s current needs as closely as possible. Traditional relational databases are typically quite inelastic, because they use a predefined model.
Scalability also requires “availability”: a DBMS must be flexible enough to accept administrative changes, upgrades, and maintenance without affecting applications or end-user access. A scalable system can be adjusted to increasing workloads without affecting its accessibility. Jimenez-Peris said:
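As a rough illustration of matching resources to current needs, the sketch below sizes a cluster hour by hour. It assumes throughput grows linearly with the number of nodes, and the per-node capacity and load figures are made-up numbers chosen for the example, not measurements of any real system.

```python
import math


def nodes_needed(load_per_sec: float, capacity_per_node: float,
                 min_nodes: int = 1) -> int:
    """Smallest node count that covers the load, assuming linear scaling."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    return max(min_nodes, math.ceil(load_per_sec / capacity_per_node))


# Hypothetical updates-per-second figures for four consecutive hours.
hourly_load = [20_000, 45_000, 180_000, 60_000]

# Elastic plan: resize the cluster each hour to fit the load.
plan = [nodes_needed(load, 50_000) for load in hourly_load]

# Inelastic plan: provision for the peak and keep it all the time.
fixed_node_hours = max(plan) * len(plan)
elastic_node_hours = sum(plan)
```

With these made-up figures the elastic plan uses half the node-hours of peak provisioning, which is the kind of saving elasticity is meant to deliver when demand is uneven.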
“LeanXcale has a distributed architecture that lets it scale out all its layers, like transactions, storage, and query processing. All the layers can scale to hundreds of nodes, allowing them to reach several millions of updates per second. The cost might be a place to hold up the process, but it’s not the big thing. Because we wanted to position ourselves to be a competitive technology, we made our prices comparable to what you pay for a pencil or something.”
LeanXcale is now working with a growing number of organizations that need a new way to deal with very large volumes of data. Its traction in the market is growing, and people are starting to learn about possible use cases for such a solution.