The word “database” may be one of the most misunderstood words in business computing. Originally, a database simply meant a collection of data (a base of data). As structured databases became popular, that data became more and more organized. Structured databases were followed by the DBMS (database management system), which controls the data and how it is accessed, and then by relational database management systems (RDBMSs). People use these programs to enter data, store and protect it, and retrieve it when needed. Examples include SQL Server, Sybase, Informix, Oracle Database, and MySQL.
A database management system is software that controls the data and governs how other programs and applications access it. Often, a DBMS is called simply a database. Designing and implementing a good database is a significant challenge, requiring an analysis of the organization’s needs. The evolution of the internet, combined with constantly changing requirements, has resulted in new types of databases with specific uses. Brian Bruffey, CEO of Protech, said:
“They need to use their data in a manner that lets them drive their strategy. Really focus on what they have, look at what the information is telling them, and let that help shape their strategy moving forward, so that they can really help their members. Along with that, you have to look hard at your policies and procedures and how that drives how you use your system.”
Upgrades to SQL
Historically, machine learning (ML) applications have relied on a patchwork of languages and a variety of complicated systems. Now, common ML functions can be accessed directly using the extremely popular SQL language, and machine learning workflows can be integrated with the strengths of a scalable SQL system running in parallel. Three things have changed to allow for these advances:
- Distributed Systems: SQL database systems have become distributed, which provides better parallelism and allows more cores to be used. A distributed system scales out by spreading work across several servers.
- New Hardware: CPUs have become extremely sophisticated, GPUs are being installed in SQL systems, and single instruction, multiple data (SIMD) support is now widely available.
- Code Generation: This helps to optimize queries and customize functions within a database by translating the original queries into machine code, in turn allowing databases to run much faster. (Code generation’s true power comes from producing machine code that is optimized for a specific query.)
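To make the idea of ML functions reached through SQL concrete, here is a minimal sketch of launching in-database training from Python. It assumes a SQL Server instance with Machine Learning Services enabled; the connection string, database name, and the sales_history table are hypothetical placeholders, while sp_execute_external_script is the engine’s real mechanism for running Python next to the data.

```python
# Minimal sketch (not a production pattern): call SQL Server's
# sp_execute_external_script from Python via pyodbc so the model is fitted
# inside the database engine. Server, database, and table are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=demo;Trusted_Connection=yes"
)

tsql = """
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd
from sklearn.linear_model import LinearRegression
df = InputDataSet                                   # rows from @input_data_1
model = LinearRegression().fit(df[["month"]], df["amount"])
OutputDataSet = pd.DataFrame({"slope": [float(model.coef_[0])]})
',
    @input_data_1 = N'SELECT month, amount FROM sales_history'
"""

# No raw rows are shipped to the client; only the fitted coefficient comes back.
slope = conn.execute(tsql).fetchone()[0]
print("estimated monthly trend:", slope)
```

The point is less the specific model than the shape of the workflow: the SQL system supplies the data, the parallel engine does the heavy lifting, and the application sees only a small result set.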
For several years, graphics processing units (GPUs) have been used for machine learning tasks and intensive analytics. Generally speaking, these tasks were rare in the daily use of a relational database, and SQL Server administrators have had little motivation to become interested in GPU technology. Now, machine learning tasks can be launched by a SQL server using the computing power of GPUs.
Consider the new MicrosoftML package, which, after being configured for GPU use, provides additional computing power without the need for additional coding. Artificial intelligence is also reshaping databases themselves: Microsoft uses it for continuous monitoring of Azure SQL Database workload patterns, and Oracle has recently released the Oracle Autonomous Database Cloud, which can automatically upgrade, repair, and tune itself by way of machine learning.
The Multi-Database Model (& NoSQL)
The term multi-database system (MDBS) refers to a group of databases in which each individual database (software program) remains fully independent, together with the management system that controls them. Each database within the group has its own schema. For example, a business might have one database for manufacturing and product records, a separate “sales” database for sales records, and a third for finances, with a management layer coordinating the three. Together, these databases and the coordinating layer form a multi-database system.
An MDBS allows users to access data stored in these various independent databases. In this kind of system, independent “local” transactions are performed by the local databases, while “global” transactions are performed by the MDBS. Each local database may use a different transaction management scheme, and each has total control over all of “its” transactions (both global and local) being executed on its site, including the ability to abort any of them. (This autonomy can lead to communication breakdowns and a lack of coordination.)
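As a rough illustration of local versus global transactions, the following sketch coordinates two independent SQLite files from application code. The table names and the two-file setup are illustrative assumptions, not taken from the article, and the naive commit-both step stands in for what a real MDBS would handle with a proper coordination protocol.

```python
# Toy "global" transaction spanning two independent local databases, modeled
# here as two separate SQLite files. Table names are illustrative only.
import sqlite3

sales_db = sqlite3.connect("sales.db")       # local database 1
stock_db = sqlite3.connect("inventory.db")   # local database 2
sales_db.execute("CREATE TABLE IF NOT EXISTS orders (sku TEXT, qty INTEGER)")
stock_db.execute("CREATE TABLE IF NOT EXISTS inventory (sku TEXT, on_hand INTEGER)")

try:
    # Each statement runs as a local transaction on its own database...
    sales_db.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", ("A-100", 2))
    stock_db.execute(
        "UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ?", (2, "A-100")
    )
    # ...and the global transaction completes only when every local part succeeds.
    sales_db.commit()
    stock_db.commit()
except sqlite3.Error:
    # Either local database can abort its part, forcing the global work to roll back.
    sales_db.rollback()
    stock_db.rollback()
    raise
```

A production MDBS would replace the back-to-back commits with something like a two-phase commit, since a failure between the two commit calls would leave the local databases inconsistent, which is exactly the coordination problem noted above.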
It doesn’t get much acknowledgment, but during the last decade a large number of organizations have begun using multi-database systems. In a survey performed by ScaleGrid, 44.3 percent of the responding organizations reported using multiple databases, and of these, 75.6 percent used a combination of SQL and NoSQL databases. Pairing the two combines the strengths of both and allows an organization to maximize its use of data.
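In practice, that SQL-plus-NoSQL pairing often looks like the sketch below: structured records in a relational store, flexible documents in a document store, with the application bridging the two. The choice of SQLite and MongoDB, and all the table, collection, and field names, are illustrative assumptions rather than anything drawn from the survey, and the example assumes a local MongoDB instance is running.

```python
# Sketch of polyglot persistence: relational rows for orders, schema-flexible
# documents for the product catalog. Names and stores are illustrative only.
import sqlite3
from pymongo import MongoClient

# Relational side: fixed schema, good for reporting and joins.
sql = sqlite3.connect(":memory:")
sql.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
sql.execute("INSERT INTO orders (sku, qty) VALUES ('A-100', 2)")

# Document side: each product carries whatever attributes it happens to have.
mongo = MongoClient("mongodb://localhost:27017")
products = mongo["catalog"]["products"]
products.insert_one({"sku": "A-100", "name": "Widget",
                     "specs": {"color": "blue", "sizes": ["S", "M", "L"]}})

# The application stitches the two worlds together itself.
order = sql.execute("SELECT sku, qty FROM orders").fetchone()
detail = products.find_one({"sku": order[0]})
print(order, detail["specs"])
```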
GPU Databases
GPUs provide so much raw computing power that there is no need to worry about downsampling, indexing, or partitioning. Because heavy indexing becomes unnecessary, simpler data structures can be used, a GPU database does not have to work as hard for each new update, and complex queries can be performed immediately.
A fully functional GPU database (software program) currently has to be built from the ground up, because a GPU requires specific programming: operations must be organized to take advantage of its threading model. For Big Data research purposes, GPUs can present data as interactive visualizations, and a GPU database can return complex queries in milliseconds, even as the dataset grows and more nodes are added.
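As a small taste of the brute-force style this enables, the sketch below runs a full scan-and-aggregate over synthetic data with the RAPIDS cuDF library. It assumes an NVIDIA GPU and the cudf package are available; the column names and row count are made up for the example.

```python
# Minimal GPU-analytics sketch using RAPIDS cuDF (assumes an NVIDIA GPU with
# the cudf package installed). All data here is synthetic and illustrative.
import numpy as np
import cudf

# Ten million synthetic "sales" rows, built on the host and copied to the GPU.
n = 10_000_000
df = cudf.DataFrame({
    "region": np.arange(n) % 50,        # 50 pretend sales regions
    "amount": np.arange(n) * 0.01,      # pretend sale amounts
})

# A full-table group-and-aggregate: no index, no pre-aggregation -- the GPU
# simply scans every row across thousands of parallel threads.
totals = df.groupby("region")["amount"].sum()
print(totals.head())
```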
A graphics processing unit is a highly specialized form of microprocessor, originally designed for rendering images quickly. GPUs were developed in response to graphics-intensive applications that could overwhelm a CPU and degrade the computer’s performance; they became a solution by allowing the CPU to offload those overwhelming tasks to the GPU.
Modern graphics processing units are no longer limited to rendering images; they can handle rapid, highly parallel mathematical calculations for any number of purposes. From the user’s perspective, GPUs make applications run much, much faster, and they have become the hardware backbone for nearly all intensive computing applications, including driverless cars.
Driverless cars weren’t possible until researchers adopted the AI training technique known as deep learning, which relies heavily on powerful GPUs, sophisticated algorithms designed for deep neural networks, and access to vast amounts of data. Deep learning is a requirement for self-driving vehicles because no program can anticipate every potential scenario a self-driving car could encounter; with deep learning, the car’s AI can learn, adapt, and improve its driving skills.
Machine Learning, Artificial Intelligence, and Databases
Organizations have begun installing database machine learning software to automate their Data Management processes and protect their customers’ data. Additionally, businesses are looking for ways to use artificial intelligence (AI) and machine learning to streamline and enhance the user experience. AI databases have recently been developed as a way to optimize machine learning.
Some tech companies are already developing dedicated AI chips to streamline the heavy processing loads involved in machine learning; two major challenges for machine learning models are the enormous amounts of data and processing power needed to train a neural network. The global AI chip market was valued at $4,515 million in 2017 and is projected to reach $91,185 million by 2025. The AI chip could easily revolutionize the way computers learn.
Machine learning is both a part of artificial intelligence’s evolution and a system for producing appropriate responses. ML is, in essence, the practice of using algorithms that learn through repeated trial-and-error experience. It provides a process for pattern recognition: the ML software will remember that 2 + 3 = 5, and the overall AI system will use that information in future predictions. Training is done by iteration, with massive amounts of data moving thousands of times through a multi-layered, weighted algorithm. In this arrangement, the AI controls the learning process, while the ML software provides the algorithms. Brian Bruffey, when asked what organizations should be doing with their databases, commented:
“They need to keep it simple. Stay focused on what their members want and drive the areas where they are consuming this technology, and learn about this technology strategically. Focused learning, that protects them from getting overwhelmed. They have to invest the time to learn the features of the software. That will get them the most bang for their buck.”
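To picture the iterative, weighted training described above, here is a toy NumPy sketch: synthetic data passes through a single layer of weights again and again, and each pass nudges the weights toward lower error. The numbers and the single-layer model are illustrative assumptions only; real systems stack many such layers and use far more data.

```python
# Toy illustration of iterative, weighted training: repeated passes over the
# data gradually correct the model's weights. All values here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))             # 1,000 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])        # the pattern hidden in the data
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)                            # the weights the model will learn
lr = 0.1                                   # learning rate
for epoch in range(1000):                  # many passes over the same data
    pred = X @ w                           # forward pass through the weights
    grad = X.T @ (pred - y) / len(y)       # how wrong, and in which direction
    w -= lr * grad                         # trial-and-error correction

print("learned weights:", np.round(w, 2))  # converges toward [2.0, -1.0, 0.5]
```

After a thousand passes the learned weights converge toward the values used to generate the data; deep, multi-layered networks perform the same trial-and-error correction at vastly larger scale.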