“The need to merge real-time analytical processing with on-the-spot transactional decision-making is a problem common to all industries,” noted Monte Zweben, Splice Machine’s cofounder and CEO, in a recent DATAVERSITY® interview. Intelligent predictive applications are trained to learn and adapt, rather than simply being programmed with a limited series of responses. Predictive applications can process large amounts of information in real time. They can also use Machine Learning and Artificial Intelligence to learn from previous experiences and choose the best response.
Artificial Intelligence is used to make software applications adaptive, and Machine Learning is applied to train an application’s decision-making skills based on the data it processes. The training process normally begins with Data Scientists choosing an appropriate representation of feature data and an appropriate model, said Zweben. Model selection is an important step, as different models have different strengths. Iterative training supports tasks such as algorithm selection, parameter tuning, and feature engineering.
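To make that training loop concrete, here is a minimal, generic sketch of model selection and parameter tuning. It uses scikit-learn with synthetic data purely for illustration and is not drawn from Splice Machine’s own tooling.

```python
# Generic illustration of model selection and parameter tuning
# (not Splice Machine-specific code).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Feature data chosen and engineered by the Data Scientist (synthetic here).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Compare candidate models, tuning each model's parameters iteratively.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=42), {"n_estimators": [50, 200]}),
}

for name, (model, param_grid) in candidates.items():
    search = GridSearchCV(model, param_grid, cv=5)
    search.fit(X_train, y_train)
    print(name, search.best_params_, search.score(X_test, y_test))
```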
Intelligent Analytics and Big Data
Splice Machine has developed a new Online Predictive Processing Platform™ (OLPP) that runs both in the Cloud and on-premises. It is a novel platform designed to serve both as an Analytics engine and as a system of record. It weaves intelligent applications into a company’s day-to-day operations – which isn’t always true of other Big Data platforms. Its software makes Predictive Analytics useful in real-time operations and at Big Data volumes. It is a scale-out SQL Relational Database Management System, Data Warehouse, and Machine Learning platform, all combined into one system.
The software is open source and built by combining the popular Apache Hadoop, HBase, and Spark distributed platforms. Companies in financial services, healthcare, energy, manufacturing, and retail can use the platform to increase their operational efficiency, deliver superior service, and eliminate unnecessary costs. This innovative platform allows applications to make predictions by learning from past experiences and acting on those predictions in real time.
The Splice Machine platform is specifically designed to support predictive applications: “Splice Machine provides a single integrated platform that can provide the analytical processing needed to gain insights about your business,” noted Zweben.
A History with AI
Monte Zweben is a computer scientist by training and started his career as an Artificial Intelligence researcher, co-running a laboratory at NASA Ames. Later, he helped to form a company called Blue Martini Software, which was one of the first Machine Learning companies. Zweben grew frustrated by the limitations of existing data platforms and realized that Big Data didn’t have to be “just for” Analytics. From the idea of using a single integrated platform for both Analytics and transactional applications, the concept of Splice Machine was born.
HBase
Splice Machine is a SQL relational database that runs on top of HBase, offering transactional processing and, through Spark, deep Analytics.
HBase is a NoSQL database that has traditionally run on top of Hadoop. It merges the scalability of Hadoop with real-time access. Unlike a standard relational database system, HBase does not natively support SQL and is not a relational data store. Its applications are usually written in Java. Describing their plans for HBase, Zweben stated:
“We don’t want to get bogged down in HBase SQL battles. Instead, we are pivoting to aim higher in the food chain. We want Data Scientists to build predictive applications that run on this database.”
Apache Zeppelin Notebook
The integrated interface works with the Apache Zeppelin Notebook. This strategy offers a convenient, low-risk approach for organizations investing in data science. Zeppelin Notebooks are similar to text documents, but they embed code that makes the documents active and can produce useful functionality such as tables, reports, and graphs.
The Apache Zeppelin Notebook is a completely open source, web-based tool that enables interactive Data Analytics. This browser-based notebook helps Data Scientists, analysts, and engineers become more efficient and productive by assisting with the development, organization, and execution of their analyses. Apache Zeppelin supports data exploration and visualization tools and works with Spark. It also supports Python, SparkSQL, Scala, Hive, and others. Zweben commented:
“Apache Zeppelin is a great notebook technology that allows you to have little snippets of code and multiple programming languages that present results, and that you can visualize.”
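As an illustration of that notebook style, the sketch below shows what a single Zeppelin paragraph might look like when running PySpark against the database. The table name and query are hypothetical, and a configured Spark interpreter is assumed.

```python
%pyspark
# One Zeppelin notebook paragraph: run a Spark SQL query and hand the result
# to Zeppelin's built-in visualization. The SPLICE.ORDERS table is hypothetical.
df = spark.sql(
    "SELECT REGION, SUM(AMOUNT) AS REVENUE FROM SPLICE.ORDERS GROUP BY REGION"
)
z.show(df)  # renders the DataFrame as an interactive table, bar chart, or graph
```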
Spark
Spark is one of the core systems in the platform and provides a large number of Machine Learning algorithms for Data Scientists. Splice Machine also includes the Native Spark DataSource, which simplifies and speeds up Machine Learning and Internet of Things applications. Acting as a connector, the Spark DataSource provides a native, ACID-compliant datastore and opens up advanced capabilities such as Spark Streaming and Machine Learning. Its design allows users to access Spark directly without the need for excessive data transfers; a usage sketch follows the list of functions below.
The Native Spark DataSource supports the following functions:
- Create a Table: Creates a Splice Machine table using a Spark DataFrame schema.
- Insertion: Inserts rows of a DataFrame inside a Splice Machine table.
- Updates: Updates the rows of Splice Machine tables with a DataFrame.
- Upsert: Updates or inserts the rows of a table with a given DataFrame.
- Delete: Deletes the rows of a DataFrame from a table.
- Query: Issues a SQL query and returns the results as a DataFrame.
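The sketch below strings these functions together from PySpark. It assumes a PySpliceContext-style connector class; the import path, constructor arguments, connection details, and table name are illustrative assumptions and may differ between Splice Machine versions.

```python
# A sketch of the Native Spark DataSource functions listed above, assuming a
# PySpliceContext-style connector; import path, constructor signature, and
# connection details are assumptions for illustration only.
from pyspark.sql import SparkSession
from splicemachine.spark.context import PySpliceContext  # assumed import path

spark = SparkSession.builder.appName("native-datasource-sketch").getOrCreate()

# Hypothetical JDBC URL for a Splice Machine cluster.
jdbc_url = "jdbc:splice://localhost:1527/splicedb;user=splice;password=admin"
splice = PySpliceContext(spark, jdbc_url)  # assumed constructor signature

# Build a DataFrame in Spark, then move it in and out of the database
# without serializing every row through a separate client driver.
df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["ID", "NAME"])

splice.createTable(df, "DEMO.PRODUCTS")            # Create a Table from the DataFrame schema
splice.insert(df, "DEMO.PRODUCTS")                 # Insertion
splice.upsert(df, "DEMO.PRODUCTS")                 # Upsert: update or insert matching rows
result = splice.df("SELECT * FROM DEMO.PRODUCTS")  # Query: results return as a DataFrame
result.show()
```

Because the connector reads and writes DataFrames natively, a DataFrame prepared for Machine Learning in Spark can be persisted transactionally without an intermediate export step.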
Splice Machine and the Cloud
Splice Machine is designed to work with public Clouds, such as Amazon Web Services (AWS), Heroku, and Azure. With Splice Machine’s Cloud Manager, configuring new clusters becomes quite easy: a user can scale out to petabytes of data as needed and pays only for what is used.
The Dashboard is the entrance to the Cloud Manager. From the Dashboard, users can create new clusters, access existing clusters, manage accounts, review notifications, update profiles, and log out.
The Database Console is a browser-based graphical tool used to track database queries on clusters in real time. Spark queries can be monitored using the Console. If something is amiss, the query can be terminated.
Zweben said:
“Modern data intensive applications typically ingest Big Data at high speeds and require transactional and analytical capabilities in the same package. To address this challenge, companies often build complex systems consisting of multiple compute and storage engines. Splice Machine already simplifies this process by providing a hybrid solution, where an optimizer chooses between compute engines. Now we are taking the next logical step by removing the need to manage the database. Users only need to know SQL. Splice Machine does the rest.”
Splice Machine’s Database-as-a-Service (DBaaS) system has been designed to be portable. Storage and applications are containerized, monitored, and secured, and come with guaranteed availability.
Cloud Deployment Containers
Cloud Deployment Containers are designed to streamline the injection of data into a Cloud’s applications. This method allows designers to develop AI and IoT applications locally, using Machine Learning and streaming technologies on their laptops. By running the same containerized code in the Cloud, organizations can train, test, and deploy Machine Learning, and Apache Spark MLlib libraries can be shipped as containers. (Spark Streaming applications can stream data into Splice Machine.)
Splice Machine Cloud Deployment Containers can transport standard applications written in programming languages like Python, Java, Node, and Scala. This allows organizations to deploy smarter, more predictive applications very quickly and easily in the Cloud.
The Challenge
The strategy of putting together many moving parts is a challenge Splice Machine has accepted and met. It has required activating its partner ecosystem to help the company move from a Big Data database to a solution platform. A partnership with Intrigo, a business experienced with SAP supply chains, provided a good start, and Splice Machine continues to build a robust network of partners, including Accenture.