In my previous blog post I discussed hybrid cloud computing. The phrase Hybrid Cloud Computing describes the coexistence of multiple environments, at least one of which is a cloud environment. Hybrid cloud computing introduces new data integration requirements for data- and database-dependent applications.
In this blog post, I will discuss three key requirements for data integration in the cloud: Optimized Data Transfer, Security, and Manageability. For each of these, I provide considerations that will hopefully help you better evaluate cloud data integration solutions.
1. Optimized Data Transfer
A cloud availability zone is essentially a data center managed by the cloud provider. In a hybrid cloud environment, data integration into or out of the availability zone is data integration over a Wide Area Network (WAN), and it may carry a charge per GB of data transferred. Optimizing data transfer is important not only to maximize performance but also to limit cost. So, how can you optimize data transfer?
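As a rough, assumed example: at an egress rate of, say, $0.09 per GB, copying a 500 GB database in full every night would cost roughly $45 per run, while transferring only the 2 GB of rows that actually changed that day would cost under $0.20.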
- The first consideration is to transfer only changes between environments. For databases, the approach commonly used to describe this is Change Data Capture (CDC): after an initial data synchronization (as needed), capture and transfer only incremental changes (see the sketch after this list).
- Data compression can be applied prior to sending data to further minimize data transfer, increase performance, and lower costs.
- Data transfer across a WAN should involve as little back-and-forth communication as possible to limit sensitivity to network latency. Transferring data in large blocks is one technique to achieve this, combined with an approach that maximizes use of the available bandwidth despite the relatively high latency.
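To make the first two considerations concrete, here is a minimal Python sketch, not any vendor's implementation: the `orders` table, its `updated_at` column, the DB-API style `connection`, and the `send` callable are all hypothetical placeholders. It captures only rows changed since the last watermark, compresses the batch, and ships it in a single large transfer.

```python
import json
import zlib
from datetime import datetime, timezone

# Watermark from the previous run: transfer only rows changed since then (CDC-style).
last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)

def fetch_changes(connection, since):
    """Capture incremental changes; assumes the (hypothetical) source table has an updated_at column."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT id, payload, updated_at FROM orders WHERE updated_at > %s ORDER BY updated_at",
        (since,),
    )
    return cursor.fetchall()

def ship_batch(rows, send):
    """Serialize one large block of changes, compress it, and send it in a single round trip."""
    body = json.dumps(rows, default=str).encode("utf-8")
    compressed = zlib.compress(body, 6)  # shrink the payload before it crosses the WAN
    send(compressed)                     # one large transfer limits sensitivity to latency
```

Compressing on the source side trades a little CPU for a smaller, cheaper, and faster WAN transfer.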
2. Security
With data transferring between data centers, security has to be top of mind. Securing data has multiple aspects:
- Exposure: how to limit exposure to security breaches. All data centers and corporate networks use firewalls, so review firewall settings and requirements to enable connectivity. Data connectivity obviously must be established, but consider whether firewalls have to be opened in both directions, and explore options to limit exposure, e.g. by routing traffic through a proxy so that access to a production system is not exposed directly in the firewall. Finally, lock down the firewall as much as possible, down to an individual server if you can.
- Authentication: with access to a system exposed, prevent unauthorized access by enforcing strong authentication rules. Password authentication is one approach, but also look for certificate-based authentication or even two-factor authentication.
- Secure data transfer: use either a VPN or SSL/TLS connectivity when passing data across the wire (a minimal sketch follows this list).
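As an illustration of the last two points, here is a minimal sketch using Python's standard ssl module; the host name, port, and client certificate paths are assumptions for illustration only. The connection validates the server's certificate, and a client certificate can optionally be presented so the server authenticates the sender as well (mutual TLS).

```python
import socket
import ssl

# Hypothetical endpoint of the cloud-side integration agent.
HOST, PORT = "replication.example.com", 4443

# Require server certificate validation and a modern protocol version.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2
# Optionally present a client certificate so the server authenticates us too (mutual TLS):
# context.load_cert_chain("client.crt", "client.key")

with socket.create_connection((HOST, PORT)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        tls_sock.sendall(b"compressed change batch goes here")
```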
3. Manageability
Considering that hybrid cloud is a given for the foreseeable future, how do you plan to manage cloud data integration? The following are some questions to consider:
- Will you be configuring individual data flows or does a single console provide an overview of the data integration flows?
- How resilient is the setup against relatively common interruptions? (e.g. network glitches or system restarts)
- Can you set up automatic alerts for issues that require operator intervention, so that SLAs are not missed? (see the sketch after this list)
- Can you easily review the current state of the data flows?
- How do you gain insight into what happened with a data flow so you can prevent interruptions or anticipate and avoid problems?
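As one possible way to approach the alerting and visibility questions, here is a minimal Python sketch under assumed inputs: the flow records, their 'last_applied' timestamps, and the notify callback are hypothetical, and the 5-minute threshold stands in for whatever your SLA actually requires.

```python
import logging
from datetime import datetime, timezone

# Hypothetical SLA threshold: alert the operator when a flow falls more than 5 minutes behind.
MAX_LAG_SECONDS = 300

def check_flows(flows, notify):
    """Review the current state of each data flow and alert on SLA risk.

    `flows` is assumed to be a list of dicts with a 'name' and a 'last_applied'
    timestamp; `notify` is a hypothetical callback that pages the operator.
    """
    now = datetime.now(timezone.utc)
    for flow in flows:
        lag = (now - flow["last_applied"]).total_seconds()
        if lag > MAX_LAG_SECONDS:
            logging.warning("Flow %s is %.0f seconds behind", flow["name"], lag)
            notify(f"SLA risk: {flow['name']} is {lag:.0f}s behind (limit {MAX_LAG_SECONDS}s)")
```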
Hybrid cloud introduces additional data integration challenges compared to data integration within a single data center. Make sure you prepare well as you start adopting the cloud.