DevOps, DataOps, and Data Repositories

By Paul Stanton.

There is a disconnect between the goals of DevOps and the realities of working with relational data. Relational databases are core to many enterprise applications, and the Gartner Group projects that 80% of new projects will rely on relational data through 2020 (available courtesy of Microsoft here). But most organizations refresh relational data environments for internal use only twice a month or less (courtesy of Dell here). This glacial pace of data delivery contrasts with .NET and Java Docker containers, which are provisioned in seconds and have life cycles measured in hours.

Addressing this disconnect between application development and relational data is critical for most organizations to achieve their DevOps ambitions. Recognition of the challenges involved in working with large data sets has resulted in an industry initiative called DataOps. DataOps addresses the unique challenges of enterprise data workflows:

Relational data is “stateful,” requires session persistence, and can be a single point of failure. HA clusters, data replication, and backups are employed to enhance availability, but at the cost of added complexity.
Data breaches are leading to expanded consumer privacy protection through HIPAA, GDPR, and other regulations.
Version control of data is complicated with applications often relying on multiple databases.
Data size is a challenge. Databases take hours to copy compared to seconds for .NET or Java applications.
Software Defined Data Center strategies are recognized as preferred, but relatively little progress has been made in software defined storage. Enterprises often default to a mix of public cloud services and proprietary storage appliances in the private data center.

Fortunately, new approaches are emerging that address these challenges. Complex terabyte class databases can incorporate data security in immutable versioned images, stored in auditable data repositories, and can be provisioned on demand to users in seconds.

Database Cloning

The best practice for working with large data sets is database cloning, also referred to as snapshots. Databases can be cloned with writable data delivered in seconds, using minimal storage resources. Snapshots are supported by storage systems (NetApp, Dell/EMC, and others), but these capabilities go largely unused because of the complex scripting required to provision snapshots, storage LUNs, and mount points.
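To illustrate why a clone can be writable yet consume minimal storage, here is a minimal copy-on-write sketch. It is a toy in-memory model, not how any particular storage system or product implements snapshots; the class names `BaseImage` and `Clone` and the block layout are assumptions made for illustration only.

```python
class BaseImage:
    """Read-only parent image: a mapping of block number -> bytes (hypothetical model)."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks[block_no]


class Clone:
    """Writable clone that stores only the blocks it has changed."""

    def __init__(self, base):
        self._base = base   # shared, never modified
        self._delta = {}    # blocks written by this clone only

    def read(self, block_no):
        # Prefer the clone's own copy; fall back to the shared parent.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base.read(block_no)

    def write(self, block_no, data):
        # Copy-on-write: the parent image is left untouched.
        self._delta[block_no] = data

    def blocks_stored(self):
        return len(self._delta)


# One parent image, two independent writable clones.
base = BaseImage({n: b"prod-row-%d" % n for n in range(1000)})
dev = Clone(base)
test = Clone(base)

dev.write(7, b"dev-change")

print(dev.read(7))    # the dev clone sees its own change
print(test.read(7))   # the test clone still sees the parent data
print(dev.blocks_stored(), test.blocks_stored())
```

Creating a clone copies nothing, which is why delivery takes seconds regardless of database size, and each clone's storage grows only with the blocks it actually modifies.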

A new generation of storage systems simplifies data access through RESTful APIs. These systems include incremental snapshots, with the goal of eliminating full backups. Customer feedback is positive, and these systems are a step forward for storage access. Cost of ownership continues to be a challenge, however, as they are dedicated storage appliances based on the UNIX ZFS file system.

Windows Based Database Cloning and Containers

Windocks supports software-based database cloning on Windows, allowing SQL Server DBAs to create and manage complex database images. Windows database clones use the same designs as Storage Area Networks (SANs), and deliver writable databases in seconds, with minimal storage and with user self-service. Windocks has grown rapidly, providing data delivery for a fraction of the cost of storage systems.

In addition to Windows database cloning, Windocks is also an independent port of Docker’s source to Windows. The combination of database clones with containers allows delivery of terabyte class data environments with multi-tier application environments, in seconds. Development, Test, and Reporting teams work on shared servers with containers, and simplify operations with an average reduction in VMs used of 5:1 or more.

Windocks also simplifies the use of SAN-hosted snapshots, with a SAN-ready container that automates the provisioning and use of SAN-based snapshots. This approach allows customers to extend the useful life of SANs, and is a boon for organizations with a mix of storage systems acquired through mergers and acquisitions.

Enterprise Data Repository

Windocks data images are immutable, versioned, and auditable artifacts, stored in an Enterprise Data Repository (similar to the source code and binary repositories in wide use today). Data images are unique in that they incorporate version control for complex (multiple database) environments, and apply data privacy and security during the image build. Images can deliver effective GDPR compliance by default and by design. Finally, the image repo provides a complete, auditable record of the data images used within the enterprise.

The workflow begins with the production database(s), captured via snapshots, or via full and differential backups with incremental log shipping updates. In the event of a production system problem, the most recent data image is pulled by the Development team for debugging. The image accessed includes permissions and data masking to ensure compliance with privacy policies (step 1). A proposed fix is prepared and promoted into an updated image (step 2), which is available on demand to Testing for validation (step 3).

Delivery of each of the environments is accomplished in seconds. An additional benefit of this system is the rich metadata produced, further enhancing Data Governance strategies.
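The three-step workflow above can be sketched as a small in-memory repository of immutable, versioned images with an append-only audit log. This is an illustrative model only and does not reflect the Windocks API; `DataImage`, `ImageRepository`, `publish`, `pull_latest`, and the image name `orders-env` are all hypothetical names chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an image cannot be altered after it is built
class DataImage:
    name: str
    version: int
    databases: tuple   # databases captured together as one versioned unit
    masked: bool       # whether masking was applied during the image build
    digest: str        # content hash, usable as evidence during an audit


class ImageRepository:
    """Hypothetical repository: immutable images plus an append-only audit log."""

    def __init__(self):
        self._images = {}    # (name, version) -> DataImage
        self.audit_log = []  # (timestamp, action, user, name, version)

    def _record(self, action, user, image):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, user, image.name, image.version))

    def _versions(self, name):
        return [v for (n, v) in self._images if n == name]

    def publish(self, user, name, databases, masked, content):
        # Each publish creates a new version; earlier versions are never overwritten.
        version = max(self._versions(name), default=0) + 1
        digest = hashlib.sha256(content).hexdigest()
        image = DataImage(name, version, tuple(databases), masked, digest)
        self._images[(name, version)] = image
        self._record("publish", user, image)
        return image

    def pull_latest(self, user, name):
        versions = self._versions(name)
        if not versions:
            raise KeyError(f"no image named {name!r}")
        image = self._images[(name, max(versions))]
        self._record("pull", user, image)
        return image


repo = ImageRepository()
repo.publish("dba", "orders-env", ["orders", "customers"], True, b"masked production snapshot")

img = repo.pull_latest("dev-team", "orders-env")          # step 1: debug on masked data
fixed = repo.publish("dev-team", "orders-env", img.databases,
                     True, b"snapshot plus proposed fix")  # step 2: promote the fix
tested = repo.pull_latest("test-team", "orders-env")      # step 3: validate in Testing

for entry in repo.audit_log:
    print(entry[1:])  # action, user, image name, version
```

Because every publish and pull is logged against a specific image version, the log doubles as the kind of metadata that supports governance reviews.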

Data Repository for Data Governance and Regulatory Compliance

HIPAA, GDPR and other regulations outline acceptable use of personal data by organizations, and it’s critical that data be organized into a comprehensive data library or repository for audit support and compliance. The approach outlined in this article provides a repo that includes consumer privacy by default and design. The solution addresses a range of needs:

Data masking can vary between images to support varied privacy needs within an organization. A customer support organization needs access to data that is not needed by other users. User and group permissions are matched to data images that reflect the particular user group needs.
Privacy is applied consistently for complex images that span multiple databases.
The design scales to support data access by outsourced vendors and partners, and runs wherever Windows is supported (public cloud or on premises).
Compliance, security, and privacy become an easily inspected, programmatically driven activity, rather than a separate discussion and organization.
The system creates a library of data images, dramatically improving data governance and user access.
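As a rough sketch of how masking could vary between images for different user groups, the example below applies a per-group policy while building the rows for an image. The group names, column names, masking functions, and sample record are all assumptions for illustration; a real deployment would define its own policies and tooling.

```python
import hashlib


def mask_email(value):
    # Keep the first character and the domain, hide the rest of the local part.
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def pseudonymize(value):
    # Deterministic token: the same input yields the same token, so joins
    # across several databases in one image stay consistent.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact(value):
    return "REDACTED"


def keep(value):
    return value


# Hypothetical policies: which columns each user group's image must mask.
MASKING_POLICIES = {
    "support": {"ssn": redact},  # support needs real names and emails
    "development": {"name": pseudonymize, "email": mask_email, "ssn": redact},
}


def build_image_rows(rows, group):
    """Return masked copies of the rows according to the group's policy."""
    policy = MASKING_POLICIES[group]
    return [
        {column: policy.get(column, keep)(value) for column, value in row.items()}
        for row in rows
    ]


# Made-up sample record, not real data.
customers = [
    {"id": 1, "name": "Ada Example", "email": "ada@example.com", "ssn": "000-00-0000"},
]

support_image = build_image_rows(customers, "support")
dev_image = build_image_rows(customers, "development")

print(support_image[0])
print(dev_image[0])
```

Keeping the policy in a plain data structure is one way to make privacy rules inspectable and programmatically driven, since the same table can be reviewed by auditors and executed during the image build.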

Call to action for CIOs and Architects

These are exciting times for IT professionals focused on establishing Data Governance and addressing Regulatory Compliance. The combination of containers and software based database cloning presents new options for enterprise Data Governance and Compliance.
