Challenge – A common practice has been to use local commodity hardware or cloud destinations to host source code repositories, or to move the repositories to GitHub for managed services. There are some significant challenges with this approach.
- As the size of the code base or the number of repositories grows, scalability, reliability, and predictable performance become a challenge. Teams must scale to include more developers (both local and remote), all of whom are performing code check-in and check-out operations.
- Blobs and large file storage (LFS) for private repositories in GitHub are expensive and can be slow for user access. Multiple private repositories add to the increasing cost of managing through GitHub.
- Most of the VCS and binary repository management tools store full copies of different source code and binary versions. This approach reduces space efficiency and increases cost as data starts to grow, leading to lower return on investment (ROI).
- Git operations put high demand on the Git servers and tend to leave the data management parts of the infrastructure underused.
- Repositories managed by Git are typically small to midsize code bases that require stateful containers for high availability, scalability, and data management operations.
- Data recovery is limited and time consuming for databases and file systems accessed or managed by different VCS and binary repository management tools. This challenge affects the overall development process.
Our approach – Like many of our customers, we also look to employ agile development based on DevOps culture. In NetApp’s own DevOps journey, we’ve discovered that taking advantage of the unique capabilities of NetApp ONTAP® shared data management makes a huge difference in the speed and efficiency of our agile development processes. The following NetApp technologies enhance the user experience by integrating seamlessly with the CI/CD workflow. They also provide performance, data efficiency, and protection, along with capacity scaling for large and multiple repositories.
Performance & predictability – AFF provides predictable, low-latency performance for VCS tools performing synchronization, merge, integrate, truncate, clone, status, diff, reset, and pull/push operations at scale, in terms of both number of users and I/O operations, when the depot and repositories are mounted over NFS from ONTAP. This performance also applies to monorepos, where hundreds of repositories are managed by a single VCS tool.
Total Cost of Ownership (TCO) – ONTAP’s data reduction techniques, such as compaction and inline deduplication, provide up to 40% data efficiency for the different versions of the source code files and binaries stored on AFF or ONTAP. This capability improves cost efficiency and lets you scale to more repositories with a smaller data footprint.
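As a rough illustration of what that figure means for capacity planning, the arithmetic can be sketched as follows. This is only a back-of-the-envelope sketch: the function name and the 10 TiB example are hypothetical, and it assumes that "40% data efficiency" means a 40% reduction in physical space consumed. Actual savings depend on the dataset.

```python
def physical_footprint_tib(logical_tib: float, savings_ratio: float) -> float:
    """Return the physical capacity consumed after data reduction.

    logical_tib   -- size of the data as the VCS or repository manager sees it
    savings_ratio -- fraction of space saved by data reduction (0.40 = 40%);
                     assumed interpretation of the "up to 40%" figure
    """
    if not 0.0 <= savings_ratio < 1.0:
        raise ValueError("savings_ratio must be in the range [0, 1)")
    return logical_tib * (1.0 - savings_ratio)


if __name__ == "__main__":
    logical = 10.0  # hypothetical 10 TiB of source versions and binaries
    best_case = physical_footprint_tib(logical, 0.40)
    print(f"{logical:.1f} TiB logical -> {best_case:.1f} TiB physical at 40% savings")
```

Because 40% is an upper bound, it is worth running the same calculation with a more conservative ratio before sizing a system.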
Data protection – Source code is the crown jewel of any organization that develops software. Data protection is an intrinsic ability of ONTAP that allows faster data recovery from failures for source code versions and binary files. Data is protected via thin-provisioned NetApp Snapshot™ copies in ONTAP that are cost effective. These copies take up very little space and allow fast, reliable recovery from data loss or corruption caused by disasters. Data can be backed up to tape or to the cloud over Amazon S3.
Scaleout – Capacity scaling is trivial with ONTAP FlexGroup, which allows VCS and binary repository manager tools to store and manage multiple revisions of source code and binaries across a large number and variety of repositories.
Container Integration – Trident is an external storage provisioner that dynamically provisions persistent data volumes according to storage class, on demand for stateful containers. When you manage the NetApp FlexVol® volumes as the persistent data volumes for Docker Swarm and Kubernetes by using Trident, you get scalability and resiliency for applications such as Git, Jenkins, JFrog Artifactory and other development tools. Containers provide high availability to the source code and binary repositories.
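To make the storage-class-driven provisioning concrete, here is a minimal sketch that builds the two Kubernetes objects involved: a StorageClass that points at Trident and a PersistentVolumeClaim that a Git or Artifactory pod could mount. The object names, the 100Gi size, and the helper function names are hypothetical. The provisioner string `csi.trident.netapp.io` and the `backendType` value `ontap-nas` are assumptions based on recent CSI-based Trident releases; older releases used a different provisioner name, so check the documentation for your installed version.

```python
import json


def trident_storage_class(name: str, backend_type: str = "ontap-nas") -> dict:
    """Build a StorageClass manifest that delegates provisioning to Trident."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumed provisioner name for CSI-based Trident releases.
        "provisioner": "csi.trident.netapp.io",
        # Assumed parameter: selects NFS-backed ONTAP volumes.
        "parameters": {"backendType": backend_type},
    }


def repo_volume_claim(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim for a stateful repository container."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets several pods share one NFS-backed volume.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names and size, for illustration only.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            trident_storage_class("ontap-repos"),
            repo_volume_claim("git-repo-data", "ontap-repos", "100Gi"),
        ],
    }
    # kubectl accepts JSON as well as YAML, so the output can be piped
    # to `kubectl apply -f -`.
    print(json.dumps(manifests, indent=2))
```

Once the claim is bound, the developer's pod simply references it by name; the volume itself is created on demand without a storage administrator in the loop.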
Data Tier – The FabricPool feature provides a policy-based data tiering capability that automatically moves binaries and pre-build artifacts to Amazon Simple Storage Service (S3). S3-compliant storage can be on premises in the data center (private cloud) on StorageGRID Webscale (SGWS) or in public clouds such as Amazon S3 or Azure Blob.
Traditional backups of binary repositories on commodity hardware are bloated and take more than twice the storage space of the production dataset. An automatic process that tiers cold Snapshot copies and data to S3-compliant storage improves infrastructure efficiency and reduces the data footprint in the production environment.
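The idea of policy-based tiering can be illustrated with a small toy model. This is not how ONTAP implements FabricPool; it is a simplified sketch in which the `Block` type, the function name, and the sample data are all hypothetical. The policy names (`none`, `snapshot-only`, `auto`, `all`) follow FabricPool's tiering policies as I understand them, and the cooling period is passed in rather than assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    days_since_access: int
    snapshot_only: bool  # True if only Snapshot copies still reference the block


def select_for_tiering(blocks: List[Block], policy: str, cooling_days: int) -> List[str]:
    """Return the names of blocks a given policy would move to the object tier."""
    if policy == "none":
        return []  # keep everything on the performance tier
    if policy == "all":
        return [b.name for b in blocks]  # tier everything, regardless of age
    # The remaining policies only consider blocks that have gone cold.
    cold = [b for b in blocks if b.days_since_access >= cooling_days]
    if policy == "snapshot-only":
        return [b.name for b in cold if b.snapshot_only]
    if policy == "auto":
        return [b.name for b in cold]
    raise ValueError(f"unknown tiering policy: {policy}")


if __name__ == "__main__":
    # Hypothetical sample data: an old release binary, an old snapshot
    # block, and a recently built artifact.
    blocks = [
        Block("release-1.0.bin", 40, False),
        Block("snap-2018-01", 5, True),
        Block("nightly-build", 1, False),
    ]
    for policy, cooling in [("snapshot-only", 2), ("auto", 31), ("all", 0)]:
        print(policy, select_for_tiering(blocks, policy, cooling))
```

The point of the sketch is that the choice of what moves to S3-compliant storage is made by policy, not by a developer or a backup script.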
These NetApp technologies integrate directly with the development tools and workflows to give developers a zero-touch storage experience along with direct control of and access to their data. In the next part of the blog series, I will discuss how NetApp technologies enhance build automation in the CI process.