Sustain Git repo performance under heavy load

Over time, we have come to take for granted that Source Code Management systems are always available. SCMs are a mission-critical part of any software-related business. When they don’t work as expected, whole organizations come to a halt quicker than you would expect. Efficient maintenance of such systems is therefore crucial to success.

Git repositories under heavy load quickly become inefficient or even inaccessible. This impacts client operations of all types (e.g., git-upload-pack, git-receive-pack). Currently, the only countermeasures are a full GC or geometric repacking, either time-based or metrics-based.

As repositories grow, running a full GC takes longer, costs more, and risks adding workload at inopportune times (e.g., running GC during a burst of repository activity may bring nodes to a standstill). In this talk, we will introduce an AI-driven approach to maintaining the performance of busy Git repositories under heavy workloads. The AI model will explore and learn different strategies, including partial repacking, bitmap regeneration, empty directory removal, and more, evaluating their success using reinforcement learning.
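
To make the reinforcement-learning idea concrete, here is a minimal sketch of an epsilon-greedy agent choosing among maintenance actions. It is not the model presented in the talk. The action names, the reward values, and the `simulated_reward` function are all hypothetical stand-ins; a real reward might be a measured improvement in fetch or push latency after the action runs.

```python
import random

# Hypothetical action names, loosely based on the strategies listed above.
ACTIONS = ["full_gc", "bitmap_regeneration", "empty_dir_removal", "partial_repack"]


class EpsilonGreedyAgent:
    """Tracks the average reward per action and mostly picks the best one."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def choose(self):
        # Try every action once before trusting the estimates.
        untried = [a for a in self.actions if self.counts[a] == 0]
        if untried:
            return untried[0]
        # Explore a random action with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Otherwise exploit the action with the best average reward so far.
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental running mean of the rewards observed for this action.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


def simulated_reward(action, rng):
    # Made-up numbers for illustration only; not measurements of real Git behavior.
    mean = {
        "full_gc": 0.3,
        "bitmap_regeneration": 0.4,
        "empty_dir_removal": 0.1,
        "partial_repack": 0.6,
    }[action]
    return mean + rng.uniform(-0.05, 0.05)


def train(steps=500, seed=0):
    rng = random.Random(seed)
    agent = EpsilonGreedyAgent(ACTIONS, epsilon=0.1, seed=seed)
    for _ in range(steps):
        action = agent.choose()
        agent.update(action, simulated_reward(action, rng))
    return agent


if __name__ == "__main__":
    trained = train()
    best = max(trained.values, key=trained.values.get)
    print("preferred action:", best)
```

In this toy setup the agent settles on whichever action has the highest simulated reward. A production system would also need to account for repository state and current load, which this sketch ignores.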

Daniele Sassoli, Senior Engineering Manager / GerritForge Inc.