Add a section about the impacts of Git gc

Change-Id: I99a4cf9c076a8d2b5bd32e0a5dc89e027825002d
diff --git a/Documentation/repository-maintenance.txt b/Documentation/repository-maintenance.txt
index deb3812..9bc7c24 100644
--- a/Documentation/repository-maintenance.txt
+++ b/Documentation/repository-maintenance.txt
@@ -24,6 +24,48 @@ allows administrators to finely tune the approach and resource usage of this
 maintenance.
 
 
+== Git Garbage Collection Impacts
+
+Unlike access to a typical server database, access to Git repositories
+is not marshalled through a single process or a set of intercommunicating
+processes. Unfortunately, the on-disk layout of a Git repository does
+not allow for 100% race-free operation when it is accessed by multiple
+actors concurrently. These design shortcomings mostly affect busy
+repositories, where a higher number of concurrent operations makes
+races more likely. Since most Gerrit servers are expected to run
+without interruption, Git garbage collection will likely need to run
+during normal operational hours, and while it runs, it adds to the
+overall concurrency of repository access. Because many garbage
+collection steps delete files and directories, garbage collection is
+more likely than most other operations to disrupt concurrent
+operations.
+
+=== Interrupted Operations
+
+When Git garbage collection deletes a file or directory that is
+currently in use by an ongoing operation, it can cause that operation to
+fail. Such failures are often one-shot failures, i.e. the operation
+will succeed if it is retried. For example, a pack file may be deleted
+while Gerrit is sending an object from that file over the network to a
+user performing a clone or fetch. Pack files are usually only deleted
+after the referenced objects in them have been repacked, and thus copied
+to a new pack file. Retrying the failed clone or fetch will therefore
+likely send the same object from the new pack file instead of the
+deleted one, and the operation will succeed.
+
+=== Data Loss
+
+It is possible for data loss to occur when Git garbage collection runs.
+This is very rare, but it can happen when an object is believed to be
+unreferenced while object repacking is running, and garbage
+collection then deletes it. Even though the object may indeed be
+unreferenced when repacking begins and the reachability of all
+objects is determined, another concurrent operation can start
+referencing it after this determination but before it is deleted.
+When this happens, a new reference can be created that points to a
+now-missing object, which results in the loss of that object's
+content.
+
 GERRIT
 ------
 Part of link:index.html[Gerrit Code Review]