| = Gerrit Code Review - Repository Maintenance | 
 |  | 
 | == Description | 
 |  | 
 | Each project in Gerrit is stored in a bare Git repository. Gerrit uses | 
 | the JGit library to access (read and write to) these Git repositories. | 
 | As modifications are made to a project, Git repository maintenance will | 
 | be needed or performance will eventually suffer. When using the Git | 
 | command line tool to operate on a Git repository, it will run `git gc` | 
 | every now and then on the repository to ensure that Git garbage | 
 | collection is performed. However regular maintenance does not happen as | 
 | a result of normal Gerrit operations, so this is something that Gerrit | 
 | administrators need to plan for. | 
 |  | 
 | Gerrit has a built-in feature which allows it to run Git garbage | 
 | collection on repositories. This can be | 
 | link:config-gerrit.html#gc[configured] to run on a regular basis, and/or | 
 | this can be run manually with the link:cmd-gc.html[gerrit gc] ssh | 
 | command, or with the link:rest-api-projects.html#run-gc[run-gc] REST API. | 
 | Some administrators will opt to run `git gc` or `jgit gc` outside of | 
 | Gerrit instead. There are many reasons this might be done, the main one | 
 | likely being that when it is run in Gerrit it can be very resource | 
 | intensive and scheduling an external job to run Git garbage collection | 
 | allows administrators to finely tune the approach and resource usage of | 
 | this maintenance. | 
 |  | 
 | == Git Garbage Collection Impacts | 
 |  | 
 | Unlike a typical server database, access to Git repositories is not | 
 | marshalled through a single process or a set of inter communicating | 
 | processes. Unfortunately the design of the on-disk layout of a Git | 
 | repository does not allow for 100% race free operations when accessed by | 
 | multiple actors concurrently. These design shortcomings are more likely | 
 | to impact the operations of busy repositories since racy conditions are | 
 | more likely to occur when there are more concurrent operations. Since | 
 | most Gerrit servers are expected to run without interruptions, Git | 
 | garbage collection likely needs to be run during normal operational hours. | 
 | When it runs, it adds to the concurrency of the overall accesses. Given | 
 | that many of the operations in garbage collection involve deleting files | 
 | and directories, it has a higher chance of impacting other ongoing | 
 | operations than most other operations. | 
 |  | 
 | === Interrupted Operations | 
 |  | 
 | When Git garbage collection deletes a file or directory that is | 
 | currently in use by an ongoing operation, it can cause that operation to | 
 | fail. These sorts of failures are often single shot failures, i.e. the | 
 | operation will succeed if tried again. An example of such a failure is | 
 | when a pack file is deleted while Gerrit is sending an object in the | 
 | file over the network to a user performing a clone or fetch. Usually | 
 | pack files are only deleted when the referenced objects in them have | 
 | been repacked and thus copied to a new pack file. So performing the same | 
 | operation again after the fetch will likely send the same object from | 
 | the new pack instead of the deleted one, and the operation will succeed. | 
 |  | 
 | === Data Loss | 
 |  | 
 | It is possible for data loss to occur when Git garbage collection runs. | 
 | This is very rare, but it can happen. This can happen when an object is | 
 | believed to be unreferenced when object repacking is running, and then | 
 | garbage collection deletes it. This can happen because even though an | 
 | object may indeed be unreferenced when object repacking begins and | 
 | reachability of all objects is determined, it can become referenced by | 
 | another concurrent operation after this unreferenced determination but | 
 | before it gets deleted. When this happens, a new reference can be | 
 | created which points to a now missing object, and this will result in a | 
 | loss. | 
 |  | 
 | == Reducing Git Garbage Collection Impacts | 
 |  | 
 | JGit has a `preserved` directory feature which is intended to reduce | 
 | some of the impacts of Git garbage collection, and Gerrit can take | 
 | advantage of the feature too. The `preserved` directory is a | 
 | subdirectory of a repository's `objects/pack` directory where JGit will | 
 | move pack files that it would normally delete when `jgit gc` is invoked | 
 | with the `--preserve-oldpacks` option. It will later delete these files | 
 | the next time that `jgit gc` is run if it is invoked with the | 
 | `--prune-preserved` option. Using these flags together on every `jgit gc` | 
 | invocation means that packfiles will get an extended lifetime by one | 
 | full garbage collection cycle. Since an atomic move is used to move these | 
 | files, any open references to them will continue to work, even on NFS. On | 
 | a busy repository, preserving pack files can make operations much more | 
 | reliable, and interrupted operations should almost entirely disappear. | 
 |  | 
 | Moving files to the `preserved` directory also has the ability to reduce | 
 | data loss. If JGit cannot find an object it needs in its current object | 
 | DB, it will look into the `preserved` directory as a last resort. If it | 
 | finds the object in a pack file there, it will restore the | 
 | slated-to-be-deleted pack file back to the original `objects/pack` | 
 | directory effectively "undeleting" it and making all the objects in it | 
 | available again. When this happens, data loss is prevented. | 
 |  | 
 | One advantage of restoring preserved packfiles in this way when an | 
 | object is referenced in them, is that it makes loosening unreferenced | 
 | objects during Git garbage collection, which is a potentially expensive, | 
 | wasteful, and performance impacting operation, no longer desirable. It | 
 | is recommended that if you use Git for garbage collection, that you use | 
 | the `-a` option to `git repack` instead of the `-A` option to no longer | 
 | perform this loosening. | 
 |  | 
 | When Git is used for garbage collection instead of JGit, it is fairly | 
 | easy to wrap `git gc` or `git repack` with a small script which has a | 
 | `--prune-preserved` option which behaves as mentioned above by deleting | 
 | any pack files currently in the preserved directory, and also has a | 
 | `--preserve-oldpacks` option which then hardlinks all the currently | 
 | existing pack files from the `objects/pack` directory into the | 
 | `preserved` directory right before calling the real Git command. This | 
 | approach will then behave similarly to `jgit gc` with respect to | 
 | preserving pack files. | 
 |  | 
 | GERRIT | 
 | ------ | 
 | Part of link:index.html[Gerrit Code Review] | 
 |  | 
 | SEARCHBOX | 
 | --------- |