| :linkattrs: | 
 | = Gerrit Code Review - Backup | 
 |  | 
 | A Gerrit Code Review site contains data that needs to be backed up regularly. | 
 | This document describes best practices for backing up review data. | 
 |  | 
 | [#mand-backup] | 
 | == Data which must be backed up | 
 |  | 
 | [#mand-backup-git] | 
 | Git repositories:: | 
 | + | 
 | The bare Git repositories managed by Gerrit are typically stored in the | 
 | `${SITE}/git` directory. However, the locations can be customized in | 
 | `${SITE}/etc/gerrit.config`. They contain the history of the respective | 
 | projects. If you are using _NoteDb_ (available since 2.15, always used in 3.0 | 
 | and newer), they also contain change and review metadata, user accounts and groups. | 
 |  | 
 | [#mand-backup-db] | 
 | SQL database:: | 
 | + | 
 | Gerrit releases in the 2.x series store some data in the database you | 
 | chose when installing Gerrit. If you are using 2.16 and have | 
 | migrated to _NoteDb_, only the schema version is stored in the database. | 
 | + | 
 | If you are using H2, you need to back up the `.db` files in the folder | 
 | `${SITE}/db`. | 
 | + | 
 | For all other database types refer to their backup documentation. | 
 | + | 
 | Gerrit releases 3.0 and newer store all primary data in _NoteDb_ inside | 
 | the git repositories of the Gerrit site. Only the flags that mark a changed | 
 | file as reviewed in the UI are stored in a relational database. If you are | 
 | using H2, this database is named `account_patch_reviews.h2.db`. | 
 |  | 
 | [#optional-backup] | 
 | == Data optional to be backed up | 
 |  | 
 | [#data-optional-backup-index] | 
 | Search index:: | 
 | + | 
 | The _Lucene_ search index is stored in the `${SITE}/index` folder. | 
 | It can be recomputed from primary data in the git repositories, but | 
 | reindexing may take a long time, so backing up the index makes sense | 
 | for production installations. | 
 |  | 
 | [#optional-backup-cache] | 
 | Caches:: | 
 | + | 
 | Gerrit uses many caches which populate automatically. Some of the caches | 
 | are persisted in the directory `${SITE}/cache` to retain the cached data | 
 | across restarts. Since repopulating persistent caches takes time and server | 
 | resources, it makes sense to include them in backups. This avoids | 
 | unnecessarily high load and degraded performance while the caches are | 
 | repopulated after a Gerrit site has been restored from backup. | 
 |  | 
 | [#optional-backup-config] | 
 | Configuration:: | 
 | + | 
 | Gerrit configuration files are located in the directory `${SITE}/etc` | 
 | and should be backed up or versioned in a git repository. The `etc` | 
 | directory also contains secrets which should be handled separately: | 
 | + | 
 | * `secure.config` contains passwords and `auth.registerEmailPrivateKey` | 
 | * public and private SSH host keys | 
 | + | 
 | Consider using the | 
 | link:https://gerrit.googlesource.com/plugins/secure-config/[secure-config plugin,role=external,window=_blank] | 
 | to encrypt these secrets. | 
 |  | 
 | [#optional-backup-plugin-data] | 
 | Plugin Data:: | 
 | + | 
 | The `${SITE}/data/` directory is used by plugins that store data, e.g. | 
 | the delete-project and the replication plugins. | 
 |  | 
 | [#optional-backup-libs] | 
 | Libraries:: | 
 | + | 
 | The `${SITE}/lib/` directory contains libraries that are used as statically | 
 | loaded plugins or that provide additional dependencies needed by Gerrit plugins. | 
 |  | 
 | [#optional-backup-plugins] | 
 | Plugins:: | 
 | + | 
 | The `${SITE}/plugins/` directory contains the installed Gerrit plugins. | 
 |  | 
 | [#optional-backup-static] | 
 | Static Resources:: | 
 | + | 
 | The `${SITE}/static/` directory contains static resources used to customize the | 
 | Gerrit UI and email templates. | 
 |  | 
 | [#optional-backup-logs] | 
 | Logs:: | 
 | + | 
 | The `${SITE}/logs/` directory contains Gerrit server log files. Logs can still | 
 | be written when the server is in read-only mode. | 
 |  | 
 | [#cons-backup] | 
 | == Consistent backups | 
 |  | 
 | There are several ways to ensure consistency when backing up primary data. | 
 |  | 
 | [#cons-backup-snapshot] | 
 | === Filesystem snapshots | 
 |  | 
 | Gerrit 3.0 or newer:: | 
 | + | 
 | * All primary data is stored in git. | 
 | * Use a file system or volume manager supporting snapshots, like lvm, zfs, | 
 | btrfs or nfs. Create a snapshot and then archive the snapshot. | 
 |  | 
 | Gerrit 2.x:: | 
 | + | 
 | Gerrit 2.16 can use _NoteDb_ to store almost all primary data in git, which | 
 | simplifies creating backups. If you migrated to _NoteDb_ you can | 
 | follow the backup procedure for 3.0 and higher and additionally take | 
 | a backup of the database. In this case the database only contains the | 
 | schema version, which only changes during an upgrade, hence consistency | 
 | between git repositories and database is no longer critical. If you didn't | 
 | migrate to _NoteDb_, follow the backup procedure for older 2.x Gerrit versions. | 
 | + | 
 | Older 2.x Gerrit versions store change metadata, review comments, votes, | 
 | accounts and group information in a SQL database. Since there is no | 
 | integrated transaction handling between git repositories and the SQL | 
 | database, backing up both consistently requires turning the server | 
 | read-only or shutting it down while creating the backup. Scheduled and | 
 | currently running cron jobs (e.g. repacking repositories) which affect the | 
 | repositories may also need to be stopped. | 
 | Use a file system supporting snapshots to keep the period where the Gerrit | 
 | server is read-only or down as short as possible. | 
 |  | 
 | [#cons-backup-read-only] | 
 | === Turn primary server read-only for backup | 
 |  | 
 | Make the primary server handling write operations read-only before taking the | 
 | backup. This means read-access is still available from replica servers during | 
 | backup, because only write operations have to be stopped to ensure consistency. | 
 | This can be implemented using the | 
 | link:https://gerrit.googlesource.com/plugins/readonly/[_readonly_,role=external,window=_blank] plugin. | 
 |  | 
 | [#cons-backup-replicate] | 
 | === Replicate data for backup | 
 |  | 
 | Replicating the git repositories can back up the most critical repository data | 
 | but does not back up repository metadata such as the project description | 
 | file, reflogs, git configs, and alternate configs. | 
 |  | 
 | Replicate all git repositories to another file system using | 
 | `git clone --mirror`, | 
 | or the | 
 | link:https://gerrit.googlesource.com/plugins/replication[replication plugin,role=external,window=_blank] | 
 | or the | 
 | link:https://gerrit.googlesource.com/plugins/pull-replication[pull-replication plugin,role=external,window=_blank]. | 
 | Ideally, use a filesystem supporting snapshots to create a backup archive | 
 | of such a replica. | 
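
The `git clone --mirror` variant could be scripted roughly as follows. This is a minimal sketch, not part of Gerrit: the function name `mirror_commands` is hypothetical, and it assumes that the source repositories are reachable as local paths and that every bare repository directory ends in `.git`. It only builds the git command lines, so they can be reviewed before being executed.

```python
from pathlib import Path


def mirror_commands(source_root: Path, backup_root: Path) -> list:
    """Return one git command (as an argument list) per bare repository.

    A repository is cloned with `git clone --mirror` on the first run and
    refreshed with `git remote update` on later runs.
    """
    commands = []
    for repo in sorted(source_root.rglob("*.git")):
        if not repo.is_dir():
            continue
        # Keep the same relative layout below the backup root.
        dest = backup_root / repo.relative_to(source_root)
        if dest.exists():
            commands.append(["git", "-C", str(dest), "remote", "update"])
        else:
            commands.append(["git", "clone", "--mirror", str(repo), str(dest)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Deleted refs are deliberately not pruned here, so an accidental deletion on the primary is not mirrored automatically.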
 |  | 
 | For 2.x Gerrit versions also set up a database replica for the data stored in the | 
 | SQL database. If you are using 2.16 and migrated to _NoteDb_ you may skip | 
 | setting up a database replica and instead take a backup of the database, which only | 
 | contains the current schema version in this case. | 
 | In addition you need to ensure that no write operations are in flight before you | 
 | take the replica offline. Otherwise the database backup might be inconsistent | 
 | with the backup of the git repositories. | 
 |  | 
 | Do not skip backing up the replica. The replica alone IS NOT a backup: | 
 | imagine someone deleted a project by mistake and this deletion got replicated. | 
 | Replication of repository deletions can be switched off using the | 
 | link:https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/resources/Documentation/config.md[server option,role=external,window=_blank] | 
 | `remote.NAME.replicateProjectDeletions`. | 
 |  | 
 | If you are using Gerrit replicas to offload read traffic, you can use one of these | 
 | replicas for creating backups. | 
 |  | 
 | [#cons-backup-offline] | 
 | === Take primary server offline for backup | 
 |  | 
 | Shut down the primary server handling write operations before taking a backup. | 
 | This is simple but means downtime for the users. Scheduled and currently | 
 | running cron jobs (e.g. repacking repositories) which affect the repositories | 
 | may also need to be stopped. | 
 |  | 
 | [#backup-methods] | 
 | == Backup methods | 
 |  | 
 | [#backup-methods-snapshots] | 
 | === Filesystem snapshots | 
 |  | 
 | Filesystems supporting copy on write snapshots:: | 
 | + | 
 | Use a file system supporting copy-on-write snapshots like | 
 | link:https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Snapshots[btrfs,role=external,window=_blank] | 
 | or | 
 | https://wiki.debian.org/ZFS#Snapshots[zfs,role=external,window=_blank]. | 
 |  | 
 |  | 
 | Other filesystems supporting snapshots:: | 
 | https://wiki.archlinux.org/index.php/LVM#Snapshots[lvm,role=external,window=_blank] or nfs. | 
 | + | 
 | Create a snapshot and then archive the snapshot to another storage. | 
 | + | 
 | While snapshots are great for creating high quality backups quickly, they are | 
 | not ideal as a format for storing backup data. Snapshots typically depend on and | 
 | reside on the same storage infrastructure as the original disk images. | 
 | Therefore, it’s crucial that you archive these snapshots and store them | 
 | elsewhere. | 
 |  | 
 | 3.0 or newer:: | 
 | Snapshot the complete site directory. | 
 |  | 
 | 2.x:: | 
 | Similar, but the data of the database should be stored on the very same volume | 
 | on the same machine, so that the snapshot is taken atomically over both | 
 | the git data and the database data. Because everything should be ACID, a restored | 
 | snapshot can safely crash-recover, as if the power had been cut and the server booted | 
 | up again. (It is actually safer than that, because the filesystem knows that a | 
 | snapshot is being taken and can sync pending writes first.) | 
 |  | 
 | In addition, filesystem snapshots allow you to: | 
 |  | 
 | * roll back easily and quickly without having to access remote backup data (e.g. to | 
 | recover from an accidental `rm -rf git/` in seconds) | 
 | * transfer consistent snapshots incrementally | 
 | * save a lot of storage space while still keeping multiple "known consistent states" | 
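
As an illustration of the snapshot step, the sketch below builds the snapshot command for three of the tools named above. It is an assumption-laden example rather than a recommended procedure: the helper name `snapshot_command` and the 10G copy-on-write size for LVM are hypothetical, and the source and snapshot names depend entirely on the local storage layout.

```python
def snapshot_command(kind: str, source: str, name: str) -> list:
    """Build the command line that creates a snapshot of `source`.

    kind   -- "btrfs", "zfs" or "lvm"
    source -- subvolume path (btrfs), dataset (zfs) or logical volume (lvm)
    name   -- snapshot path (btrfs) or snapshot name (zfs, lvm)
    """
    if kind == "btrfs":
        # -r creates a read-only snapshot of the subvolume.
        return ["btrfs", "subvolume", "snapshot", "-r", source, name]
    if kind == "zfs":
        # ZFS snapshots are addressed as dataset@name.
        return ["zfs", "snapshot", f"{source}@{name}"]
    if kind == "lvm":
        # The size of the copy-on-write area is an assumption; adjust it.
        return ["lvcreate", "--snapshot", "--size", "10G", "--name", name, source]
    raise ValueError(f"unsupported snapshot kind: {kind}")
```

The resulting list can be run with `subprocess.run(cmd, check=True)`. Afterwards the snapshot still has to be archived to other storage.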
 |  | 
 | [#backup-methods-other] | 
 | === Other backup methods | 
 |  | 
 | To ensure consistent backups, these backup methods require turning the server | 
 | read-only while a backup is running. | 
 |  | 
 | * create an archive like `tar.gz` to back up the site | 
 | * `rsync` | 
 | * plain old `cp` | 
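
The archive variant might look like the following sketch. It assumes that the server has already been made read-only, that the site uses the default directory layout described above, and that the archive is written outside the directories being archived. The function name `archive_site` and the default directory selection are illustrative choices, not Gerrit conventions.

```python
import tarfile
from pathlib import Path

# Directories below the site path; "git" (and "db" for 2.x with H2) hold
# primary data, the others are optional.
DEFAULT_DIRS = ("git", "db", "etc", "index", "cache", "data", "lib", "plugins", "static")


def archive_site(site: Path, target: Path, dirs=DEFAULT_DIRS) -> list:
    """Write a gzip-compressed tar archive of the selected site directories.

    Returns the names of the directories that were actually archived.
    """
    archived = []
    with tarfile.open(str(target), "w:gz") as tar:
        for name in dirs:
            path = site / name
            if path.is_dir():
                # arcname keeps the archive paths relative to the site.
                tar.add(str(path), arcname=name)
                archived.append(name)
    return archived
```

For example, `archive_site(Path("/srv/gerrit"), Path("/backup/gerrit.tar.gz"))` would archive a site installed in the hypothetical location `/srv/gerrit`.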
 |  | 
 | [#backup-methods-test] | 
 | == Test backups | 
 |  | 
 | Test your backups and run regular restore fire drills to ensure that the backups | 
 | aren't corrupt or incomplete and that you can restore a backup quickly. | 
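
A first, very coarse completeness check could be automated as sketched below. It assumes a tar archive whose top-level entries are the site directories (`git`, `etc`, ...). The helper name `missing_top_level_dirs` is hypothetical. Such a check does not replace an actual restore drill; it only catches archives that are unreadable or obviously incomplete.

```python
import tarfile


def missing_top_level_dirs(archive: str, required=("git", "etc")) -> list:
    """Return the required top-level directories that the archive lacks."""
    with tarfile.open(archive) as tar:
        # Collect the first path component of every archive member.
        top_level = {member.name.split("/", 1)[0] for member in tar.getmembers()}
    return [name for name in required if name not in top_level]
```

An empty result means that all required directories are present. Verifying the restored repositories themselves, for example with `git fsck`, is a sensible next step.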
 |  | 
 | [#backup-dr] | 
 | == Disaster recovery | 
 |  | 
 | [#backup-dr-repl] | 
 | === Replicate backup archives | 
 |  | 
 | To enable disaster recovery, at least replicate backup archives to another data center, | 
 | and run fire drills that restore a new site from the backup. | 
 |  | 
 | [#backup-dr-multi-site] | 
 | === Multi-site setup | 
 |  | 
 | Use the https://gerrit.googlesource.com/plugins/multi-site[multi-site plugin,role=external,window=_blank] | 
 | to install Gerrit with multiple sites in different datacenters | 
 | across different regions. This ensures that in case of a severe problem with | 
 | one of the sites, the other sites can still serve your repositories. | 
 |  | 
 | GERRIT | 
 | ------ | 
 | Part of link:index.html[Gerrit Code Review] | 
 |  | 
 | SEARCHBOX | 
 | --------- |