|  | :linkattrs: | 
|  | = Gerrit Code Review - Backup | 
|  |  | 
|  | A Gerrit Code Review site contains data that needs to be backed up regularly. | 
|  | This document describes best practices for backing up review data. | 
|  |  | 
|  | [#mand-backup] | 
|  | == Data which must be backed up | 
|  |  | 
|  | [#mand-backup-git] | 
|  | Git repositories:: | 
|  | + | 
|  | The bare Git repositories managed by Gerrit are typically stored in the | 
|  | `${SITE}/git` directory. However, the location can be customized with the | 
|  | `gerrit.basePath` option in `${SITE}/etc/gerrit.config`. The repositories | 
|  | contain the history of the respective projects and, since 2.15 if you are | 
|  | using _NoteDb_ and always for 3.0 and newer, also change and review metadata, | 
|  | user accounts and groups. | 
|  |  | 
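Because the repository location is configurable, a backup script should read it from `gerrit.config` rather than assume `${SITE}/git`. The following is a minimal bash sketch of that lookup. The function name `gerrit_git_dir` is hypothetical, and falling back to `git` when `gerrit.basePath` is unset is an assumption that matches the typical layout. It relies on `git config -f`, which reads any file in Git config format.

```shell
# Hypothetical helper: print the directory holding the bare Git repositories
# of a Gerrit site, based on gerrit.basePath in etc/gerrit.config.
gerrit_git_dir() {
  local site=$1 base
  # git config exits non-zero if the file or the key is missing; in that case
  # fall back to "git" (assumption: the typical ${SITE}/git layout).
  base=$(git config -f "$site/etc/gerrit.config" --get gerrit.basePath 2>/dev/null) || base=git
  case $base in
    /*) printf '%s\n' "$base" ;;        # absolute path: use as-is
    *)  printf '%s\n' "$site/$base" ;;  # relative path: resolve against the site
  esac
}
```

For example, `gerrit_git_dir /srv/gerrit` prints `/srv/gerrit/git` for a site (at the illustrative path `/srv/gerrit`) whose `gerrit.basePath` is `git`.
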
|  | [#mand-backup-db] | 
|  | SQL database:: | 
|  | + | 
|  | Gerrit releases in the 2.x series store some data in the database you | 
|  | chose when installing Gerrit. If you are using 2.16 and have migrated | 
|  | to _NoteDb_, only the schema version is stored in the database. | 
|  | + | 
|  | If you are using H2, back up the `.db` files in the folder | 
|  | `${SITE}/db`. | 
|  | + | 
|  | For all other database types refer to their backup documentation. | 
|  | + | 
|  | Gerrit 3.0 and newer store all primary data in _NoteDb_ inside the Git | 
|  | repositories of the Gerrit site. Only the flag the UI sets when you mark | 
|  | a changed file as reviewed is stored in a relational database. If you | 
|  | are using H2, this database is named `account_patch_reviews.h2.db`. | 
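
For H2, backing up the database amounts to copying the `.db` files while nothing writes to them. A minimal bash sketch, assuming Gerrit is stopped or read-only during the copy and that the H2 files (for 3.0 and newer, `account_patch_reviews.h2.db`) live in `${SITE}/db`. The function name `backup_h2_db` is hypothetical.

```shell
# Hypothetical helper: copy the H2 database files (*.db) from "$site/db"
# into "$dest". Assumes Gerrit is stopped or read-only while it runs.
backup_h2_db() {
  local site=$1 dest=$2 f found=0
  mkdir -p "$dest" || return
  for f in "$site"/db/*.db; do
    [ -e "$f" ] || continue          # the glob matched nothing
    cp -p "$f" "$dest"/ || return    # -p keeps mode and timestamps
    found=1
  done
  # Treat "no database files" as an error so a misconfigured path is noticed.
  if [ "$found" -eq 0 ]; then
    echo "no .db files in $site/db" >&2
    return 1
  fi
}
```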
|  |  | 
|  | [#optional-backup] | 
|  | == Optional data to back up | 
|  |  | 
|  | [#data-optional-backup-index] | 
|  | Search index:: | 
|  | + | 
|  | The _Lucene_ search index is stored in the `${SITE}/index` folder. | 
|  | It can be recomputed from the primary data in the Git repositories, but | 
|  | reindexing may take a long time, so backing up the index makes sense | 
|  | for production installations. | 
|  |  | 
|  | [#optional-backup-cache] | 
|  | Caches:: | 
|  | + | 
|  | Gerrit uses many caches which are populated automatically. Some of the caches | 
|  | are persisted in the directory `${SITE}/cache` to retain the cached data | 
|  | across restarts. Since repopulating persistent caches takes time and server | 
|  | resources, it makes sense to include them in backups. This avoids unnecessarily | 
|  | high load and degraded performance after a Gerrit site has been restored | 
|  | from backup and the caches need to be repopulated. | 
|  |  | 
|  | [#optional-backup-config] | 
|  | Configuration:: | 
|  | + | 
|  | Gerrit configuration files are located in the directory `${SITE}/etc` | 
|  | and should be backed up or versioned in a Git repository. The `etc` | 
|  | directory also contains secrets which should be handled separately: | 
|  | + | 
|  | * `secure.config` contains passwords and `auth.registerEmailPrivateKey` | 
|  | * public and private SSH host keys | 
|  | + | 
|  | Consider using the | 
|  | link:https://gerrit.googlesource.com/plugins/secure-config/[secure-config plugin,role=external,window=_blank] | 
|  | to encrypt these secrets. | 
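
One way to keep the secrets separate is to write two archives: one with the regular configuration, and one, readable only by its owner, with `secure.config` and the `ssh_host_*` key files. A minimal bash sketch, assuming GNU tar and that the host keys follow the `ssh_host_*` naming in `${SITE}/etc`. The function name `backup_etc` and the archive names are hypothetical.

```shell
# Hypothetical helper: archive ${SITE}/etc without its secrets, and put the
# secrets (secure.config, ssh_host_* keys) into a separate archive.
backup_etc() {
  local site=$1 dest=$2 f
  local secrets=()
  mkdir -p "$dest" || return
  # Regular configuration; --exclude options must precede the member names.
  tar -czf "$dest/etc.tar.gz" --exclude='secure.config' \
      --exclude='ssh_host_*' -C "$site" etc || return
  # Collect the secret files that actually exist on this site.
  for f in "$site/etc/secure.config" "$site"/etc/ssh_host_*; do
    if [ -e "$f" ]; then
      secrets+=("etc/${f##*/}")
    fi
  done
  [ "${#secrets[@]}" -gt 0 ] || return 0
  # umask 077 in a subshell: the secrets archive is readable by its owner only.
  (umask 077 && tar -czf "$dest/etc-secrets.tar.gz" -C "$site" "${secrets[@]}")
}
```

The secrets archive still contains plaintext secrets, so store it in a more restricted location than the rest of the backup.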
|  |  | 
|  | [#optional-backup-plugin-data] | 
|  | Plugin Data:: | 
|  | + | 
|  | The `${SITE}/data/` directory is used by plugins that store data, e.g. the | 
|  | delete-project and replication plugins. | 
|  |  | 
|  | [#optional-backup-libs] | 
|  | Libraries:: | 
|  | + | 
|  | The `${SITE}/lib/` directory contains libraries that are loaded statically as | 
|  | plugins or that provide additional dependencies needed by Gerrit plugins. | 
|  |  | 
|  | [#optional-backup-plugins] | 
|  | Plugins:: | 
|  | + | 
|  | The `${SITE}/plugins/` directory contains the installed Gerrit plugins. | 
|  |  | 
|  | [#optional-backup-static] | 
|  | Static Resources:: | 
|  | + | 
|  | The `${SITE}/static/` directory contains static resources used to customize the | 
|  | Gerrit UI and email templates. | 
|  |  | 
|  | [#optional-backup-logs] | 
|  | Logs:: | 
|  | + | 
|  | The `${SITE}/logs/` directory contains Gerrit server log files. Logs can still | 
|  | be written when the server is in read-only mode. | 
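
The optional directories listed in this section can be collected into one archive, skipping any that a given site does not have. A minimal bash sketch; the function name `backup_optional_dirs` is hypothetical, and `etc` is deliberately left out because its secrets are handled separately. It is safest to run it while Gerrit is stopped or read-only (or against a snapshot), so that the index and caches are not modified during the copy.

```shell
# Hypothetical helper: archive the optional site directories that exist
# on this site into a single tar.gz. etc/ is handled separately.
backup_optional_dirs() {
  local site=$1 archive=$2 d
  local dirs=()
  for d in index cache data lib plugins static logs; do
    if [ -d "$site/$d" ]; then
      dirs+=("$d")
    fi
  done
  if [ "${#dirs[@]}" -eq 0 ]; then
    echo "no optional directories found under $site" >&2
    return 1
  fi
  # Paths inside the archive are relative to the site directory.
  tar -czf "$archive" -C "$site" "${dirs[@]}"
}
```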
|  |  | 
|  | [#cons-backup] | 
|  | == Consistent backups | 
|  |  | 
|  | There are several ways to ensure consistency when backing up primary data. | 
|  |  | 
|  | [#cons-backup-snapshot] | 
|  | === Filesystem snapshots | 
|  |  | 
|  | Gerrit 3.0 or newer:: | 
|  | + | 
|  | * All primary data is stored in Git. | 
|  | * Use a file system supporting snapshots, such as LVM, ZFS, btrfs or NFS. | 
|  | Create a snapshot and then archive the snapshot. | 
|  |  | 
|  | Gerrit 2.x:: | 
|  | + | 
|  | Gerrit 2.16 can use _NoteDb_ to store almost all data in the Git repositories, | 
|  | which simplifies creating backups. If you migrated to _NoteDb_, follow the | 
|  | backup procedure for 3.0 and newer and additionally back up the database. | 
|  | The database then only contains the schema version, which changes only | 
|  | during upgrades, so consistency between Git and the database is no longer | 
|  | critical. If you did not migrate to _NoteDb_, follow the backup procedure | 
|  | for older 2.x Gerrit versions. | 
|  | + | 
|  | Older 2.x Gerrit versions store change metadata, review comments, votes, | 
|  | accounts and group information in a SQL database. Backing up the Git | 
|  | repositories and the database consistently requires turning the server | 
|  | read-only or shutting it down while the backup is created, since there is | 
|  | no transaction handling spanning the Git repositories and the SQL database. | 
|  | Cron jobs that modify the repositories (e.g. repacking), including jobs | 
|  | that are already running, may also need to be stopped. | 
|  | Use a file system supporting snapshots to keep the period during which the | 
|  | Gerrit server is read-only or down as short as possible. | 
|  |  | 
|  | [#cons-backup-read-only] | 
|  | === Turn primary server read-only for backup | 
|  |  | 
|  | Make the primary server, which handles write operations, read-only before | 
|  | taking the backup. Read access then remains available from replica servers | 
|  | during the backup, because only write operations have to be stopped to ensure | 
|  | consistency. | 
|  | This can be implemented using the | 
|  | link:https://gerrit.googlesource.com/plugins/readonly/[_readonly_,role=external,window=_blank] plugin. | 
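
Whatever mechanism switches the server to read-only mode, the backup script should make sure it is switched back even if the backup fails. A minimal bash sketch of such a wrapper. `enable_read_only` and `disable_read_only` are hypothetical placeholders: replace their bodies with the commands your installation uses for the _readonly_ plugin (see its documentation), since this sketch does not assume any particular plugin interface.

```shell
# Placeholder hooks (hypothetical): replace the bodies with the commands your
# installation uses to switch the readonly plugin on and off.
enable_read_only()  { echo "TODO: enable read-only mode" >&2; }
disable_read_only() { echo "TODO: disable read-only mode" >&2; }

# Run the given backup command while the server is read-only. Read-only mode
# is switched off again even if the backup command fails; the backup's exit
# status is returned (or 1 if leaving read-only mode failed).
with_read_only() {
  enable_read_only || return
  local rc=0
  "$@" || rc=$?
  if ! disable_read_only; then
    echo "WARNING: could not leave read-only mode" >&2
    [ "$rc" -ne 0 ] || rc=1
  fi
  return "$rc"
}
```

For example, `with_read_only tar -czf /backup/site.tar.gz -C /srv/gerrit .`, where both paths are illustrative.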
|  |  | 
|  | [#cons-backup-replicate] | 
|  | === Replicate data for backup | 
|  |  | 
|  | Replicating the Git repositories backs up the most critical repository data, | 
|  | but does not back up repository metadata such as the project description | 
|  | file, reflogs, Git configs, and alternate configs. | 
|  |  | 
|  | Replicate all git repositories to another file system using | 
|  | `git clone --mirror`, | 
|  | or the | 
|  | link:https://gerrit.googlesource.com/plugins/replication[replication plugin,role=external,window=_blank] | 
|  | or the | 
|  | link:https://gerrit.googlesource.com/plugins/pull-replication[pull-replication plugin,role=external,window=_blank]. | 
|  | Ideally, use a file system supporting snapshots to create backup archives | 
|  | of such a replica. | 
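
A minimal bash sketch of the `git clone --mirror` approach: it mirrors every bare repository below the Git directory on the first run and refreshes existing mirrors on later runs. The function name `mirror_repositories` is hypothetical, and `--no-hardlinks` is a choice made here so that the mirror never shares object files with the source. Because `--prune` also removes refs that were deleted on the server, such a mirror is a replica rather than a backup, so archive or snapshot it as well.

```shell
# Hypothetical helper: mirror every bare repository below $1 into $2,
# keeping the relative layout (e.g. team/app.git).
mirror_repositories() {
  local src=$1 dest=$2 repo rel
  # -prune stops find from descending into a repository once it is found.
  while IFS= read -r -d '' repo; do
    rel=${repo#"$src"/}
    if [ -d "$dest/$rel" ]; then
      # Existing mirror: fetch all refs and drop refs deleted upstream.
      git -C "$dest/$rel" remote update --prune >/dev/null || return
    else
      mkdir -p "$(dirname "$dest/$rel")" || return
      # --no-hardlinks: copy objects even if source and mirror share a file system.
      git clone --quiet --mirror --no-hardlinks "$repo" "$dest/$rel" || return
    fi
  done < <(find "$src" -type d -name '*.git' -prune -print0)
}
```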
|  |  | 
|  | For 2.x Gerrit versions, also set up a database replica for the data stored | 
|  | in the SQL database. If you are using 2.16 and migrated to _NoteDb_, you may | 
|  | skip setting up a database replica and instead back up the database, which | 
|  | in this case only contains the current schema version. | 
|  | In addition, you need to ensure that no write operations are in flight before | 
|  | you take the replica offline. Otherwise the database backup might be | 
|  | inconsistent with the backup of the Git repositories. | 
|  |  | 
|  | Do not skip backing up the replica: the replica alone IS NOT a backup. | 
|  | Imagine someone deletes a project by mistake and the deletion gets replicated. | 
|  | Replication of repository deletions can be switched off using the | 
|  | link:https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/resources/Documentation/config.md[server option,role=external,window=_blank] | 
|  | `remote.NAME.replicateProjectDeletions`. | 
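
The option lives in the replication plugin's `etc/replication.config`, which uses Git config syntax, so it can be set with `git config -f`. A minimal bash sketch; the function name `disable_deletion_replication` and the remote name `backup` used in the example are hypothetical.

```shell
# Hypothetical helper: explicitly disable replication of project deletions
# for one remote section in the replication plugin's configuration.
disable_deletion_replication() {
  local site=$1 remote=$2
  # git config -f creates the file if it does not exist yet.
  git config -f "$site/etc/replication.config" \
    "remote.$remote.replicateProjectDeletions" false
}
```

For example, `disable_deletion_replication /srv/gerrit backup` writes `replicateProjectDeletions = false` into the `[remote "backup"]` section.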
|  |  | 
|  | If you are using Gerrit replicas to offload read traffic, you can use one of | 
|  | these replicas to create backups. | 
|  |  | 
|  | [#cons-backup-offline] | 
|  | === Take primary server offline for backup | 
|  |  | 
|  | Shut down the primary server handling write operations before taking a backup. | 
|  | This is simple but means downtime for the users. Cron jobs that modify the | 
|  | repositories (e.g. repacking), including jobs that are already running, may | 
|  | also need to be stopped. | 
|  |  | 
|  | [#backup-methods] | 
|  | == Backup methods | 
|  |  | 
|  | [#backup-methods-snapshots] | 
|  | === Filesystem snapshots | 
|  |  | 
|  | Filesystems supporting copy on write snapshots:: | 
|  | + | 
|  | Use a file system supporting copy-on-write snapshots like | 
|  | link:https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Snapshots[btrfs,role=external,window=_blank] | 
|  | or | 
|  | https://wiki.debian.org/ZFS#Snapshots[zfs,role=external,window=_blank]. | 
|  |  | 
|  |  | 
|  | Other filesystems supporting snapshots:: | 
|  | https://wiki.archlinux.org/index.php/LVM#Snapshots[lvm,role=external,window=_blank] or nfs. | 
|  | + | 
|  | Create a snapshot and then archive the snapshot to another storage. | 
|  | + | 
|  | While snapshots are great for creating high-quality backups quickly, they are | 
|  | not ideal as a format for storing backup data. Snapshots typically depend on, | 
|  | and reside on, the same storage infrastructure as the original disk images. | 
|  | Therefore, it’s crucial that you archive these snapshots and store them | 
|  | elsewhere. | 
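
A minimal bash sketch of the snapshot-then-archive step for btrfs, assuming the site directory is a btrfs subvolume, the snapshot directory is on the same btrfs file system, and the script runs with sufficient privileges. The function names are hypothetical. It keeps the read-only snapshot for fast rollback and writes a `tar.gz` copy to other storage; setting `DRY_RUN=1` only prints the commands.

```shell
# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: take a read-only btrfs snapshot of the site subvolume
# and archive it to other storage. The snapshot itself is kept.
snapshot_and_archive() {
  local subvol=$1 snapdir=$2 archive_dir=$3 stamp snap
  stamp=$(date +%Y%m%d-%H%M%S)
  snap="$snapdir/gerrit-$stamp"
  run_cmd btrfs subvolume snapshot -r "$subvol" "$snap" || return
  # Archive from the frozen snapshot, not from the live site.
  run_cmd tar -czf "$archive_dir/gerrit-$stamp.tar.gz" -C "$snap" .
}
```

For ZFS or LVM, only the snapshot command differs (for example `zfs snapshot` or `lvcreate --snapshot`); those variants are not shown.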
|  |  | 
|  | 3.0 or newer:: | 
|  | Snapshot the complete site directory. | 
|  |  | 
|  | 2.x:: | 
|  | Similar, but the database data should be stored on the very same volume | 
|  | on the same machine, so that the snapshot is taken atomically over both | 
|  | the Git data and the database data. Because everything should be ACID, it can | 
|  | safely crash-recover, as if the power had been unplugged and the server booted | 
|  | up again. (It is actually safer than that, because the file system knows that a | 
|  | snapshot is being taken and can sync pending writes.) | 
|  |  | 
|  | In addition, using file system snapshots allows: | 
|  |  | 
|  | * easy and fast rollback without having to access remote backup data (e.g. to | 
|  | recover from an accidental `rm -rf git/` within seconds) | 
|  | * incremental transfer of consistent snapshots | 
|  | * saving a lot of storage while still keeping multiple "known consistent states" | 
|  |  | 
|  | [#backup-methods-other] | 
|  | === Other backup methods | 
|  |  | 
|  | To ensure consistent backups, these methods require turning the server | 
|  | read-only while a backup is running. | 
|  |  | 
|  | * create an archive like `tar.gz` to back up the site | 
|  | * `rsync` | 
|  | * plain old `cp` | 
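
A minimal bash sketch of the plain `cp` approach, assuming the server stays read-only for the whole run. It copies the site into a timestamped directory with `cp -a` and then compares copy and source with `diff -r`, which fails if anything changed during the copy. The function name `copy_site` and the directory naming are hypothetical.

```shell
# Hypothetical helper: copy the whole site into a timestamped directory below
# $2 and verify the copy. Prints the path of the copy on success.
copy_site() {
  local site=$1 dest_root=$2 dest
  dest="$dest_root/gerrit-site-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return
  # -a preserves permissions, timestamps and symlinks; "/." copies the contents.
  cp -a "$site/." "$dest/" || return
  if ! diff -r "$site" "$dest" >/dev/null; then
    echo "copy differs from $site" >&2
    return 1
  fi
  printf '%s\n' "$dest"
}
```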
|  |  | 
|  | [#backup-methods-test] | 
|  | == Test backups | 
|  |  | 
|  | Test backups and run fire drills restoring them, to ensure the backups are | 
|  | neither corrupt nor incomplete and that you can restore a backup quickly. | 
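
Part of such a drill can be automated. The following minimal bash sketch unpacks a site archive into a scratch directory and runs `git fsck --full` on every bare repository under `git/`. It assumes the archive contains the site root (so that `git/` is at its top level), and the function name `verify_site_archive` is hypothetical. It only checks repository integrity; a full drill would also start Gerrit on the restored site.

```shell
# Hypothetical helper: restore a site archive into a scratch directory and
# fsck every bare repository found below git/. Returns 1 on any failure.
verify_site_archive() {
  local archive=$1 scratch repo failed=0
  scratch=$(mktemp -d) || return
  if ! tar -xzf "$archive" -C "$scratch" || [ ! -d "$scratch/git" ]; then
    echo "cannot restore git/ from $archive" >&2
    rm -rf "$scratch"
    return 1
  fi
  while IFS= read -r -d '' repo; do
    if ! git -C "$repo" fsck --full --no-progress >/dev/null 2>&1; then
      echo "fsck failed: ${repo#"$scratch"/}" >&2
      failed=1
    fi
  done < <(find "$scratch/git" -type d -name '*.git' -prune -print0)
  rm -rf "$scratch"
  return "$failed"
}
```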
|  |  | 
|  | [#backup-dr] | 
|  | == Disaster recovery | 
|  |  | 
|  | [#backup-dr-repl] | 
|  | === Replicate backup archives | 
|  |  | 
|  | To enable disaster recovery, at least replicate backup archives to another | 
|  | data center, and run fire drills restoring a new site from the backup. | 
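
To detect archives that were damaged in transit, a checksum can be recorded before copying and verified on the receiving side. A minimal bash sketch, assuming GNU coreutils `sha256sum`; the function names are hypothetical. The checksum file stores only the archive's base name, so it verifies wherever the pair is copied.

```shell
# Hypothetical helper: write "<archive>.sha256" next to the archive,
# recording only the base name so the pair can be moved together.
write_checksum() {
  local archive=$1
  ( cd "$(dirname "$archive")" && sha256sum "$(basename "$archive")" ) > "$archive.sha256"
}

# Hypothetical helper: verify an archive against its .sha256 file after transfer.
verify_checksum() {
  local archive=$1
  ( cd "$(dirname "$archive")" && sha256sum --quiet -c "$(basename "$archive").sha256" )
}
```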
|  |  | 
|  | [#backup-dr-multi-site] | 
|  | === Multi-site setup | 
|  |  | 
|  | Use the https://gerrit.googlesource.com/plugins/multi-site[multi-site plugin,role=external,window=_blank] | 
|  | to run Gerrit as multiple sites installed in different data centers | 
|  | across different regions. This ensures that if one of the sites has a severe | 
|  | problem, the other sites can still serve your repositories. | 
|  |  | 
|  | GERRIT | 
|  | ------ | 
|  | Part of link:index.html[Gerrit Code Review] | 
|  |  | 
|  | SEARCHBOX | 
|  | --------- |