= Gerrit Code Review - Backup
A Gerrit Code Review site contains data that needs to be backed up regularly.
This document describes best practices for backing up review data.
== Data which must be backed up
Git repositories::
The bare Git repositories managed by Gerrit are typically stored in the
`${SITE}/git` directory. However, the locations can be customized in
`${SITE}/etc/gerrit.config`. They contain the history of the respective
projects. Since 2.15 when using _NoteDb_, and always in 3.0 and newer,
they also contain change and review metadata, user accounts and groups.
SQL database::
Gerrit releases in the 2.x series store some data in the database you
have chosen when installing Gerrit. If you are using 2.16 and have
migrated to _NoteDb_, only the schema version is stored in the database.
If you are using h2 you need to back up the `.db` files in the folder
For all other database types refer to their backup documentation.
Gerrit releases 3.0 and newer store all primary data in _NoteDb_ inside
the git repositories of the Gerrit site. Only the flag that marks a
changed file as reviewed in the UI is stored in a relational
database. If you are using h2 this database is named
== Data optional to be backed up
Search index::
The _Lucene_ search index is stored in the `${SITE}/index` folder.
It can be recomputed from primary data in the git repositories, but
reindexing may take a long time, so backing up the index makes sense
for production installations.
Caches::
Gerrit uses many caches which populate automatically. Some of the caches
are persisted in the directory `${SITE}/cache` to retain the cached data
across restarts. Since repopulating persistent caches takes time and server
resources, it makes sense to include them in backups. This avoids
higher load and degraded performance after a Gerrit site has been restored
from backup and the caches need to be repopulated.
Configuration::
Gerrit configuration files are located in the directory `${SITE}/etc`
and should be backed up or versioned in a git repository. The `etc`
directory also contains secrets which should be handled separately:
* `secure.config` contains passwords and `auth.registerEmailPrivateKey`
* public and private SSH host keys
Consider using the
link:[secure-config plugin,role=external,window=_blank]
to encrypt these secrets.
Plugin Data::
The `${SITE}/data/` directory is used by plugins which store data, e.g.
the delete-project and the replication plugin.
Libraries::
The `${SITE}/lib/` directory contains libraries which are used as statically loaded
plugins or which provide additional dependencies needed by Gerrit plugins.
Plugins::
The `${SITE}/plugins/` directory contains the installed Gerrit plugins.
Static Resources::
The `${SITE}/static/` directory contains static resources used to customize the
Gerrit UI and email templates.
Logs::
The `${SITE}/logs/` directory contains Gerrit server log files. Logs can still
be written when the server is in read-only mode.
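The directory layout described above can be turned into a small inventory helper. The following is a minimal sketch in Python, assuming the default site layout; the function name `backup_candidates` and the two tuples are illustrative only. It does not read `gerrit.config`, so relocated repositories and any SQL database are not covered.

```python
from pathlib import Path

# Directory names taken from the sections above (default site layout assumed).
MUST_BACK_UP = ("git",)
OPTIONAL = ("index", "cache", "etc", "data", "lib", "plugins", "static", "logs")


def backup_candidates(site):
    """Return (required, optional) lists of directories that exist under site."""
    site = Path(site)
    required = [site / name for name in MUST_BACK_UP if (site / name).is_dir()]
    optional = [site / name for name in OPTIONAL if (site / name).is_dir()]
    return required, optional
```

A backup script could log both lists before it starts, so that a missing `git` directory is noticed before an empty backup is archived.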
== Consistent backups
There are several ways to ensure consistency when backing up primary data.
=== Filesystem snapshots
Gerrit 3.0 or newer::
* All primary data is stored in git.
* Use a file system or volume manager supporting snapshots, such as lvm, zfs, btrfs or nfs.
Create a snapshot and then archive the snapshot.
Gerrit 2.x::
Gerrit 2.16 can use _NoteDb_ to store almost all data, which
simplifies creating backups. If you migrated to _NoteDb_ you can
follow the backup procedure for 3.0 and higher and additionally take
a backup of the database. The database then only contains the schema version,
which only changes during an upgrade, hence consistency between git and
database is no longer critical. If you didn't migrate
to _NoteDb_ then follow the backup procedure for older 2.x Gerrit versions.
Older 2.x Gerrit versions store change metadata, review comments, votes,
accounts and group information in a SQL database. Since there is no integrated
transaction handling between git repositories and the SQL database, backing up
both consistently requires turning the server read-only or shutting it down
while creating the backup. Scheduled and currently running cron jobs
(e.g. repacking repositories) which affect the repositories
may also need to be stopped.
Use a file system supporting snapshots to keep the period where the Gerrit
server is read-only or down as short as possible.
=== Turn primary server read-only for backup
Make the primary server handling write operations read-only before taking the
backup. This means read-access is still available from replica servers during
backup, because only write operations have to be stopped to ensure consistency.
This can be implemented using the
link:[_readonly_,role=external,window=_blank] plugin.
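Whichever mechanism is used to toggle read-only mode, the backup script should restore write access even when the backup step fails. A minimal Python sketch of that pattern follows. The callables `enable_read_only` and `disable_read_only` are hypothetical hooks that you would implement for your own setup; how the plugin is actually switched on and off is not shown here.

```python
from contextlib import contextmanager


@contextmanager
def read_only_window(enable_read_only, disable_read_only):
    """Keep the server read-only for the duration of the with-block.

    Both arguments are caller-supplied callables (hypothetical hooks).
    """
    enable_read_only()
    try:
        yield
    finally:
        # Always restore write access, even if the backup step raised.
        disable_read_only()
```

Used as `with read_only_window(enable, disable): run_backup()`, the server returns to normal operation as soon as the block exits.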
=== Replicate data for backup
Replicating the git repositories can back up the most critical repository data
but does not back up repository metadata such as the project description
file, ref-logs, git configs, and alternate configs.
Replicate all git repositories to another file system using
`git clone --mirror`,
or the
link:[replication plugin,role=external,window=_blank]
or the
link:[pull-replication plugin,role=external,window=_blank].
Ideally, use a filesystem supporting snapshots to create a backup archive
of such a replica.
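For the `git clone --mirror` variant, the following Python sketch builds the commands needed to mirror every repository below a git directory. It assumes that repositories are directories whose names end in `.git`; the function names are illustrative. The sketch only returns argument lists and leaves running them (for example with `subprocess.run`) to the caller.

```python
import os
from pathlib import Path


def find_bare_repos(git_root):
    """Yield directories ending in .git below git_root, without descending into them."""
    for dirpath, dirnames, _ in os.walk(git_root):
        repos = sorted(d for d in dirnames if d.endswith(".git"))
        # Prune the walk so repository internals are not scanned.
        dirnames[:] = sorted(d for d in dirnames if not d.endswith(".git"))
        for name in repos:
            yield Path(dirpath) / name


def mirror_commands(git_root, backup_root):
    """Return one git command (as an argument list) per repository."""
    git_root, backup_root = Path(git_root), Path(backup_root)
    commands = []
    for repo in find_bare_repos(git_root):
        target = backup_root / repo.relative_to(git_root)
        if target.exists():
            # Existing mirror: fetch updates and prune refs deleted at the source.
            commands.append(["git", "-C", str(target), "remote", "update", "--prune"])
        else:
            commands.append(["git", "clone", "--mirror", str(repo), str(target)])
    return commands
```

Because `--prune` propagates ref deletions, such a mirror tracks the source closely, which is one more reason to archive it rather than treat it as the backup itself.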
For 2.x Gerrit versions also set up a database replica for the data stored in the
SQL database. If you are using 2.16 and migrated to _NoteDb_ you may consider
skipping the database replica and instead take a backup of the database, which only
contains the current schema version in this case.
In addition you need to ensure that no write operations are in flight before you
take the replica offline. Otherwise the database backup might be inconsistent
with the backup of the git repositories.
Do not skip backing up the replica; the replica alone IS NOT a backup.
Imagine someone deleted a project by mistake and this deletion got replicated.
Replication of repository deletions can be switched off using the
link:[server option,role=external,window=_blank].
If you are using Gerrit replicas to offload read traffic you can use one of these
replicas for creating backups.
=== Take primary server offline for backup
Shut down the primary server handling write operations before taking a backup.
This is simple but means downtime for the users. Scheduled and currently
running cron jobs (e.g. repacking repositories) which affect the repositories
may also need to be stopped.
== Backup methods
=== Filesystem snapshots
Filesystems supporting copy on write snapshots::
Use a file system supporting copy-on-write snapshots like btrfs or zfs.
Other filesystems supporting snapshots::
link:[lvm,role=external,window=_blank] or nfs.
Create a snapshot and then archive the snapshot to another storage.
While snapshots are great for creating high-quality backups quickly, they are
not ideal as a format for storing backup data. Snapshots typically depend on and
reside on the same storage infrastructure as the original disk images.
Therefore, it’s crucial that you archive these snapshots and store them on separate storage.
3.0 or newer::
Snapshot the complete site directory
2.x::
Similar to 3.0 or newer, but the data of the database should be stored on the very same volume
on the same machine, so that the snapshot is taken atomically over both
the git data and the database data. Because everything should be ACID, the site can safely
crash-recover, as if the power had been unplugged and the server booted up again.
(Actually this is safer than that, because the filesystem knows about taking the snapshot
and can sync the pending writes.)
In addition, filesystem snapshots allow you to:
* roll back easily and quickly without having to access remote backup data (e.g. to restore
an accidental `rm -rf git/` in seconds)
* transfer consistent snapshots incrementally
* save a lot of data while still keeping multiple "known consistent states"
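As an illustration of the snapshot step, here is a minimal Python sketch that builds the snapshot commands for zfs and btrfs with a timestamped name. The dataset, subvolume and destination values are placeholders you would supply, and the naming scheme is an assumption. The functions only return argument lists and do not run them.

```python
from datetime import datetime, timezone


def snapshot_name(prefix="gerrit-backup", now=None):
    """Build a sortable snapshot name with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}-{now:%Y%m%dT%H%M%SZ}"


def zfs_snapshot_command(dataset, name):
    # Equivalent to: zfs snapshot <dataset>@<name>
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def btrfs_snapshot_command(subvolume, destination):
    # Equivalent to: btrfs subvolume snapshot -r <subvolume> <destination>
    # The -r flag makes the snapshot read-only.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, destination]
```

Timestamped names sort chronologically, which makes it straightforward to keep a fixed number of recent snapshots and archive the rest.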
=== Other backup methods
To ensure consistent backups, these backup methods require turning the server into
read-only mode while a backup is running.
* create an archive like `tar.gz` to back up the site
* `rsync`
* plain old `cp`
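The archive variant can be sketched with Python's standard `tarfile` module. This is a minimal example, assuming the server is already read-only or stopped; the function name `archive_site` is illustrative.

```python
import tarfile
from pathlib import Path


def archive_site(site, archive_path):
    """Write a gzip-compressed tar archive of the site directory."""
    site = Path(site)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the site directory's own name
        # instead of its absolute path.
        tar.add(str(site), arcname=site.name)
    return Path(archive_path)
```

Storing entries relative to the site directory name keeps the archive restorable to a different parent directory.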
== Test backups
Test backups and run fire drills restoring them to ensure the backups aren't
corrupt or incomplete and that you can restore a backup quickly.
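A first automated check, well short of a full restore drill, is to confirm that an archive is readable and contains the directories you expect. The Python sketch below assumes a tar archive whose single top-level entry is the site directory; the function name `missing_entries` and the default `required` tuple are illustrative.

```python
import tarfile


def missing_entries(archive_path, required=("git",)):
    """Return the required site subdirectories that are absent from the archive."""
    found = set()
    # tarfile raises an exception if the archive cannot be opened or read.
    with tarfile.open(archive_path) as tar:
        for member in tar:
            parts = member.name.split("/")
            if len(parts) >= 2:
                # parts[0] is the site directory, parts[1] its direct child.
                found.add(parts[1])
    return [name for name in required if name not in found]
```

An empty result only shows that the expected directories are present. It does not replace restoring the archive into a test site and starting Gerrit on it.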
== Disaster recovery
=== Replicate backup archives
To enable disaster recovery, at least replicate backup archives to another data center,
and run fire drills restoring a new site from the backup.
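After copying an archive to another data center, it is worth confirming that the copy is byte-identical to the original. A minimal Python sketch using SHA-256 checksums follows; the function names are illustrative, and in practice the two digests would usually be computed on different hosts and compared afterwards.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, replica):
    """Return True if both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(replica)
```

Reading in chunks keeps memory use constant, which matters for archives of large sites.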
=== Multi-site setup
Use the link:[multi-site plugin,role=external,window=_blank]
to install Gerrit with multiple sites installed in different datacenters
across different regions. This ensures that in case of a severe problem with
one of the sites, the other sites can still serve your repositories.
Part of link:index.html[Gerrit Code Review]