:linkattrs:
= Gerrit Code Review - Backup

A Gerrit Code Review site contains data that needs to be backed up regularly.
This document describes best practices for backing up review data.

[#mand-backup]
== Data which must be backed up

[#mand-backup-git]
Git repositories::
+
The bare Git repositories managed by Gerrit are typically stored in the
`${SITE}/git` directory. However, the locations can be customized in
`${SITE}/etc/gerrit.config`. They contain the history of the respective
projects. Since 2.15 if you are using _NoteDb_, and always for 3.0 and newer,
they also contain change and review metadata, user accounts and groups.

[#mand-backup-db]
SQL database::
+
Gerrit releases in the 2.x series store some data in the database you
have chosen when installing Gerrit. If you are using 2.16 and have
migrated to _NoteDb_, only the schema version is stored in the database.
+
If you are using H2, you need to back up the `.db` files in the folder
`${SITE}/db`.
+
For all other database types refer to their backup documentation.
+
Gerrit releases 3.0 and newer store all primary data in _NoteDb_ inside
the git repositories of the Gerrit site. Only the flag which marks a
changed file as reviewed in the UI is stored in a relational
database. If you are using H2, this database is named
`account_patch_reviews.h2.db`.
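
As a rough illustration of the list above, the following Python sketch
collects the mandatory paths of a site that uses the default layout and H2.
The function name `mandatory_backup_paths` is hypothetical, and the sketch
assumes that repositories live in `git` below the site directory; a location
customized in `etc/gerrit.config` is not resolved.

```python
from pathlib import Path
from typing import List


def mandatory_backup_paths(site: Path) -> List[Path]:
    """Return the mandatory backup paths that exist below the site directory.

    Assumption: default layout with repositories in `git` and H2 database
    files in `db`. Other database types need their own backup tooling.
    """
    paths: List[Path] = []
    git_dir = site / "git"
    if git_dir.is_dir():
        paths.append(git_dir)
    db_dir = site / "db"
    if db_dir.is_dir():
        # H2 database files end in `.db`, e.g. account_patch_reviews.h2.db
        paths.extend(sorted(db_dir.glob("*.db")))
    return paths
```

The returned list can be handed to whichever backup method is used; it only
names the data and says nothing about consistency.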

[#optional-backup]
== Data optional to be backed up

[#data-optional-backup-index]
Search index::
+
The _Lucene_ search index is stored in the `${SITE}/index` folder.
It can be recomputed from primary data in the git repositories, but
reindexing may take a long time, so backing up the index makes sense
for production installations.
+
If you have chosen to use _Elasticsearch_ for indexing,
refer to its
link:https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html[backup documentation,role=external,window=_blank].

[#optional-backup-cache]
Caches::
+
Gerrit uses many caches which populate automatically. Some of the caches
are persisted in the directory `${SITE}/cache` to retain the cached data
across restarts. Since repopulating persistent caches takes time and server
resources, it makes sense to include them in backups. This avoids
unnecessarily high load and degraded performance while caches are
repopulated after a Gerrit site has been restored from backup.

[#optional-backup-config]
Configuration::
+
Gerrit configuration files are located in the directory `${SITE}/etc`
and should be backed up or versioned in a git repository. The `etc`
directory also contains secrets which should be handled separately:
+
* `secure.config` contains passwords and `auth.registerEmailPrivateKey`
* public and private SSH host keys
+
Consider using the
link:https://gerrit.googlesource.com/plugins/secure-config/[secure-config plugin,role=external,window=_blank]
to encrypt these secrets.

[#optional-backup-plugin-data]
Plugin Data::
+
The `${SITE}/data/` directory is used by plugins that store data, e.g.
the delete-project and the replication plugin.

[#optional-backup-libs]
Libraries::
+
The `${SITE}/lib/` directory contains libraries that are used as statically
loaded plugins or that provide additional dependencies needed by Gerrit plugins.

[#optional-backup-plugins]
Plugins::
+
The `${SITE}/plugins/` directory contains the installed Gerrit plugins.

[#optional-backup-static]
Static Resources::
+
The `${SITE}/static/` directory contains static resources used to customize the
Gerrit UI and email templates.

[#optional-backup-logs]
Logs::
+
The `${SITE}/logs/` directory contains Gerrit server log files. Logs can still
be written when the server is in read-only mode.

[#cons-backup]
== Consistent backups

There are several ways to ensure consistency when backing up primary data.

[#cons-backup-snapshot]
=== Filesystem snapshots

Gerrit 3.0 or newer::
+
* All primary data is stored in git.
* Use a file system like lvm, zfs, btrfs or nfs supporting snapshots.
Create a snapshot and then archive the snapshot.

Gerrit 2.x::
+
Gerrit 2.16 can use _NoteDb_ to store almost all of this data, which
simplifies creating backups. If you migrated to _NoteDb_, you can
follow the backup procedure for 3.0 and higher and additionally take
a backup of the database. In this case the database only contains the
schema version, which only changes during an upgrade, so consistency
between git repositories and database is no longer critical. If you didn't
migrate to _NoteDb_, follow the backup procedure for older 2.x Gerrit versions.
+
Older 2.x Gerrit versions store change metadata, review comments, votes,
accounts and group information in a SQL database. Since there is no integrated
transaction handling between git repositories and the SQL database, backing up
both consistently requires turning the server read-only or shutting it down
while creating the backup. Cron jobs which affect the repositories
(e.g. repacking repositories), including those currently running,
may also need to be stopped.
Use a file system supporting snapshots to keep the period where the Gerrit
server is read-only or down as short as possible.

[#cons-backup-read-only]
=== Turn primary server read-only for backup

Make the primary server handling write operations read-only before taking the
backup. This means read-access is still available from replica servers during
backup, because only write operations have to be stopped to ensure consistency.
This can be implemented using the
link:https://gerrit.googlesource.com/plugins/readonly/[_readonly_,role=external,window=_blank] plugin.

[#cons-backup-replicate]
=== Replicate data for backup

Replicating the git repositories can back up the most critical repository data
but does not back up repository metadata such as the project description
file, ref-logs, git configs, and alternate configs.

Replicate all git repositories to another file system using
`git clone --mirror`,
or the
link:https://gerrit.googlesource.com/plugins/replication[replication plugin,role=external,window=_blank]
or the
link:https://gerrit.googlesource.com/plugins/pull-replication[pull-replication plugin,role=external,window=_blank].
Ideally, use a filesystem supporting snapshots to create a backup archive
of such a replica.
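
The `git clone --mirror` variant could be scripted roughly as follows. This
Python sketch only builds the command lines and does not run them. The
function name `mirror_commands` is hypothetical, and the sketch assumes that
every directory whose name ends in `.git` below the repository directory is
a bare repository.

```python
from pathlib import Path
from typing import List


def mirror_commands(git_dir: Path, dest: Path) -> List[List[str]]:
    """Build one `git clone --mirror` command per repository.

    Assumption: repositories are the directories named `*.git` below
    `git_dir`. Nested projects such as `plugins/replication.git` keep
    their relative path below `dest`.
    """
    commands = []
    for repo in sorted(git_dir.rglob("*.git")):
        if repo.is_dir():
            target = dest / repo.relative_to(git_dir)
            commands.append(
                ["git", "clone", "--mirror", str(repo), str(target)]
            )
    return commands
```

Each returned command can be passed to `subprocess.run`. This covers the
initial clone only; an existing mirror is refreshed with `git remote update`
instead of cloning again.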
166
Matthias Sohn7aba0942019-12-09 11:01:37 +0100167For 2.x Gerrit versions also set up a database replica for the data stored in the
Edwin Kempin3df01b02019-12-06 13:36:44 +0100168SQL database. If you are using 2.16 and migrated to _NoteDb_ you may consider to
Matthias Sohn7aba0942019-12-09 11:01:37 +0100169skip setting up a database replica, instead take a backup of the database which only
Matthias Sohn0d213562019-11-29 02:01:56 +0100170contains the current schema version in this case.
171In addition you need to ensure that no write operations are in flight before you
172take the replica offline. Otherwise the database backup might be inconsistent
173with the backup of the git repositories.
174
175Do not skip backing up the replica, the replica alone IS NOT a backup.
176Imagine someone deleted a project by mistake and this deletion got replicated.
177Replication of repository deletions can be switched off using the
Marian Harbach34253372019-12-10 18:01:31 +0100178link:https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/resources/Documentation/config.md[server option,role=external,window=_blank]
Matthias Sohn0d213562019-11-29 02:01:56 +0100179`remote.NAME.replicateProjectDeletions`.
180
Matthias Sohn7aba0942019-12-09 11:01:37 +0100181If you are using Gerrit replica to offload read traffic you can use one of these
182replica for creating backups.
Matthias Sohn0d213562019-11-29 02:01:56 +0100183
184[#cons-backup-offline]
=== Take primary server offline for backup

Shut down the primary server handling write operations before taking a backup.
This is simple but means downtime for the users. Cron jobs which affect the
repositories (e.g. repacking repositories), including those currently running,
may also need to be stopped.

[#backup-methods]
== Backup methods

[#backup-methods-snapshots]
=== Filesystem snapshots

Filesystems supporting copy on write snapshots::
+
Use a file system supporting copy-on-write snapshots like
link:https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Snapshots[btrfs,role=external,window=_blank]
or
https://wiki.debian.org/ZFS#Snapshots[zfs,role=external,window=_blank].

Other filesystems supporting snapshots::
https://wiki.archlinux.org/index.php/LVM#Snapshots[lvm,role=external,window=_blank] or nfs.
+
Create a snapshot and then archive the snapshot to another storage.
+
While snapshots are great for creating high quality backups quickly, they are
not ideal as a format for storing backup data. Snapshots typically depend on and
reside on the same storage infrastructure as the original disk images.
Therefore, it’s crucial that you archive these snapshots and store them
elsewhere.

3.0 or newer::
Snapshot the complete site directory.

2.x::
Similar, but the data of the database should be stored on the very same volume
on the same machine, so that the snapshot is taken atomically over both
the git data and the database data. Because everything should be ACID, it can safely
crash-recover, as if the power had been cut and the server booted up again.
(It is actually safer than that, because the filesystem knows about taking the snapshot,
and also about the pending writes it can sync.)

In addition to that, using filesystem snapshots allows you to:

* roll back easily and quickly without having to access remote backup data (e.g. to
restore from an accidental `rm -rf git/` in seconds)
* transfer consistent snapshots incrementally
* save a lot of storage space while still keeping multiple "known consistent states"

[#backup-methods-other]
=== Other backup methods

To ensure consistent backups, these backup methods require turning the server
read-only while a backup is running.

* create an archive like `tar.gz` to back up the site
* `rsync`
* plain old `cp`
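
The archive variant from the list above could look roughly like the following
Python sketch, which uses the standard `tarfile` module. The function name
`archive_site` is hypothetical. The sketch assumes that the server has
already been made read-only and that the archive is written outside the site
directory.

```python
import tarfile
from pathlib import Path


def archive_site(site: Path, archive: Path) -> Path:
    """Write a gzip-compressed tar archive of the whole site directory.

    Assumptions: the caller has turned the server read-only, and
    `archive` is located outside of `site`, so the archive does not
    try to include itself.
    """
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store entries below the site's directory name so the archive
        # unpacks into a single top-level directory.
        tar.add(str(site), arcname=site.name)
    return archive
```

Calling `archive_site(Path("/srv/review_site"), Path("/backup/site.tar.gz"))`
would archive a site at that (made-up) location. The resulting file still has
to be moved to separate storage.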

[#backup-methods-test]
== Test backups

Test your backups and regularly practice restoring them in fire drills to
ensure that the backups aren't corrupt or incomplete and that you can restore
a backup quickly.
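
One possible check during such a drill is to compare checksums of the
restored files against the data the backup was taken from. The following
Python sketch is only an illustration, with hypothetical function names. It
assumes that the reference tree has not changed since the backup, for
example because it is the snapshot the backup was created from.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large pack files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: Path) -> Dict[str, str]:
    """Map each file path relative to `root` to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def restore_matches(reference: Path, restored: Path) -> bool:
    """Return True if both trees hold the same files with the same content."""
    return checksums(reference) == checksums(restored)
```

A matching result only shows that the files were restored byte for byte. It
does not replace starting Gerrit on the restored site and checking that it
serves the repositories.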

[#backup-dr]
== Disaster recovery

[#backup-dr-repl]
=== Replicate backup archives

To enable disaster recovery, at least replicate backup archives to another data center,
and regularly practice restoring a new site from the backup.

[#backup-dr-multi-site]
=== Multi-site setup

Use the https://gerrit.googlesource.com/plugins/multi-site[multi-site plugin,role=external,window=_blank]
to install Gerrit with multiple sites in different data centers
across different regions. This ensures that in case of a severe problem with
one of the sites, the other sites can still serve your repositories.

GERRIT
------
Part of link:index.html[Gerrit Code Review]

SEARCHBOX
---------