:linkattrs:
= Gerrit Code Review - Backup

A Gerrit Code Review site contains data that needs to be backed up regularly.
This document describes best practices for backing up review data.

[#mand-backup]
== Data which must be backed up

[#mand-backup-git]
Git repositories::
+
The bare Git repositories managed by Gerrit are typically stored in the
`${SITE}/git` directory. However, the locations can be customized in
`${SITE}/etc/gerrit.config`. They contain the history of the respective
projects. Since 2.15, if you are using _NoteDb_, and always for 3.0 and newer,
they also contain change and review metadata, user accounts and groups.

[#mand-backup-db]
SQL database::
+
Gerrit releases in the 2.x series store some data in the database you
chose when installing Gerrit. If you are using 2.16 and have
migrated to _NoteDb_, only the schema version is stored in the database.
+
If you are using h2, you need to back up the `.db` files in the folder
`${SITE}/db`.
+
For all other database types refer to their backup documentation.
+
Gerrit releases 3.0 and newer store all primary data in _NoteDb_ inside
the git repositories of the Gerrit site. Only the reviewed flag, which marks
in the UI that you have reviewed a changed file, is stored in a relational
database. If you are using h2 this database is named
`account_patch_reviews.h2.db`.
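
As a rough illustration of the two entries above, the following Python sketch
collects the mandatory paths of a site. It assumes the default layout
(`${SITE}/git` and h2 files in `${SITE}/db`). The function name
`mandatory_backup_paths` is hypothetical and not part of Gerrit. Sites which
relocate repositories in `gerrit.config` or use another database type need
different handling.

```python
from pathlib import Path


def mandatory_backup_paths(site: Path) -> list[Path]:
    """Return the paths under a Gerrit site that hold mandatory backup data.

    Assumption: default layout with repositories in <site>/git and
    h2 database files in <site>/db.
    """
    paths = []
    git_dir = site / "git"
    if git_dir.is_dir():
        paths.append(git_dir)
    db_dir = site / "db"
    if db_dir.is_dir():
        # h2 database files end in ".db", e.g. account_patch_reviews.h2.db
        paths.extend(sorted(db_dir.glob("*.db")))
    return paths
```

Calling `mandatory_backup_paths(Path("/srv/gerrit"))` would list the git
directory followed by any h2 files found, where `/srv/gerrit` is only a
placeholder site path.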

[#optional-backup]
== Data optional to be backed up

[#data-optional-backup-index]
Search index::
+
The _Lucene_ search index is stored in the `${SITE}/index` folder.
It can be recomputed from primary data in the git repositories, but
reindexing may take a long time. Hence backing up the index makes sense
for production installations.
+
If you have chosen to use _Elasticsearch_ for indexing,
refer to its
link:https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html[backup documentation,role=external,window=_blank].

[#optional-backup-cache]
Caches::
+
Gerrit uses many caches which populate automatically. Some of the caches
are persisted in the directory `${SITE}/cache` to retain the cached data
across restarts. Since repopulating persistent caches takes time and server
resources, it makes sense to include them in backups. This avoids
unnecessarily high load and degraded performance while caches are
repopulated after a Gerrit site has been restored from backup.

[#optional-backup-config]
Configuration::
+
Gerrit configuration files are located in the directory `${SITE}/etc`
and should be backed up or versioned in a git repository. The `etc`
directory also contains secrets which should be handled separately:
+
* `secure.config` contains passwords and `auth.registerEmailPrivateKey`
* public and private SSH host keys
+
Consider using the
link:https://gerrit.googlesource.com/plugins/secure-config/[secure-config plugin,role=external,window=_blank]
to encrypt these secrets.

[#optional-backup-plugin-data]
Plugin Data::
+
The `${SITE}/data/` directory is used by plugins that store data, e.g.
the delete-project and the replication plugin.

[#optional-backup-libs]
Libraries::
+
The `${SITE}/lib/` directory contains libraries which are used as statically
loaded plugins or which provide additional dependencies needed by Gerrit plugins.

[#optional-backup-plugins]
Plugins::
+
The `${SITE}/plugins/` directory contains the installed Gerrit plugins.

[#optional-backup-static]
Static Resources::
+
The `${SITE}/static/` directory contains static resources used to customize the
Gerrit UI and email templates.

[#optional-backup-logs]
Logs::
+
The `${SITE}/logs/` directory contains Gerrit server log files. Logs can still
be written when the server is in read-only mode.
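
The Configuration entry above recommends handling the secrets in `${SITE}/etc`
separately. One possible way to separate them before archiving is sketched
below in Python. The file name patterns are assumptions: `secure.config` is
named in this document, while `ssh_host_*` is only a guess at how the SSH host
key files are named on a given site. The function name `split_secrets` is
hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumption: secrets are secure.config plus files named like SSH host keys.
SECRET_PATTERNS = ("secure.config", "ssh_host_*")


def split_secrets(etc_dir: Path) -> tuple[list[Path], list[Path]]:
    """Split the files in <site>/etc into (regular config, secrets)."""
    regular, secrets = [], []
    for path in sorted(p for p in etc_dir.iterdir() if p.is_file()):
        if any(fnmatch(path.name, pattern) for pattern in SECRET_PATTERNS):
            secrets.append(path)
        else:
            regular.append(path)
    return regular, secrets
```

The first list could then be versioned in a git repository, while the second
is stored with stricter access control.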

[#cons-backup]
== Consistent backups

There are several ways to ensure consistency when backing up primary data.

[#cons-backup-snapshot]
=== Filesystem snapshots

Gerrit 3.0 or newer::
+
* All primary data is stored in git.
* Use a file system or volume manager supporting snapshots, like lvm, zfs, btrfs or nfs.
Create a snapshot and then archive the snapshot.

Gerrit 2.x::
+
Gerrit 2.16 can use _NoteDb_ to store almost all of this data. If you migrated
to _NoteDb_, you can follow the backup procedure for 3.0 and higher and
additionally take a backup of the database, which then only contains the
schema version. Consistency between git repositories and database is no longer
critical in this case, since the schema version only changes during an
upgrade. If you didn't migrate to _NoteDb_, follow the backup procedure for
older 2.x Gerrit versions.
+
Older 2.x Gerrit versions store change metadata, review comments, votes,
accounts and group information in a SQL database. Since there is no integrated
transaction handling between git repositories and the SQL database, backing up
both consistently requires turning the server read-only or shutting it down
while creating the backup. Scheduled and currently running cron jobs
(e.g. repacking repositories) which affect the repositories may also need to
be stopped.
Use a file system supporting snapshots to keep the period where the Gerrit
server is read-only or down as short as possible.

[#cons-backup-read-only]
=== Turn primary server read-only for backup

Make the primary server handling write operations read-only before taking the
backup. Read access is still available from replica servers during the
backup, because only write operations have to be stopped to ensure consistency.
This can be implemented using the
link:https://gerrit.googlesource.com/plugins/readonly/[_readonly_,role=external,window=_blank] plugin.
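
A backup script should switch the server back to read-write even if the backup
step fails. The following Python sketch shows one way to structure this. It
deliberately does not call the _readonly_ plugin: `enable` and `disable` are
hypothetical callables which the caller has to supply, for example wrappers
around whatever mechanism the plugin offers on your installation.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def read_only_window(enable: Callable[[], None],
                     disable: Callable[[], None]) -> Iterator[None]:
    """Keep the server read-only for the duration of the with-block."""
    enable()
    try:
        yield
    finally:
        # Always restore write access, even if the backup step raised.
        disable()
```

The backup itself then runs inside `with read_only_window(enable, disable):`,
which keeps the read-only period limited to the body of the block.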

[#cons-backup-replicate]
=== Replicate data for backup

Replicating the git repositories can back up the most critical repository data
but does not back up repository metadata such as the project description
file, reflogs, git configs, and alternate configs.

Replicate all git repositories to another file system using
`git clone --mirror`,
or the
link:https://gerrit.googlesource.com/plugins/replication[replication plugin,role=external,window=_blank]
or the
link:https://gerrit.googlesource.com/plugins/pull-replication[pull-replication plugin,role=external,window=_blank].
Ideally, use a filesystem supporting snapshots to create a backup archive
of such a replica.
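
For the `git clone --mirror` variant, a minimal Python sketch could build one
git command per repository. It assumes that repositories are directories whose
names end in `.git` below the git base directory, and that an existing mirror
can be refreshed with `git remote update --prune`. The function names are
hypothetical, and the sketch only builds the commands without running them.

```python
import os
from pathlib import Path


def find_bare_repos(git_dir: Path) -> list[Path]:
    """Find directories ending in ".git" below git_dir without descending into them."""
    repos = []
    for root, dirs, _files in os.walk(git_dir):
        found = sorted(d for d in dirs if d.endswith(".git"))
        repos.extend(Path(root) / d for d in found)
        # Prune repositories from the walk; keep plain folders such as "plugins/".
        dirs[:] = sorted(d for d in dirs if not d.endswith(".git"))
    return repos


def mirror_commands(git_dir: Path, target_dir: Path) -> list[list[str]]:
    """Build one git command per repository: clone on first run, update afterwards."""
    commands = []
    for repo in find_bare_repos(git_dir):
        mirror = target_dir / repo.relative_to(git_dir)
        if mirror.exists():
            commands.append(["git", "-C", str(mirror), "remote", "update", "--prune"])
        else:
            commands.append(["git", "clone", "--mirror", str(repo), str(mirror)])
    return commands
```

Each returned command can be passed to `subprocess.run(command, check=True)`.
As noted above, such a mirror does not contain reflogs or git configs.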

For 2.x Gerrit versions also set up a database replica for the data stored in the
SQL database. If you are using 2.16 and migrated to _NoteDb_, you may skip
setting up a database replica and instead take a backup of the database, which only
contains the current schema version in this case.
In addition you need to ensure that no write operations are in flight before you
take the replica offline. Otherwise the database backup might be inconsistent
with the backup of the git repositories.

Do not skip backing up the replica. The replica alone IS NOT a backup.
Imagine someone deleted a project by mistake and this deletion got replicated.
Replication of repository deletions can be switched off using the
link:https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/resources/Documentation/config.md[server option,role=external,window=_blank]
`remote.NAME.replicateProjectDeletions`.

If you are using Gerrit replicas to offload read traffic, you can use one of these
replicas for creating backups.

[#cons-backup-offline]
=== Take primary server offline for backup

Shut down the primary server handling write operations before taking a backup.
This is simple but means downtime for the users. Scheduled and currently
running cron jobs (e.g. repacking repositories) which affect the repositories
may also need to be stopped.

[#backup-methods]
== Backup methods

[#backup-methods-snapshots]
=== Filesystem snapshots

Filesystems supporting copy-on-write snapshots::
+
Use a file system supporting copy-on-write snapshots like
link:https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Snapshots[btrfs,role=external,window=_blank]
or
https://wiki.debian.org/ZFS#Snapshots[zfs,role=external,window=_blank].

Other filesystems supporting snapshots::
https://wiki.archlinux.org/index.php/LVM#Snapshots[lvm,role=external,window=_blank] or nfs.
+
Create a snapshot and then archive the snapshot to another storage.
+
While snapshots are great for creating high quality backups quickly, they are
not ideal as a format for storing backup data. Snapshots typically depend on and
reside on the same storage infrastructure as the original disk images.
Therefore, it’s crucial that you archive these snapshots and store them
elsewhere.

3.0 or newer::
Snapshot the complete site directory.

2.x::
Similar, but the data of the database should be stored on the very same volume
on the same machine, so that the snapshot is taken atomically over both
the git data and the database data. Because everything should be ACID, it can safely
crash-recover, as if the power had been cut and the server was booted up again.
(It is actually safer than that, because the filesystem knows about taking the snapshot,
and also about the pending writes it can sync.)

In addition, filesystem snapshots allow you to:

* roll back easily and quickly without having to access remote backup data (e.g. to
undo an accidental `rm -rf git/` in seconds)
* transfer consistent snapshots incrementally
* save a lot of storage space while still keeping multiple "known consistent states"
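
How a snapshot is created depends on the storage in use. The Python sketch
below builds, but does not run, the snapshot command for btrfs, zfs and lvm.
All paths, dataset names, volume names and sizes are placeholders which the
caller has to supply, and the helper names are hypothetical.

```python
from datetime import datetime


def snapshot_name(now: datetime, prefix: str = "gerrit") -> str:
    """Build a sortable snapshot name such as gerrit-20191213T104246."""
    return f"{prefix}-{now:%Y%m%dT%H%M%S}"


def btrfs_snapshot(subvolume: str, dest: str) -> list[str]:
    # -r creates a read-only snapshot of the subvolume at dest
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


def zfs_snapshot(dataset: str, name: str) -> list[str]:
    # zfs addresses snapshots as <dataset>@<name>
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def lvm_snapshot(volume: str, name: str, size: str) -> list[str]:
    # size is the space reserved for changes made while the snapshot exists
    return ["lvcreate", "--snapshot", "--name", name, "--size", size, volume]
```

The resulting command can be executed with `subprocess.run(command, check=True)`,
after which the snapshot still has to be archived to another storage.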

[#backup-methods-other]
=== Other backup methods

To ensure consistent backups, these backup methods require turning the server
read-only while a backup is running.

* create an archive like `tar.gz` to back up the site
* `rsync`
* plain old `cp`
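
The archive variant could look like the following Python sketch, which uses
the standard `tarfile` module. It assumes that the server is already read-only
or stopped and that the archive is written outside the site directory. The
function name `archive_site` is hypothetical.

```python
import tarfile
from pathlib import Path


def archive_site(site: Path, archive: Path) -> Path:
    """Write the whole site directory into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the site folder name
        # instead of the absolute path on the server.
        tar.add(site, arcname=site.name)
    return archive
```

Writing the archive to a different volume than the site avoids adding the
archive to itself and keeps the backup off the storage it protects.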

[#backup-methods-test]
== Test backups

Test your backups and run fire drills restoring them to ensure the backups aren't
corrupt or incomplete and that you can restore a backup quickly.
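
A first, cheap completeness check can be automated. The Python sketch below
assumes a tar archive which holds a single top-level site folder and reports
which expected entries are missing from it. The function name and the list of
required entries are hypothetical. Such a check does not replace an actual
restore drill.

```python
import tarfile
from pathlib import Path


def missing_from_archive(archive: Path, required: tuple[str, ...]) -> list[str]:
    """Return the required site entries that the archive does not contain."""
    with tarfile.open(archive, "r:*") as tar:
        # Listing the members reads every header in the archive.
        names = tar.getnames()
    present = set()
    for name in names:
        parts = Path(name).parts
        if len(parts) >= 2:
            present.add(parts[1])  # parts[0] is the site folder itself
    return [entry for entry in required if entry not in present]
```

An empty result for, say, `("git", "etc")` means both directories are present
in the archive. It says nothing yet about whether the repositories in them are
intact.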

[#backup-dr]
== Disaster recovery

[#backup-dr-repl]
=== Replicate backup archives

To enable disaster recovery, at least replicate backup archives to another data center,
and run fire drills restoring a new site from the backup.

[#backup-dr-multi-site]
=== Multi-site setup

Use the https://gerrit.googlesource.com/plugins/multi-site[multi-site plugin,role=external,window=_blank]
to install Gerrit with multiple sites in different data centers
across different regions. This ensures that in case of a severe problem with
one of the sites, the other sites can still serve your repositories.

GERRIT
------
Part of link:index.html[Gerrit Code Review]

SEARCHBOX
---------