= Gerrit Code Review - Accounts

== Overview

Starting with Gerrit 2.15, accounts are fully stored in
link:note-db.html[NoteDb].

The account data consists of a sequence number (account ID), account
properties (full name, preferred email, registration date, status,
inactive flag), preferences (general, diff and edit preferences),
project watches, SSH keys, external IDs, starred changes and reviewed
flags.

Most account data is stored in a special link:#all-users[All-Users]
repository, which has one branch per user. Within the user branch there
are Git config files for the link:#account-properties[
account properties], the link:#preferences[account preferences] and the
link:#project-watches[project watches]. In addition, there is an
`authorized_keys` file for the link:#ssh-keys[SSH keys] that follows
the standard OpenSSH file format.

The account data in the user branch is versioned, and the Git history of
this branch serves as an audit log.

The link:#external-ids[external IDs] are stored as Git Notes inside the
`All-Users` repository in the `refs/meta/external-ids` notes branch.
Storing all external IDs in a notes branch ensures that each external
ID is only used once.

The link:#starred-changes[starred changes] are represented as
independent refs in the `All-Users` repository. They are not stored in
the user branch, since this data doesn't need versioning.

The link:#reviewed-flags[reviewed flags] are not stored in Git, but are
persisted in a database table. This is because there is a high volume
of reviewed flags and storing them in Git would be inefficient.

Since accessing the account data in Git is not fast enough for account
queries, e.g. when suggesting reviewers, Gerrit has a
link:#account-index[secondary index for accounts].

[[all-users]]
== `All-Users` repository

The `All-Users` repository is a special repository that only contains
user-specific information. It contains one branch per user. The user
branch is formatted as `refs/users/CD/ABCD`, where `CD/ABCD` is the
link:access-control.html#sharded-user-id[sharded account ID], e.g. the
user branch for account `1000856` is `refs/users/56/1000856`. The
account IDs in the user refs are sharded so that there is a good
distribution of the Git data in the storage system.
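
The sharding scheme can be illustrated with a small Python sketch. It is not Gerrit's implementation: `sharded_user_ref` is a hypothetical helper, and zero-padding shards below 10 (e.g. `05`) is an assumption based on the two-character `CD` in the pattern.

```python
def sharded_user_ref(account_id: int) -> str:
    """Return the user branch name for an account ID.

    Assumption for this sketch: the shard is the last two digits of the
    account ID, zero-padded to two characters (so account 5 -> "05").
    """
    if account_id <= 0:
        raise ValueError("account IDs are positive integers")
    shard = f"{account_id % 100:02d}"
    return f"refs/users/{shard}/{account_id}"


print(sharded_user_ref(1000856))  # refs/users/56/1000856
```

Because the shard comes from the fastest-changing digits of the ID, consecutively created accounts end up in different shard directories, which is what spreads the Git data across the storage system.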

A user branch must exist for each account, as it represents the
account. The files in the user branch are all optional. This means
having a user branch with a tree that is completely empty is also a
valid account definition.

Updates to the user branch are done through the
link:rest-api-accounts.html[Gerrit REST API], but users can also
manually fetch their user branch and push changes back to Gerrit. On
push, the user data is validated and invalid user data is rejected.

To hide the implementation detail of the sharded account ID in the ref
name, Gerrit offers a magic `refs/users/self` ref that is automatically
resolved to the user branch of the calling user. Users can then use
this ref to fetch from and push to their own user branch. E.g. if user
`1000856` pushes to `refs/users/self`, the branch
`refs/users/56/1000856` is updated. In Gerrit, `self` is an established
term to refer to the calling user (e.g. in change queries). This is why
the magic ref for the user's own branch is called `refs/users/self`.
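
Conceptually, the server rewrites the magic ref name to the caller's sharded branch before processing the fetch or push. The following Python sketch only illustrates that mapping; `resolve_user_ref` is a hypothetical helper, not Gerrit code, and it assumes zero-padded shards.

```python
SELF_REF = "refs/users/self"


def resolve_user_ref(ref: str, caller_id: int) -> str:
    """Map the magic refs/users/self ref to the caller's own user branch.

    Any other ref name is returned unchanged. Shards below 10 are assumed
    to be zero-padded (e.g. "05").
    """
    if ref == SELF_REF:
        return f"refs/users/{caller_id % 100:02d}/{caller_id}"
    return ref


print(resolve_user_ref(SELF_REF, 1000856))  # refs/users/56/1000856
```

On the client side the magic ref is used like any other ref. Assuming a remote named `origin` that points at the `All-Users` repository, `git fetch origin refs/users/self` fetches the own user branch and `git push origin HEAD:refs/users/self` pushes updates back.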

A user branch should only be readable and writable by the user to whom
the account belongs. To assign permissions on the user branches, the
normal branch permission system is used. In the permission system the
user branches are specified as `refs/users/${shardeduserid}`. The
`${shardeduserid}` variable is resolved to the sharded account ID. This
variable is used to assign default access rights on all user branches
that apply only to the owning user. The following permissions are set
by default when a Gerrit site is newly installed or upgraded to a
version which supports user branches:

.All-Users project.config
----
[access "refs/users/${shardeduserid}"]
  exclusiveGroupPermissions = read push submit
  read = group Registered Users
  push = group Registered Users
  label-Code-Review = -2..+2 group Registered Users
  submit = group Registered Users
----
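
The effect of the `${shardeduserid}` variable can be sketched as a per-caller substitution: the rule is expanded with the calling user's sharded ID, so it only ever matches that user's own branch. This Python sketch is a simplification under stated assumptions: the helper names are hypothetical, shards are assumed to be zero-padded, and it uses exact string comparison, whereas real Gerrit ref patterns also support wildcards and regular expressions.

```python
PATTERN = "refs/users/${shardeduserid}"


def expand_ref_pattern(pattern: str, account_id: int) -> str:
    """Substitute ${shardeduserid} with the sharded ID of the given account."""
    sharded = f"{account_id % 100:02d}/{account_id}"
    return pattern.replace("${shardeduserid}", sharded)


def rule_applies(pattern: str, ref: str, account_id: int) -> bool:
    """True if the rule, expanded for the caller, covers the given ref."""
    return expand_ref_pattern(pattern, account_id) == ref


print(rule_applies(PATTERN, "refs/users/56/1000856", 1000856))  # True
print(rule_applies(PATTERN, "refs/users/56/1000856", 1000857))  # False
```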

The user branch contains several files with account data which are
described link:#account-data-in-user-branch[below].

In addition to the user branches, the `All-Users` repository also
contains a branch for the link:#external-ids[external IDs] and special
refs for the link:#starred-changes[starred changes].

The next available value of the link:#account-sequence[account
sequence] is also stored in the `All-Users` repository.

[[account-index]]
== Account Index

There are several situations in which Gerrit needs to query accounts,
e.g.:

* For sending email notifications to project watchers.
* For reviewer suggestions.

Accessing the account data in Git is not fast enough for account
queries, since it requires accessing all user branches and parsing
all files in each of them. To overcome this, Gerrit has a secondary
index for accounts. The account index is based on either
link:config-gerrit.html#index.type[Lucene or Elasticsearch].

The link:rest-api-accounts.html#query-account[Query Account] REST
endpoint supports link:user-search-accounts.html[generic account
queries].
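
For illustration, a client of the Query Account endpoint could build the request URL and decode the response roughly as follows. This is a sketch: `review.example.com` is a placeholder host, authentication is omitted, and the helper names are hypothetical. It relies on the `q` and `n` query parameters and on the `)]}'` line that Gerrit prepends to JSON responses.

```python
import json
from urllib.parse import urlencode

# Gerrit prefixes JSON responses with this line to prevent XSSI attacks.
MAGIC_PREFIX = ")]}'"


def account_query_url(base_url: str, query: str, limit: int = 10) -> str:
    """Build a Query Account URL; `n` limits the number of results."""
    params = urlencode({"q": query, "n": limit})
    return base_url.rstrip("/") + "/accounts/?" + params


def parse_gerrit_json(body: str):
    """Strip the magic prefix line and decode the remaining JSON."""
    if body.startswith(MAGIC_PREFIX):
        body = body[len(MAGIC_PREFIX):]
    return json.loads(body)


url = account_query_url("https://review.example.com", "name:John")
print(url)  # https://review.example.com/accounts/?q=name%3AJohn&n=10
accounts = parse_gerrit_json(")]}'\n[{\"_account_id\": 1000856}]")
print(accounts[0]["_account_id"])  # 1000856
```

The query itself is answered from the account index, not by reading the user branches.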

Accounts are automatically reindexed on any update. The
link:rest-api-accounts.html#index-account[Index Account] REST endpoint
allows an account to be reindexed manually. In addition, the
link:pgm-reindex.html[reindex] program can be used to reindex all
accounts offline.

[[account-data-in-user-branch]]
== Account Data in User Branch

A user branch contains several Git config files with the account data:

* `account.config`:
+
Stores the link:#account-properties[account properties].

* `preferences.config`:
+
Stores the link:#preferences[user preferences] of the account.

* `watch.config`:
+
Stores the link:#project-watches[project watches] of the account.

In addition, it contains an
link:https://en.wikibooks.org/wiki/OpenSSH/Client_Configuration_Files#.7E.2F.ssh.2Fauthorized_keys[
authorized_keys] file with the link:#ssh-keys[SSH keys] of the account.

[[account-properties]]
=== Account Properties

The account properties are stored in the user branch in the
`account.config` file:

----
[account]
  fullName = John Doe
  preferredEmail = john.doe@example.com
  status = OOO
  active = false
----

For active accounts the `active` parameter can be omitted.

The registration date is not contained in the `account.config` file but
is derived from the timestamp of the first commit on the user branch.

When users update their account properties by pushing to the user
branch, it is verified that the preferred email exists in the external
IDs.

Users are not allowed to flip the active value themselves; only
administrators and users with the
link:access-control.html#capability_modifyAccount[Modify Account]
global capability are allowed to change it.

Since all data in the `account.config` file is optional, the
`account.config` file may be absent from some user branches.
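
The rules above (optional file, optional `active` parameter meaning "active") can be captured in a small reader. This Python sketch handles only the subset of the Git config syntax shown in the example; `read_account_config` is a hypothetical name, and it simplifies real Git config semantics: keys are compared case-sensitively, quoted values are not unquoted, and any `active` value other than `false` counts as active.

```python
import re

_SECTION = re.compile(r"^\[(\w+)\]$")
_ENTRY = re.compile(r"^(\w+)\s*=\s*(.*)$")


def read_account_config(text: str) -> dict:
    """Read the [account] section of account.config into a dict.

    An absent file can be passed as an empty string.
    """
    props, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        header = _SECTION.match(line)
        if header:
            section = header.group(1)
            continue
        entry = _ENTRY.match(line)
        if entry and section == "account":
            props[entry.group(1)] = entry.group(2)
    # An omitted 'active' parameter means the account is active.
    props["active"] = props.get("active", "true").lower() != "false"
    return props


print(read_account_config(""))  # {'active': True}
```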

[[preferences]]
=== Preferences

The preferences are stored in the user branch in the
`preferences.config` file. There are separate sections for
link:intro-user.html#preferences[general],
link:user-review-ui.html#diff-preferences[diff] and edit preferences:

----
[diff]
  hideTopMenu = true
[edit]
  lineLength = 80
----

The parameter names correspond to the field names used in the
preferences REST API, written in camel case (e.g. `hideTopMenu` for
`hide_top_menu`):

* link:rest-api-accounts.html#preferences-info[General Preferences]
* link:rest-api-accounts.html#diff-preferences-info[Diff Preferences]
* link:rest-api-accounts.html#edit-preferences-info[Edit Preferences]

If the value for a preference is the same as the default value for this
preference, it can be omitted in the `preferences.config` file.

Defaults for preferences that apply to all accounts can be configured
in the `refs/users/default` branch in the `All-Users` repository.
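
One way to picture how a preference value is resolved is as a layered lookup. The Python sketch below assumes the order built-in default, then `refs/users/default`, then the user's `preferences.config`, with later layers winning; the function name, the flattened `section.name` keys and all values are made up for illustration.

```python
def effective_preferences(built_in: dict, site_defaults: dict, user: dict) -> dict:
    """Merge preference layers; later layers override earlier ones.

    Assumed order: built-in defaults < refs/users/default < the user's
    preferences.config. Keys are flattened to "section.name" strings.
    """
    merged = dict(built_in)
    merged.update(site_defaults)  # values from refs/users/default
    merged.update(user)           # values from the user branch
    return merged


built_in = {"edit.lineLength": 100, "diff.hideTopMenu": False}  # made-up values
site_defaults = {"edit.lineLength": 80}
user = {"diff.hideTopMenu": True}
print(effective_preferences(built_in, site_defaults, user))
# {'edit.lineLength': 80, 'diff.hideTopMenu': True}
```

This also shows why omitting a value that equals the default is harmless: a missing key simply falls through to the next layer.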

[[project-watches]]
=== Project Watches

Users can configure watches on projects to receive email notifications
for changes of that project.

A watch configuration consists of the project name and an optional
filter query. If a filter query is specified, email notifications will
be sent only for changes of that project that match this query.

In addition, each watch configuration can contain a list of
notification types that determine for which events email notifications
should be sent. E.g. a user can configure that email notifications
should only be sent when a new patch set is uploaded and when the change
gets submitted, but not on other events.

Project watches are stored in a `watch.config` file in the user branch:

----
[project "foo"]
  notify = * [ALL_COMMENTS]
  notify = branch:master [ALL_COMMENTS, NEW_PATCHSETS]
  notify = branch:master owner:self [SUBMITTED_CHANGES]
----

The `watch.config` file has one project section for all project watches
of a project. The project name is used as subsection name, and the
filters with the notification types that decide for which events email
notifications should be sent are represented as `notify` values in the
subsection. A `notify` value is formatted as
`<filter> [<comma-separated-list-of-notification-types>]`. The
supported notification types are described in the
link:user-notify.html#notify.name.type[Email Notifications documentation].

For a change event, a notification will be sent if any `notify` value
of the corresponding project has both a filter that matches the change
and a notification type that matches the event.
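
The matching rule can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical, `matches` stands in for Gerrit's evaluation of a filter query against the change, and treating the filter `*` as "every change of the project" is an assumption made for this sketch.

```python
from typing import Callable, Iterable, Set, Tuple


def parse_notify(value: str) -> Tuple[str, Set[str]]:
    """Split '<filter> [<type>, <type>, ...]' into the filter and its types."""
    value = value.strip()
    start = value.rfind("[")
    if start == -1 or not value.endswith("]"):
        raise ValueError(f"malformed notify value: {value!r}")
    filter_query = value[:start].strip()
    types = {t.strip() for t in value[start + 1:-1].split(",") if t.strip()}
    return filter_query, types


def should_notify(notify_values: Iterable[str], event_type: str,
                  matches: Callable[[str], bool]) -> bool:
    """True if any notify value matches both the change and the event.

    Assumption: the filter '*' matches every change of the project.
    """
    for value in notify_values:
        filter_query, types = parse_notify(value)
        if event_type in types and (filter_query == "*" or matches(filter_query)):
            return True
    return False


watch = [
    "* [ALL_COMMENTS]",
    "branch:master [ALL_COMMENTS, NEW_PATCHSETS]",
    "branch:master owner:self [SUBMITTED_CHANGES]",
]


def on_master(query: str) -> bool:
    """Stand-in filter evaluation: a change on master not owned by the watcher."""
    return query == "branch:master"


print(should_notify(watch, "NEW_PATCHSETS", on_master))      # True
print(should_notify(watch, "SUBMITTED_CHANGES", on_master))  # False
```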

In order to send email notifications on change events, Gerrit needs to
find all accounts that watch the corresponding project. To make this
lookup fast, the secondary account index is used. The account index
contains a repeated field that stores the projects that are being
watched by an account. After the accounts that watch the project have
been retrieved from the index, the complete watch configuration is
available from the account cache and Gerrit can check if any watch
matches the change and the event.

[[ssh-keys]]
=== SSH Keys

SSH keys are stored in the user branch in an `authorized_keys` file,
which is the
link:https://en.wikibooks.org/wiki/OpenSSH/Client_Configuration_Files#.7E.2F.ssh.2Fauthorized_keys[
standard OpenSSH file format] for storing SSH keys:

----
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCgug5VyMXQGnem2H1KVC4/HcRcD4zzBqSuJBRWVonSSoz3RoAZ7bWXCVVGwchtXwUURD689wFYdiPecOrWOUgeeyRq754YWRhU+W28vf8IZixgjCmiBhaL2gt3wff6pP+NXJpTSA4aeWE5DfNK5tZlxlSxqkKOS8JRSUeNQov5Tw== john.doe@example.com
# DELETED
# INVALID ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDm5yP7FmEoqzQRDyskX+9+N0q9GrvZeh5RG52EUpE4ms/Ujm3ewV1LoGzc/lYKJAIbdcZQNJ9+06EfWZaIRA3oOwAPe1eCnX+aLr8E6Tw2gDMQOGc5e9HfyXpC2pDvzauoZNYqLALOG3y/1xjo7IH8GYRS2B7zO/Mf9DdCcCKSfw== john.doe@example.com
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCaS7RHEcZ/zjl9hkWkqnm29RNr2OQ/TZ5jk2qBVMH3BgzPsTsEs+7ag9tfD8OCj+vOcwm626mQBZoR2e3niHa/9gnHBHFtOrGfzKbpRjTWtiOZbB9HF+rqMVD+Dawo/oicX/dDg7VAgOFSPothe6RMhbgWf84UcK5aQd5eP5y+tQ== john.doe@example.com
----

When the SSH API is used, Gerrit needs an efficient way to look up SSH
keys by username. Since the username can be easily resolved to an
account ID (via the account cache), accessing the SSH keys in the user
branch is fast.

To identify SSH keys in the REST API, Gerrit uses
link:rest-api-accounts.html#ssh-key-id[sequence numbers per account].
This is why the order of the keys in the `authorized_keys` file is
used to determine the sequence numbers of the keys (the sequence
numbers start at 1).

To keep the sequence numbers intact when a key is deleted, a
`# DELETED` line is inserted at the position where the key was deleted.

Invalid keys are marked with the prefix `# INVALID`.
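
The way sequence numbers follow from key positions can be sketched in Python. This is not Gerrit's parser: `ssh_keys_by_sequence` is a hypothetical helper, and it assumes that blank lines and comments other than the `# DELETED` and `# INVALID` markers do not consume a sequence number.

```python
DELETED = "# DELETED"
INVALID_PREFIX = "# INVALID "


def ssh_keys_by_sequence(authorized_keys: str) -> dict:
    """Map 1-based sequence numbers to (key, is_valid) tuples.

    '# DELETED' lines consume a number but yield no key, so the numbers
    of later keys stay stable when a key is removed.
    """
    keys = {}
    seq = 0
    for raw in authorized_keys.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines do not consume a number
        if line == DELETED:
            seq += 1  # placeholder for a deleted key
        elif line.startswith(INVALID_PREFIX):
            seq += 1
            keys[seq] = (line[len(INVALID_PREFIX):], False)
        elif line.startswith("#"):
            continue  # assumption: other comments are ignored
        else:
            seq += 1
            keys[seq] = (line, True)
    return keys


sample = "ssh-rsa AAA1 a@x\n# DELETED\n# INVALID ssh-rsa AAA2 a@x\nssh-rsa AAA3 a@x\n"
print(sorted(ssh_keys_by_sequence(sample)))  # [1, 3, 4]
```

Key number 2 no longer exists, yet the last key keeps number 4, which is exactly what the `# DELETED` placeholder is for.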

[[external-ids]]
== External IDs

External IDs are used to link identities, such as the username and email
addresses, and external identities, such as an LDAP account or an OAuth
identity, to an account in Gerrit.

External IDs are stored as Git Notes in the `All-Users` repository. The
name of the notes branch is `refs/meta/external-ids`.

The SHA-1 of the external ID key is used as the note key; for example,
the note key for the external ID `username:jdoe` is
`e0b751ae90ef039f320e097d7d212f490e933706`. This ensures that an
external ID is used only once (i.e. an external ID can never be assigned
to multiple accounts at the same time).

[IMPORTANT]
If the external ID key is changed manually, you must adapt the note key
to the new SHA-1, otherwise the external ID becomes inconsistent and is
ignored by Gerrit.
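
Computing a note key is a plain SHA-1 over the key string. The Python sketch below assumes the key is hashed exactly as written, UTF-8 encoded; `external_id_note_key` is a hypothetical helper name. Run on `username:jdoe`, it should reproduce the note key quoted above.

```python
import hashlib


def external_id_note_key(external_id_key: str) -> str:
    """Return the hex SHA-1 of '<scheme>:<id>', used as the note key.

    Assumption: the key string is hashed exactly as written, UTF-8 encoded.
    """
    return hashlib.sha1(external_id_key.encode("utf-8")).hexdigest()


print(external_id_note_key("username:jdoe"))
```

After editing an external ID key by hand, recomputing the hash this way gives the new note key under which the note must be stored.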

The note content is a Git config file:

----
[externalId "username:jdoe"]
  accountId = 1003407
  email = jdoe@example.com
  password = bcrypt:4:LCbmSBDivK/hhGVQMfkDpA==:XcWn0pKYSVU/UJgOvhidkEtmqCp6oKB7
----

The config file has one `externalId` section. The external ID key, which
consists of scheme and ID in the format `<scheme>:<id>`, is used as
subsection name.

The `accountId` field is mandatory. The `email` and `password` fields
are optional.

The external IDs are maintained by Gerrit. This means users are not
allowed to manually edit their external IDs. Only users with the
link:access-control.html#capability_accessDatabase[Access Database]
global capability can push updates to the `refs/meta/external-ids`
branch. However, Gerrit rejects pushes if:

* any external ID config file cannot be parsed
* a note key does not match the SHA-1 of the external ID key in the
  note content
* there are external IDs for non-existing accounts
* there are invalid emails
* any email is not unique (the same email is assigned to multiple
  accounts)
* hashed passwords of external IDs with scheme `username` cannot be
  decoded

[[starred-changes]]
== Starred Changes

link:dev-stars.html[Starred changes] allow users to mark changes as
favorites and receive email notifications for them.

Each starred change is a tuple of an account ID, a change ID and a
label.

To keep track of a change that is starred by an account, Gerrit creates
a `refs/starred-changes/YY/XXXX/ZZZZZZZ` ref in the `All-Users`
repository, where `YY/XXXX` is the sharded numeric change ID and
`ZZZZZZZ` is the account ID.

A starred-changes ref points to a blob that contains the list of labels
that the account set on the change. The label list is stored as UTF-8
text with one label per line.
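
Building the ref name and reading the label blob can be sketched in Python. The helper names are hypothetical, and the sketch assumes that `YY` is the zero-padded last two digits of the change number, analogous to the sharding of user branches.

```python
def starred_changes_ref(change_id: int, account_id: int) -> str:
    """refs/starred-changes/YY/XXXX/ZZZZZZZ for a change and an account.

    Assumption: YY is the last two digits of the change number, zero-padded.
    """
    return f"refs/starred-changes/{change_id % 100:02d}/{change_id}/{account_id}"


def parse_star_labels(blob: bytes) -> list:
    """Decode the UTF-8 blob the ref points to: one label per line."""
    return [line for line in blob.decode("utf-8").splitlines() if line]


print(starred_changes_ref(12345, 1000856))  # refs/starred-changes/45/12345/1000856
print(parse_star_labels(b"star\n"))         # ['star']
```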

Since JGit has explicit optimizations for looking up refs by prefix
when the prefix ends with `/`, this ref format is optimized to find
starred changes by change ID. Finding starred changes by change ID is
needed, e.g., when a change is updated, so that all users that have
the link:dev-stars.html#default-star[default star] on the change can be
notified by email.
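
A lookup by change ID then only needs to scan refs under a single prefix. In this Python sketch a plain list of ref names stands in for the repository's ref database, `accounts_starring` is a hypothetical helper, and zero-padded shards are assumed. The trailing `/` also matters for correctness: without it, change `1234545` (which is also in shard `45`) would match the prefix for change `12345`.

```python
from typing import Iterable, List


def accounts_starring(change_id: int, refs: Iterable[str]) -> List[int]:
    """Account IDs that starred a change, found by a ref-name prefix scan."""
    # The trailing '/' keeps e.g. change 1234545 out of the results for 12345.
    prefix = f"refs/starred-changes/{change_id % 100:02d}/{change_id}/"
    return sorted(int(ref[len(prefix):]) for ref in refs if ref.startswith(prefix))


refs = [
    "refs/starred-changes/45/12345/1000856",
    "refs/starred-changes/45/12345/1000857",
    "refs/starred-changes/45/1234545/1000856",
    "refs/starred-changes/46/12346/1000856",
]
print(accounts_starring(12345, refs))  # [1000856, 1000857]
```

The reverse direction, finding all changes starred by one account, has no such prefix because the account ID is the last path component, so it would require checking every ref in the namespace.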

Gerrit also needs an efficient way to find all changes that were
starred by an account, e.g. to provide results for the
link:user-search.html#is-starred[is:starred] query operator. With the
ref format as described above, the lookup of starred changes by account
ID is expensive, as this requires a scan of the full
`refs/starred-changes/*` namespace. To overcome this, the users that
have starred a change are stored in the change index together with the
star labels.

[[reviewed-flags]]
== Reviewed Flags

When reviewing a patch set in the Gerrit UI, the reviewer can mark
files in the patch set as reviewed. These markers are called ‘Reviewed
Flags’ and are private to the user. A reviewed flag is a tuple of patch
set ID, file and account ID.

Each user can have many thousands of reviewed flags and over time the
number can grow without bounds.

The large number of reviewed flags makes storing them in Git unsuitable,
because each update requires opening the repository and committing a
change, which is a high overhead for flipping a bit. Therefore the
reviewed flags are stored in a database table. By default they are
stored in a local H2 database, but there is an extension point that
allows plugging in alternate implementations for storing the reviewed
flags. To replace the storage for reviewed flags, a plugin needs to
implement the link:dev-plugins.html#account-patch-review-store[
AccountPatchReviewStore] interface. E.g. to support a multi-master
setup where reviewed flags should be replicated between the master
nodes, one could implement a store for the reviewed flags that is
based on MySQL with replication.
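
To make the data model concrete, here is a toy in-memory store in Python keyed by the tuple described above. It is only a sketch: the class and method names are illustrative and do not reproduce the real `AccountPatchReviewStore` Java interface, and representing a patch set ID as `(change number, patch set number)` is an assumption.

```python
from typing import Dict, Set, Tuple

# Assumed representation of a patch set ID: (change number, patch set number).
PatchSetId = Tuple[int, int]


class InMemoryReviewedFlagStore:
    """Toy reviewed-flag store keyed by (patch set ID, account ID)."""

    def __init__(self) -> None:
        self._flags: Dict[Tuple[PatchSetId, int], Set[str]] = {}

    def mark_reviewed(self, ps_id: PatchSetId, account_id: int, path: str) -> bool:
        """Set the flag; return False if it was already set."""
        files = self._flags.setdefault((ps_id, account_id), set())
        if path in files:
            return False
        files.add(path)
        return True

    def clear_reviewed(self, ps_id: PatchSetId, account_id: int, path: str) -> None:
        files = self._flags.get((ps_id, account_id))
        if files is not None:
            files.discard(path)

    def find_reviewed(self, ps_id: PatchSetId, account_id: int) -> Set[str]:
        """Return a copy of the reviewed file paths for this user and patch set."""
        return set(self._flags.get((ps_id, account_id), set()))
```

Each flag flip is a cheap dictionary update here, in contrast to the commit per update that storing the flags in Git would require.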

[[account-sequence]]
== Account Sequence

The next available account sequence number is stored as UTF-8 text in a
blob pointed to by the `refs/sequences/accounts` ref in the `All-Users`
repository.

Multiple processes share the same sequence by incrementing the counter
using normal Git ref updates. To amortize the cost of these ref
updates, processes increment the counter by a larger number and hand
out numbers from that range in memory until they run out. The size of
the account ID batch that each process retrieves at once is controlled
by the link:config-gerrit.html#notedb.accounts.sequenceBatchSize[
notedb.accounts.sequenceBatchSize] parameter in the `gerrit.config`
file.
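
The batching scheme can be simulated in Python. This is a sketch under stated assumptions: `SequenceRef` stands in for the counter blob and models the atomic ref update as a compare-and-set guarded by a lock, and `AccountIdAllocator` plays one process whose `batch_size` corresponds to `notedb.accounts.sequenceBatchSize`. Both class names are hypothetical.

```python
import threading


class SequenceRef:
    """Stands in for the counter blob behind refs/sequences/accounts.

    compare_and_set models an atomic Git ref update that fails if the
    ref changed since it was read (a lock is used here for simplicity).
    """

    def __init__(self, next_value: int) -> None:
        self._value = next_value
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self._value

    def compare_and_set(self, expected: int, new_value: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new_value
            return True


class AccountIdAllocator:
    """Per-process allocator that reserves IDs in batches of batch_size."""

    def __init__(self, ref: SequenceRef, batch_size: int) -> None:
        self._ref = ref
        self._batch_size = batch_size
        self._next = 0
        self._limit = 0  # exclusive end of the reserved range

    def next_id(self) -> int:
        if self._next >= self._limit:
            # Reserve a new batch; retry if another process won the race.
            while True:
                current = self._ref.read()
                if self._ref.compare_and_set(current, current + self._batch_size):
                    self._next, self._limit = current, current + self._batch_size
                    break
        value = self._next
        self._next += 1
        return value


ref = SequenceRef(1000000)
a = AccountIdAllocator(ref, batch_size=3)
b = AccountIdAllocator(ref, batch_size=3)
print(a.next_id(), b.next_id(), a.next_id())  # 1000000 1000003 1000001
print(ref.read())  # 1000006
```

A side effect of this design is that account IDs are unique but not necessarily contiguous: numbers reserved by a process that stops before handing them out are skipped.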

[[replication]]
== Replication

To replicate account data, the following refs from the `All-Users`
repository must be replicated:

* `refs/users/*` (user branches)
* `refs/meta/external-ids` (external IDs)
* `refs/starred-changes/*` (star labels)
* `refs/sequences/accounts` (account sequence numbers, not needed for Gerrit
  slaves)

GERRIT
------
Part of link:index.html[Gerrit Code Review]

SEARCHBOX
---------