| 1 | = Gerrit Code Review - Accounts |
| 2 | |
| 3 | == Overview |
| 4 | |
| 5 | Starting from 2.15, Gerrit accounts are fully stored in |
| 6 | link:dev-note-db.html[NoteDb]. |
| 7 | |
| 8 | The account data consists of a sequence number (account ID), account |
| 9 | properties (full name, preferred email, registration date, status, |
| 10 | inactive flag), preferences (general, diff and edit preferences), |
| 11 | project watches, SSH keys, external IDs, starred changes and reviewed |
| 12 | flags. |
| 13 | |
| 14 | Most account data is stored in a special link:#all-users[All-Users] |
| 15 | repository, which has one branch per user. Within the user branch there |
| 16 | are Git config files for the link:#account-properties[ |
| 17 | account properties], the link:#preferences[account preferences] and the |
| 18 | link:#project-watches[project watches]. In addition there is an |
| 19 | `authorized_keys` file for the link:#ssh-keys[SSH keys] that follows |
| 20 | the standard OpenSSH file format. |
| 21 | |
| 22 | The account data in the user branch is versioned and the Git history of |
| 23 | this branch serves as an audit log. |
| 24 | |
| 25 | The link:#external-ids[external IDs] are stored as Git Notes inside the |
| 26 | `All-Users` repository in the `refs/meta/external-ids` notes branch. |
| 27 | Storing all external IDs in a notes branch ensures that each external |
| 28 | ID is only used once. |
| 29 | |
| 30 | The link:#starred-changes[starred changes] are represented as |
| 31 | independent refs in the `All-Users` repository. They are not stored in |
| 32 | the user branch, since this data doesn't need versioning. |
| 33 | |
| 34 | The link:#reviewed-flags[reviewed flags] are not stored in Git, but are |
| 35 | persisted in a database table. This is because there is a high volume |
| 36 | of reviewed flags and storing them in Git would be inefficient. |
| 37 | |
| 38 | Since accessing the account data in Git is not fast enough for account |
| 39 | queries, e.g. when suggesting reviewers, Gerrit has a |
| 40 | link:#account-index[secondary index for accounts]. |
| 41 | |
| 42 | [[all-users]] |
| 43 | == `All-Users` repository |
| 44 | |
| 45 | The `All-Users` repository is a special repository that only contains |
| 46 | user-specific information. It contains one branch per user. The user |
| 47 | branch is formatted as `refs/users/CD/ABCD`, where `CD/ABCD` is the |
| 48 | link:access-control.html#sharded-user-id[sharded account ID], e.g. the |
| 49 | user branch for account `1000856` is `refs/users/56/1000856`. The |
| 50 | account IDs in the user refs are sharded so that there is a good |
| 51 | distribution of the Git data in the storage system. |
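The mapping from an account ID to its user branch can be sketched as follows. This is a minimal illustration, not Gerrit's implementation: the function name `user_ref` is hypothetical, and zero-padding the shard for IDs whose last two digits are below 10 is an assumption that the text above does not state.

```python
def user_ref(account_id: int) -> str:
    """Return the user branch name for a numeric account ID."""
    if account_id < 0:
        raise ValueError("account ID must be non-negative")
    # The shard is taken from the last two decimal digits of the ID.
    # Zero-padding to two digits is an assumption of this sketch.
    shard = account_id % 100
    return f"refs/users/{shard:02d}/{account_id}"


print(user_ref(1000856))  # refs/users/56/1000856
```

Sharding on the trailing digits rather than the leading ones spreads consecutively assigned IDs over different shard directories.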
| 52 | |
| 53 | A user branch must exist for each account, as it represents the |
| 54 | account. The files in the user branch are all optional. This means |
| 55 | having a user branch with a tree that is completely empty is also a |
| 56 | valid account definition. |
| 57 | |
| 58 | Updates to the user branch are done through the |
| 59 | link:rest-api-accounts.html[Gerrit REST API], but users can also |
| 60 | manually fetch their user branch and push changes back to Gerrit. On |
| 61 | push the user data is evaluated and invalid user data is rejected. |
| 62 | |
| 63 | To hide the implementation detail of the sharded account ID in the ref |
| 64 | name Gerrit offers a magic `refs/users/self` ref that is automatically |
| 65 | resolved to the user branch of the calling user. The user can then use |
| 66 | this ref to fetch from and push to their own user branch. E.g. if user |
| 67 | `1000856` pushes to `refs/users/self`, the branch |
| 68 | `refs/users/56/1000856` is updated. In Gerrit `self` is an established |
| 69 | term to refer to the calling user (e.g. in change queries). This is why |
| 70 | the magic ref for a user's own branch is called `refs/users/self`. |
| 71 | |
| 72 | A user branch should only be readable and writeable by the user to whom |
| 73 | the account belongs. To assign permissions on the user branches the |
| 74 | normal branch permission system is used. In the permission system the |
| 75 | user branches are specified as `refs/users/${shardeduserid}`. The |
| 76 | `${shardeduserid}` variable is resolved to the sharded account ID. This |
| 77 | variable is used to assign default access rights on all user branches |
| 78 | that apply only to the owning user. The following permissions are set |
| 79 | by default when a Gerrit site is newly installed or upgraded to a |
| 80 | version which supports user branches: |
| 81 | |
| 82 | .All-Users project.config |
| 83 | ---- |
| 84 | [access "refs/users/${shardeduserid}"] |
| 85 | exclusiveGroupPermissions = read push submit |
| 86 | read = group Registered Users |
| 87 | push = group Registered Users |
| 88 | label-Code-Review = -2..+2 group Registered Users |
| 89 | submit = group Registered Users |
| 90 | ---- |
| 91 | |
| 92 | The user branch contains several files with account data which are |
| 93 | described link:#account-data-in-user-branch[below]. |
| 94 | |
| 95 | In addition to the user branches the `All-Users` repository also |
| 96 | contains a branch for the link:#external-ids[external IDs] and special |
| 97 | refs for the link:#starred-changes[starred changes]. |
| 98 | |
| 99 | Also the next available value of the link:#account-sequence[account |
| 100 | sequence] is stored in the `All-Users` repository. |
| 101 | |
| 102 | [[account-index]] |
| 103 | == Account Index |
| 104 | |
| 105 | There are several situations in which Gerrit needs to query accounts, |
| 106 | e.g.: |
| 107 | |
| 108 | * For sending email notifications to project watchers. |
| 109 | * For reviewer suggestions. |
| 110 | |
| 111 | Accessing the account data in Git is not fast enough for account |
| 112 | queries, since it requires accessing all user branches and parsing |
| 113 | all files in each of them. To overcome this Gerrit has a secondary |
| 114 | index for accounts. The account index is based on either |
| 115 | link:config-gerrit.html#index.type[Lucene or Elasticsearch]. |
| 116 | |
| 117 | Via the link:rest-api-accounts.html#query-account[Query Account] REST |
| 118 | endpoint link:user-search-accounts.html[generic account queries] are |
| 119 | supported. |
| 120 | |
| 121 | Accounts are automatically reindexed on any update. The |
| 122 | link:rest-api-accounts.html#index-account[Index Account] REST endpoint |
| 123 | allows an account to be reindexed manually. In addition the |
| 124 | link:pgm-reindex.html[reindex] program can be used to reindex all |
| 125 | accounts offline. |
| 126 | |
| 127 | [[account-data-in-user-branch]] |
| 128 | == Account Data in User Branch |
| 129 | |
| 130 | A user branch contains several Git config files with the account data: |
| 131 | |
| 132 | * `account.config`: |
| 133 | + |
| 134 | Stores the link:#account-properties[account properties]. |
| 135 | |
| 136 | * `preferences.config`: |
| 137 | + |
| 138 | Stores the link:#preferences[user preferences] of the account. |
| 139 | |
| 140 | * `watch.config`: |
| 141 | + |
| 142 | Stores the link:#project-watches[project watches] of the account. |
| 143 | |
| 144 | In addition it contains an |
| 145 | link:https://en.wikibooks.org/wiki/OpenSSH/Client_Configuration_Files#.7E.2F.ssh.2Fauthorized_keys[ |
| 146 | authorized_keys] file with the link:#ssh-keys[SSH keys] of the account. |
| 147 | |
| 148 | [[account-properties]] |
| 149 | === Account Properties |
| 150 | |
| 151 | The account properties are stored in the user branch in the |
| 152 | `account.config` file: |
| 153 | |
| 154 | ---- |
| 155 | [account] |
| 156 | fullName = John Doe |
| 157 | preferredEmail = john.doe@example.com |
| 158 | status = OOO |
| 159 | active = false |
| 160 | ---- |
| 161 | |
| 162 | For active accounts the `active` parameter can be omitted. |
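How a reader of `account.config` might apply this default can be sketched as follows. The parser is deliberately minimal and hypothetical (`parse_account_config` is not a Gerrit function). It assumes exact-case keys and treats only the literal `true` as a true value, which is narrower than what Git config syntax accepts.

```python
def parse_account_config(text: str) -> dict:
    """Parse the [account] section of an account.config file."""
    # An omitted `active` key means the account is active.
    props = {"active": True}
    in_account = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("["):
            in_account = line == "[account]"
            continue
        if in_account and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "active":
                # Simplification: only "true" (any case) counts as true.
                props[key] = value.lower() == "true"
            else:
                props[key] = value
    return props


sample = """[account]
  fullName = John Doe
  preferredEmail = john.doe@example.com
  status = OOO
  active = false
"""
print(parse_account_config(sample))
```

Because every file in the user branch is optional, calling the function with an empty string also yields a valid, active account with no further properties.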
| 163 | |
| 164 | The registration date is not contained in the `account.config` file but |
| 165 | is derived from the timestamp of the first commit on the user branch. |
| 166 | |
| 167 | When users update their account properties by pushing to the user |
| 168 | branch, it is verified that the preferred email exists in the external |
| 169 | IDs. |
| 170 | |
| 171 | Users are not allowed to flip the active value themselves; only |
| 172 | administrators and users with the |
| 173 | link:access-control.html#capability_modifyAccount[Modify Account] |
| 174 | global capability are allowed to change it. |
| 175 | |
| 176 | Since all data in the `account.config` file is optional the |
| 177 | `account.config` file may be absent from some user branches. |
| 178 | |
| 179 | [[preferences]] |
| 180 | === Preferences |
| 181 | |
| 182 | The account preferences are stored in the user branch in the |
| 183 | `preferences.config` file. There are separate sections for |
| 184 | link:intro-user.html#preferences[general], |
| 185 | link:user-review-ui.html#diff-preferences[diff] and edit preferences: |
| 186 | |
| 187 | ---- |
| 188 | [general] |
| 189 | showSiteHeader = false |
| 190 | [diff] |
| 191 | hideTopMenu = true |
| 192 | [edit] |
| 193 | lineLength = 80 |
| 194 | ---- |
| 195 | |
| 196 | The parameter names match the names that are used in the preferences REST API: |
| 197 | |
| 198 | * link:rest-api-accounts.html#preferences-info[General Preferences] |
| 199 | * link:rest-api-accounts.html#diff-preferences-info[Diff Preferences] |
| 200 | * link:rest-api-accounts.html#edit-preferences-info[Edit Preferences] |
| 201 | |
| 202 | If the value for a preference is the same as the default value for this |
| 203 | preference, it can be omitted in the `preferences.config` file. |
| 204 | |
| 205 | Defaults for general and diff preferences that apply for all accounts |
| 206 | can be configured in the `refs/users/default` branch in the `All-Users` |
| 207 | repository. |
| 208 | |
| 209 | [[project-watches]] |
| 210 | === Project Watches |
| 211 | |
| 212 | Users can configure watches on projects to receive email notifications |
| 213 | for changes of that project. |
| 214 | |
| 215 | A watch configuration consists of the project name and an optional |
| 216 | filter query. If a filter query is specified, email notifications will |
| 217 | be sent only for changes of that project that match this query. |
| 218 | |
| 219 | In addition, each watch configuration can contain a list of |
| 220 | notification types that determine for which events email notifications |
| 221 | should be sent. E.g. a user can configure that email notifications |
| 222 | should only be sent if a new patch set is uploaded and when the change |
| 223 | gets submitted, but not on other events. |
| 224 | |
| 225 | Project watches are stored in a `watch.config` file in the user branch: |
| 226 | |
| 227 | ---- |
| 228 | [project "foo"] |
| 229 | notify = * [ALL_COMMENTS] |
| 230 | notify = branch:master [ALL_COMMENTS, NEW_PATCHSETS] |
| 231 | notify = branch:master owner:self [SUBMITTED_CHANGES] |
| 232 | ---- |
| 233 | |
| 234 | The `watch.config` file has one `project` section for all watches of a |
| 235 | project. The project name is used as the subsection name. Each filter, |
| 236 | together with the notification types that decide for which events email |
| 237 | notifications are sent, is represented as a `notify` value in the |
| 238 | subsection. A `notify` value is formatted as |
| 239 | "<filter> [<comma-separated-list-of-notification-types>]". The |
| 240 | supported notification types are described in the |
| 241 | link:user-notify.html#notify.name.type[Email Notifications documentation]. |
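Splitting a `notify` value into its filter and its notification types can be sketched as follows. The regular expression and the function name `parse_notify` are illustrative assumptions, not Gerrit's parser, and the sketch does not validate the notification type names.

```python
import re

# "<filter> [<comma-separated-list-of-notification-types>]"
NOTIFY_RE = re.compile(r"^(?P<filter>.*?)\s*\[(?P<types>[^\]]*)\]\s*$")


def parse_notify(value: str):
    """Return (filter, [notification types]) for one notify value."""
    match = NOTIFY_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid notify value: {value!r}")
    filter_query = match.group("filter").strip()
    # Split the bracketed list on commas and drop surrounding whitespace.
    types = [t.strip() for t in match.group("types").split(",") if t.strip()]
    return filter_query, types


print(parse_notify("branch:master [ALL_COMMENTS, NEW_PATCHSETS]"))
```

The filter part is kept as an opaque string here; evaluating it against a change requires the change query engine and is outside the scope of this sketch.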
| 242 | |
| 243 | For a change event, a notification will be sent if any `notify` value |
| 244 | of the corresponding project has both a filter that matches the change |
| 245 | and a notification type that matches the event. |
| 246 | |
| 247 | In order to send email notifications on change events, Gerrit needs to |
| 248 | find all accounts that watch the corresponding project. To make this |
| 249 | lookup fast the secondary account index is used. The account index |
| 250 | contains a repeated field that stores the projects that are being |
| 251 | watched by an account. After the accounts that watch the project have |
| 252 | been retrieved from the index, the complete watch configuration is |
| 253 | available from the account cache and Gerrit can check if any watch |
| 254 | matches the change and the event. |
| 255 | |
| 256 | [[ssh-keys]] |
| 257 | === SSH Keys |
| 258 | |
| 259 | SSH keys are stored in the user branch in an `authorized_keys` file, |
| 260 | which is the |
| 261 | link:https://en.wikibooks.org/wiki/OpenSSH/Client_Configuration_Files#.7E.2F.ssh.2Fauthorized_keys[ |
| 262 | standard OpenSSH file format] for storing SSH keys: |
| 263 | |
| 264 | ---- |
| 265 | ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCgug5VyMXQGnem2H1KVC4/HcRcD4zzBqSuJBRWVonSSoz3RoAZ7bWXCVVGwchtXwUURD689wFYdiPecOrWOUgeeyRq754YWRhU+W28vf8IZixgjCmiBhaL2gt3wff6pP+NXJpTSA4aeWE5DfNK5tZlxlSxqkKOS8JRSUeNQov5Tw== john.doe@example.com |
| 266 | # DELETED |
| 267 | # INVALID ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDm5yP7FmEoqzQRDyskX+9+N0q9GrvZeh5RG52EUpE4ms/Ujm3ewV1LoGzc/lYKJAIbdcZQNJ9+06EfWZaIRA3oOwAPe1eCnX+aLr8E6Tw2gDMQOGc5e9HfyXpC2pDvzauoZNYqLALOG3y/1xjo7IH8GYRS2B7zO/Mf9DdCcCKSfw== john.doe@example.com |
| 268 | ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCaS7RHEcZ/zjl9hkWkqnm29RNr2OQ/TZ5jk2qBVMH3BgzPsTsEs+7ag9tfD8OCj+vOcwm626mQBZoR2e3niHa/9gnHBHFtOrGfzKbpRjTWtiOZbB9HF+rqMVD+Dawo/oicX/dDg7VAgOFSPothe6RMhbgWf84UcK5aQd5eP5y+tQ== john.doe@example.com |
| 269 | ---- |
| 270 | |
| 271 | When the SSH API is used, Gerrit needs an efficient way to look up SSH |
| 272 | keys by username. Since the username can be easily resolved to an |
| 273 | account ID (via the account cache), accessing the SSH keys in the user |
| 274 | branch is fast. |
| 275 | |
| 276 | To identify SSH keys in the REST API Gerrit uses |
| 277 | link:rest-api-accounts.html#ssh-key-id[sequence numbers per account]. |
| 278 | This is why the order of the keys in the `authorized_keys` file |
| 279 | determines the sequence numbers of the keys (the sequence |
| 280 | numbers start at 1). |
| 281 | |
| 282 | To keep the sequence numbers intact when a key is deleted, a |
| 283 | '# DELETED' line is inserted at the position where the key was deleted. |
| 284 | |
| 285 | Invalid keys are marked with the prefix '# INVALID'. |
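The numbering rules above can be sketched as follows. This is an illustration under stated assumptions, not Gerrit's parser: the names `SshKey` and `parse_authorized_keys` are hypothetical, blank lines are assumed not to consume a sequence number, and comment lines other than the two markers are not handled.

```python
from dataclasses import dataclass
from typing import List

DELETED = "# DELETED"
INVALID_PREFIX = "# INVALID"


@dataclass
class SshKey:
    seq: int     # 1-based position in the authorized_keys file
    key: str     # key line without the invalid marker
    valid: bool  # False if the line carried the '# INVALID' prefix


def parse_authorized_keys(text: str) -> List[SshKey]:
    """Assign per-account sequence numbers to the keys in the file."""
    keys = []
    seq = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines do not take a slot
        seq += 1
        if line == DELETED:
            continue  # placeholder keeps later sequence numbers stable
        if line.startswith(INVALID_PREFIX):
            keys.append(SshKey(seq, line[len(INVALID_PREFIX):].strip(), False))
        else:
            keys.append(SshKey(seq, line, True))
    return keys


sample = "\n".join([
    "ssh-rsa AAAA1 john.doe@example.com",
    "# DELETED",
    "# INVALID ssh-rsa AAAA3 john.doe@example.com",
    "ssh-rsa AAAA4 john.doe@example.com",
])
for k in parse_authorized_keys(sample):
    print(k.seq, k.valid)
```

The key material in `sample` is shortened placeholder text, not real key data.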
| 286 | |
| 287 | [[external-ids]] |
| 288 | == External IDs |
| 289 | |
| 290 | External IDs are used to link external identities, such as an LDAP |
| 291 | account or an OAuth identity, to an account in Gerrit. |
| 292 | |
| 293 | External IDs are stored as Git Notes in the `All-Users` repository. The |
| 294 | name of the notes branch is `refs/meta/external-ids`. |
| 295 | |
| 296 | The SHA1 of the external ID key is used as the note key. This ensures that |
| 297 | an external ID is used only once (i.e. an external ID can never be |
| 298 | assigned to multiple accounts at the same point in time). |
| 299 | |
| 300 | The note content is a Git config file: |
| 301 | |
| 302 | ---- |
| 303 | [externalId "username:jdoe"] |
| 304 | accountId = 1003407 |
| 305 | email = jdoe@example.com |
| 306 | password = bcrypt:4:LCbmSBDivK/hhGVQMfkDpA==:XcWn0pKYSVU/UJgOvhidkEtmqCp6oKB7 |
| 307 | ---- |
| 308 | |
| 309 | The config file has one `externalId` section. The external ID key which |
| 310 | consists of scheme and ID in the format '<scheme>:<id>' is used as |
| 311 | subsection name. |
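Deriving the note key from an external ID key can be sketched as follows. The function name `external_id_note_key` is hypothetical. Encoding the key as UTF-8 and writing the hash as a 40-character hexadecimal string are assumptions of this sketch.

```python
import hashlib


def external_id_note_key(scheme: str, ext_id: str) -> str:
    """Return the SHA1 of the external ID key '<scheme>:<id>' as hex."""
    key = f"{scheme}:{ext_id}"
    # Assumption: the key is hashed as UTF-8 bytes.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


print(external_id_note_key("username", "jdoe"))
```

Because the note key is a pure function of the external ID key, two notes for the same external ID would collide on the same note key, which is what makes each external ID usable only once.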
| 312 | |
| 313 | The `accountId` field is mandatory, the `email` and `password` fields |
| 314 | are optional. |
| 315 | |
| 316 | The external IDs are maintained by Gerrit; this means users are not |
| 317 | allowed to manually edit their external IDs. Only users with the |
| 318 | link:access-control.html#capability_accessDatabase[Access Database] |
| 319 | global capability can push updates to the `refs/meta/external-ids` |
| 320 | branch. However Gerrit rejects pushes if: |
| 321 | |
| 322 | * any external ID config file cannot be parsed |
| 323 | * a note key does not match the SHA1 of the external ID key in the |
| 324 | note content |
| 325 | * external IDs for non-existing accounts are contained |
| 326 | * invalid emails are contained |
| 327 | * any email is not unique (the same email is assigned to multiple |
| 328 | accounts) |
| 329 | * hashed passwords of external IDs with scheme `username` cannot be |
| 330 | decoded |
| 331 | |
| 332 | [[starred-changes]] |
| 333 | == Starred Changes |
| 334 | |
| 335 | link:dev-stars.html[Starred changes] allow users to mark changes as |
| 336 | favorites and receive email notifications for them. |
| 337 | |
| 338 | Each starred change is a tuple of an account ID, a change ID and a |
| 339 | label. |
| 340 | |
| 341 | To keep track of a change that is starred by an account, Gerrit creates |
| 342 | a `refs/starred-changes/YY/XXXX/ZZZZZZZ` ref in the `All-Users` |
| 343 | repository, where `YY/XXXX` is the sharded numeric change ID and |
| 344 | `ZZZZZZZ` is the account ID. |
| 345 | |
| 346 | A starred-changes ref points to a blob that contains the list of labels |
| 347 | that the account set on the change. The label list is stored as UTF-8 |
| 348 | text with one label per line. |
| 349 | |
| 350 | Since JGit has explicit optimizations for looking up refs by prefix |
| 351 | when the prefix ends with '/', this ref format is optimized to find |
| 352 | starred changes by change ID. Finding starred changes by change ID is |
| 353 | e.g. needed when a change is updated so that all users that have |
| 354 | the link:dev-stars.html#default-star[default star] on the change can be |
| 355 | notified by email. |
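The ref layout and the prefix lookup by change ID can be sketched as follows. The function names are hypothetical, and sharding the change ID by its last two decimal digits with zero-padding is an assumption. The lookup runs over a plain list of ref names instead of a real ref database.

```python
from typing import Iterable, List


def starred_changes_prefix(change_id: int) -> str:
    """Return the ref prefix shared by all stars on one change."""
    # Assumption: the change ID is sharded by its last two digits.
    return f"refs/starred-changes/{change_id % 100:02d}/{change_id}/"


def starred_changes_ref(change_id: int, account_id: int) -> str:
    """Return the ref that holds one account's star labels on a change."""
    return starred_changes_prefix(change_id) + str(account_id)


def accounts_starring(change_id: int, refs: Iterable[str]) -> List[int]:
    """Find the accounts that starred a change via a prefix match."""
    prefix = starred_changes_prefix(change_id)
    return [int(r[len(prefix):]) for r in refs if r.startswith(prefix)]


refs = [
    starred_changes_ref(4711, 1000856),
    starred_changes_ref(4711, 1003407),
    starred_changes_ref(4712, 1000856),
]
print(accounts_starring(4711, refs))
```

Since the prefix ends with '/', change `4711` does not accidentally match refs of a change such as `47110`.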
| 356 | |
| 357 | Gerrit also needs an efficient way to find all changes that were |
| 358 | starred by an account, e.g. to provide results for the |
| 359 | link:user-search.html#is-starred[is:starred] query operator. With the |
| 360 | ref format as described above the lookup of starred changes by account |
| 361 | ID is expensive, as this requires a scan of the full |
| 362 | `refs/starred-changes/*` namespace. To overcome this the users that |
| 363 | have starred a change are stored in the change index together with the |
| 364 | star labels. |
| 365 | |
| 366 | [[reviewed-flags]] |
| 367 | == Reviewed Flags |
| 368 | |
| 369 | When reviewing a patch set in the Gerrit UI, the reviewer can mark |
| 370 | files in the patch set as reviewed. These markers are called ‘Reviewed |
| 371 | Flags’ and are private to the user. A reviewed flag is a tuple of patch |
| 372 | set ID, file and account ID. |
| 373 | |
| 374 | Each user can have many thousands of reviewed flags and over time the |
| 375 | number can grow without bounds. |
| 376 | |
| 377 | The high number of reviewed flags makes storage in Git unsuitable |
| 378 | because each update requires opening the repository and committing a |
| 379 | change, which is a high overhead for flipping a bit. Therefore the |
| 380 | reviewed flags are stored in a database table. By default they are |
| 381 | stored in a local H2 database, but there is an extension point that |
| 382 | allows plugins to provide alternate implementations for storing the reviewed |
| 383 | flags. To replace the storage for reviewed flags a plugin needs to |
| 384 | implement the link:dev-plugins.html#account-patch-review-store[ |
| 385 | AccountPatchReviewStore] interface. E.g. to support a multi-master |
| 386 | setup where reviewed flags should be replicated between the master |
| 387 | nodes one could implement a store for the reviewed flags that is |
| 388 | based on MySQL with replication. |
| 389 | |
| 390 | [[account-sequence]] |
| 391 | == Account Sequence |
| 392 | |
| 393 | The next available account sequence number is stored as UTF-8 text in a |
| 394 | blob pointed to by the `refs/sequences/accounts` ref in the `All-Users` |
| 395 | repository. |
| 396 | |
| 397 | Multiple processes share the same sequence by incrementing the counter |
| 398 | using normal git ref updates. To amortize the cost of these ref |
| 399 | updates, processes increment the counter by a larger number and hand |
| 400 | out numbers from that range in memory until they run out. The size of |
| 401 | the account ID batch that each process retrieves at once is controlled |
| 402 | by the link:config-gerrit.html#notedb.accounts.sequenceBatchSize[ |
| 403 | notedb.accounts.sequenceBatchSize] parameter in the `gerrit.config` |
| 404 | file. |
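The batching scheme can be sketched as follows. Both classes are hypothetical stand-ins: `SharedCounter` plays the role of the value behind `refs/sequences/accounts`, and its `reserve` method replaces the ref update, which in a real setup would have to be atomic and retried on conflict.

```python
class SharedCounter:
    """Stand-in for the counter stored under refs/sequences/accounts."""

    def __init__(self, next_value: int):
        self.next_value = next_value

    def reserve(self, count: int) -> int:
        """Reserve `count` consecutive IDs and return the first one."""
        first = self.next_value
        self.next_value += count
        return first


class BatchedSequence:
    """Hands out IDs from an in-memory batch, refilling when exhausted."""

    def __init__(self, counter: SharedCounter, batch_size: int):
        self._counter = counter
        self._batch_size = batch_size
        self._next = 0
        self._limit = 0  # exclusive upper bound; empty batch at start

    def next(self) -> int:
        if self._next >= self._limit:
            # One update of the shared counter pays for a whole batch.
            self._next = self._counter.reserve(self._batch_size)
            self._limit = self._next + self._batch_size
        value = self._next
        self._next += 1
        return value


counter = SharedCounter(1000000)
proc_a = BatchedSequence(counter, batch_size=3)
proc_b = BatchedSequence(counter, batch_size=3)
print(proc_a.next(), proc_b.next(), proc_a.next())
```

One consequence of this design is that IDs are not handed out in strictly increasing order across processes, and IDs left in a batch when a process stops are never used.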
| 405 | |
| 406 | GERRIT |
| 407 | ------ |
| 408 | Part of link:index.html[Gerrit Code Review] |
| 409 | |
| 410 | SEARCHBOX |
| 411 | --------- |