:linkattrs:
= Gerrit Code Review - System Design

== Objective

Gerrit is a web-based code review system that facilitates online code
reviews for projects using the Git version control system.

Gerrit makes reviews easier by showing changes in a side-by-side
display, and by allowing any reviewer to add inline and file comments.

Gerrit simplifies Git-based project maintainership by permitting
any authorized user to submit changes to the master Git repository,
rather than requiring the project maintainer to merge all approved
changes by hand. This functionality enables a more
centralized usage of Git.


== Background

Git is a distributed version control system, wherein each repository
is assumed to be owned and maintained by a single user. There are no
inherent security controls built into Git, so the ability to read
from or write to a repository is controlled entirely by the host's
filesystem or network access controls.

The objective of Gerrit is to facilitate Git development by larger
teams: it provides a means to enforce organizational policies around
code submissions, e.g. "all code must be reviewed by another
developer" or "all code shall pass tests". It achieves this by:

* providing fine-grained (per-branch, per-repository, inheriting)
access controls, which allow a Gerrit admin to delegate permissions
to different teams or team leads.

* facilitating code review: Gerrit offers a web view of pending code
changes that allows for easy reading and commenting by humans. The
web view can surface data coming out of automated QA processes (e.g.
CI). The permission system also includes fine-grained control over who
can approve pending changes for submission, to further facilitate
delegation of code ownership.

== Overview

Developers create one or more changes on their local desktop system,
then upload them for review to Gerrit using the standard `git push`
command line program, or any GUI which can invoke `git push` on behalf
of the user. Authentication and data transfer are handled through SSH
or HTTPS. Uploads are protected by the authentication,
confidentiality and integrity offered by the transport (SSH, HTTPS).

Each Git commit created on the client desktop system is converted into
a unique change record which can be reviewed independently.

A summary of each newly uploaded change is automatically emailed
to reviewers, so they receive a direct hyperlink to review the
change on the web. Reviewer email addresses can be specified on the
`git push` command line, but typically reviewers are added in the web
interface.
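
As a rough illustration of an upload with reviewers named on the command
line, the sketch below builds the `git push` arguments. It relies on
Gerrit's `refs/for/<branch>` upload ref and its `%r=` and `%cc=` push
options, which this document does not describe; the helper names are
hypothetical and only assemble strings, they do not run Git.

```python
def review_refspec(branch, reviewers=(), cc=()):
    """Build the destination ref used to upload a change for review."""
    # Assumption: reviewers and CCs are passed as %-options on the ref.
    options = [f"r={email}" for email in reviewers]
    options += [f"cc={email}" for email in cc]
    ref = f"refs/for/{branch}"
    return f"{ref}%{','.join(options)}" if options else ref


def push_command(remote, branch, reviewers=(), cc=()):
    """Return the argv list for a `git push` that uploads HEAD for review."""
    target = review_refspec(branch, reviewers, cc)
    return ["git", "push", remote, f"HEAD:{target}"]
```

Calling `push_command("origin", "master", ["alice@example.com"])` yields
an argument list that could be handed to a process runner; the remote
name and address here are placeholders.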

Reviewers use the web interface to read the side-by-side or unified
diff of a change, and insert draft inline or file comments where
appropriate. Draft comments are visible only to the reviewer who wrote
them, until that reviewer publishes them. Published comments are automatically
emailed to the change author by Gerrit, and are CC'd to all other
reviewers who have already commented on the change.

Reviewers can score the change ("vote"), indicating whether they feel the
change is ready for inclusion in the project, needs more work, or
should be rejected outright. These scores provide direct feedback to
Gerrit's change submit function.

After a change has been scored positively by reviewers, Gerrit enables
a submit button on the web interface. Authorized users can press the
submit button to have the change enter the project repository. The
user pressing the submit button does not need to be the author of the
change.


== Infrastructure

End-user web browsers make HTTP requests directly to Gerrit's
HTTP server. As nearly all of the Gerrit user interface is implemented
in a JavaScript-based web app, the majority of these requests
transmit compressed JSON payloads, with all HTML being generated
within the browser.
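
For clients other than the web app that consume these JSON payloads, one
practical detail is worth a sketch. Gerrit's REST API prefixes JSON
responses with the line `)]}'` as a guard against cross-site script
inclusion; this document does not state that, so treat it as background
knowledge. The function name below is hypothetical.

```python
import json

# Assumption: responses start with this guard line, followed by the JSON.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip the guard prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    # json.loads tolerates the newline left behind after the prefix.
    return json.loads(body)
```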

Gerrit's HTTP server-side component is implemented as a standard Java
servlet, and thus runs within any link:install-j2ee.html[J2EE servlet
container]. The standard install runs inside Jetty, which is
included in the binary.

End-user uploads are performed over SSH or HTTP, so Gerrit's servlets
also start up a background thread to receive SSH connections through
an independent SSH port. SSH clients communicate directly with this
port, bypassing the HTTP server used by browsers.

User authentication is handled by identity realms. Gerrit supports the
following types of authentication:

* OpenID (see link:http://openid.net/developers/specs/[OpenID Specifications,role=external,window=_blank])
* OAuth2
* LDAP
* Google accounts (on googlesource.com)
* SAML
* Kerberos
* 3rd party SSO

=== NoteDb

Server-side data storage for Gerrit is broken down into two different
categories:

* Git repository data
* Gerrit metadata

The Git repository data is the Git object database used to store
already submitted revisions, as well as all uploaded (proposed)
changes. Gerrit uses the standard Git repository format, and
therefore requires direct filesystem access to the repositories.
All repository data is stored in the filesystem and accessed through
the JGit library. Repository data can be stored on remote servers
accessible through NFS or SMB, but the remote directory must
be mounted on the Gerrit server as part of the local filesystem
namespace. Remote filesystems are likely to perform worse than
local ones, because Git's disk I/O behavior is not optimized for
remote access.

The Gerrit metadata contains a summary of the available changes, all
comments (published and drafts), and individual user account
information.

Gerrit metadata is also stored in Git, with the commits marking the
historical state of metadata. Data is stored in the trees associated
with the commits, typically using the Git config file format or JSON
as the base format. There are three types of metadata: changes,
accounts and groups.

Accounts are stored in a special Git repository, `All-Users`.

Accounts can be collected into groups. Gerrit has a built-in group system,
but can also interface with external group systems (e.g. Google groups,
LDAP). The built-in groups are stored in `All-Users`.

Draft comments are stored in `All-Users` too.

Permissions are stored in Git, in the `refs/meta/config` branch of each
repository. Repository configuration (including permissions) supports
single inheritance, with the `All-Projects` repository containing
site-wide defaults.
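
Single inheritance can be pictured as a walk up a chain of parents that
ends at `All-Projects`. The sketch below is a simplified model, not
Gerrit's implementation: the project names other than `All-Projects`,
the setting keys and the "nearest definition wins" rule are all
assumptions made for illustration, and real permission resolution is
richer than a plain override.

```python
# Hypothetical model: project name -> (parent name, settings defined locally).
PROJECTS = {
    "All-Projects": (None, {"require_change_id": True, "max_size_mb": 50}),
    "platform": ("All-Projects", {"max_size_mb": 200}),
    "platform/tools": ("platform", {}),
}


def effective_setting(projects, name, key):
    """Walk the single-parent chain until a project defines `key`."""
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"inheritance cycle at {name}")
        seen.add(name)
        parent, settings = projects[name]
        if key in settings:
            return settings[key]
        name = parent
    return None  # no project in the chain defines the key
```

Here `platform/tools` defines nothing itself, so it picks up
`max_size_mb` from `platform` and `require_change_id` from the
site-wide defaults.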

Code review metadata (change status, votes and comments) is stored in
NoteDb, in the same Git repository as the submitted code and the code
under review. Hence, the review history can be exported with `git
clone --mirror` by anyone with sufficient permissions.

== Permissions

Permissions are specified on branch names, and given to groups. For
example,

```
[access "refs/heads/stable/*"]
  push = group Release-Engineers
```

This rule grants the Release-Engineers group push permission for
stable branches.
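
How such a rule might be evaluated can be sketched as follows. This is a
deliberately reduced model that only handles exact refs and trailing
`/*` patterns; the data structure and function names are hypothetical,
and features such as rule inheritance are left out.

```python
# Hypothetical rule table mirroring the example above:
# (ref pattern, permission) -> groups that are granted the permission.
RULES = {("refs/heads/stable/*", "push"): {"Release-Engineers"}}


def ref_matches(pattern, ref):
    """Match an exact ref name, or a prefix when the pattern ends in /*."""
    if pattern.endswith("/*"):
        return ref.startswith(pattern[:-1])
    return ref == pattern


def is_allowed(rules, user_groups, permission, ref):
    """True if any rule grants `permission` on `ref` to one of the groups."""
    groups_of_user = set(user_groups)
    return any(
        perm == permission
        and ref_matches(pattern, ref)
        and bool(granted & groups_of_user)
        for (pattern, perm), granted in rules.items()
    )
```

With this table, a member of Release-Engineers may push to
`refs/heads/stable/3.4` but not to `refs/heads/master`.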

There are fundamentally two types of permissions:

* Write permissions (who can vote, push, submit etc.)

* Read permissions (who can see data)

Read permissions need special treatment across Gerrit, because Gerrit
should only surface data (including repository existence) if a user
has read permission. This means that:

* The Git wire protocol support must omit references from the
advertisement if the user lacks read permissions.

* Uploads through the Git wire protocol must refuse commits that are
based on SHA-1s for data that the user can't see.

* Tags are only visible if their commits are visible to the user through a
non-tag reference.
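
The first and last rules above can be sketched on a toy commit graph.
The model below is an assumption for illustration only: commits are
plain strings, tags point directly at commits, and the set of readable
branches is given up front instead of being computed from access rules.

```python
def reachable(parents, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def advertised_refs(refs, parents, readable):
    """Filter a ref advertisement down to what the user is allowed to see."""
    # Non-tag refs are shown only if the user can read them.
    visible = {
        name: tip
        for name, tip in refs.items()
        if not name.startswith("refs/tags/") and name in readable
    }
    visible_commits = set()
    for tip in visible.values():
        visible_commits |= reachable(parents, tip)
    # Tags are shown only if their commit is reachable from a visible ref.
    for name, tip in refs.items():
        if name.startswith("refs/tags/") and tip in visible_commits:
            visible[name] = tip
    return visible
```

A tag on a commit that exists only on an unreadable branch is dropped
from the result, so its existence is not leaked.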

Metadata (e.g. OAuth credentials) is also stored in Git. Existing
endpoints must refuse to create branches or changes that would expose
this metadata or allow it to be modified.


=== Indexing

Almost all data is stored in Git, but Git only supports fast lookup by
SHA-1 or by ref (branch) name. Therefore Gerrit also has an indexing
system (powered by Lucene by default) for other types of queries.
There are four indices:

* Project index - find repositories by name, parent project, etc.
* Account index - find accounts by name, email, etc.
* Group index - find groups by name, owner, description, etc.
* Change index - find changes by file, status, modification date, etc.

The base entities are characterized by SHA-1s. Storing the
characterizing SHA-1s allows detection of stale index entries.
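
The staleness check amounts to comparing the SHA-1 recorded in the index
with the SHA-1 currently found in Git. A minimal sketch, assuming both
sides are available as simple mappings from an entity id to a SHA-1 (the
function name and data shapes are hypothetical):

```python
def stale_entries(indexed, current):
    """Return ids whose indexed SHA-1 no longer matches the one in Git.

    `indexed` maps entity id -> SHA-1 stored alongside the index entry.
    `current` maps entity id -> SHA-1 the entity has in Git right now.
    """
    return sorted(
        entity_id
        for entity_id, indexed_sha in indexed.items()
        if current.get(entity_id) != indexed_sha
    )
```

An entity that was deleted from Git is also reported, because its id has
no current SHA-1 to match.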

== Plug-in architecture

Gerrit has a plug-in architecture. Plugins can be installed by
dropping them into `$site_directory/plugins`, or at runtime through
plugin SSH commands or the plugin REST API.

=== Backend plugins

At runtime, code can be loaded from a `.jar` file. This code can hook
into predefined extension points. A common use of plugins is to have
Gerrit interoperate with site-specific tools, such as CI systems or
issue trackers.

// list some notable extension points, and notable plugins
// link to plugin development

Some backend plugins expose the JVM for scripting use (e.g. Groovy,
Scala), so plugins can be written without having to set up a Java
development environment.

// Luca to expand: how do script plugins load their scripts?

=== Frontend plugins

The UI can be extended using frontend plugins. This is useful for
changing the look and feel of Gerrit, but it can also be used to surface
data from systems that aren't integrated with the Gerrit backend, e.g.
CI systems or code coverage providers.

// FE team to write a bit more:
// * how to load ?
// * XSRF, CORS ?

== Internationalization and Localization

As a source code review system for open source projects, where the
commonly preferred language for communication is typically English,
Gerrit does not make internationalization or localization a priority.

The majority of Gerrit's users will be writing change descriptions
and comments in English, and therefore an English user interface
is usable by the target user base.


== Accessibility Considerations

// UI team to rewrite this.

Whenever possible, Gerrit displays raw text rather than image icons,
so screen readers should still be able to provide useful information
to blind persons accessing Gerrit sites.

Standard HTML hyperlinks are used rather than HTML div or span tags
with click listeners. This provides two benefits to the end-user.
The first benefit is that screen readers are optimized for locating
standard hyperlink anchors and presenting them to the end-user as
a navigation action. The second benefit is that users can use
the 'open in new tab/window' feature of their browser whenever
they choose.

When possible, Gerrit uses ARIA properties on DOM widgets to
provide hints to screen readers.


== Browser Compatibility

Gerrit requires a JavaScript-enabled browser.

// UI team to add section on minimum browser requirements.

As Gerrit is a pure JavaScript application on the client side, with
no server-side rendering fallbacks, the browser must support modern
JavaScript semantics in order to access the Gerrit web application.
Clients without JavaScript support, such as `lynx`, `wget`, `curl`, or
even many search engine spiders, are not able to access Gerrit content.

All of the content stored within Gerrit is also available through
other means, such as gitweb or the `git://` protocol. Any existing
search engine crawlers can index the server-side HTML served by a code
browser, and thus can index the majority of the changes which might
appear in Gerrit. Therefore the lack of support for most search engine
crawlers is a non-issue for most Gerrit deployments.


== Product Integration

Gerrit optionally surfaces links to HTML pages in a code browser. The
links are configurable, and Gerrit comes with a built-in code browser,
called Gitiles.

Gerrit integrates with some types of corporate single-sign-on (SSO)
solutions, typically by having the SSO authentication be performed
in a reverse proxy web server and then blindly trusting that all
incoming connections have been authenticated by that reverse proxy.
When configured to use this form of authentication, Gerrit does
not integrate with OpenID providers.

When installing Gerrit, administrators may optionally include an
HTML header or footer snippet which may include user tracking code,
such as that used by Google Analytics. This is a per-instance
configuration that must be done by hand, and is not supported
out of the box. Other site trackers instead of Google Analytics
can be used, as the administrator can supply any HTML/JavaScript
they choose.

Gerrit does not integrate with any Google service, or any other
services other than those listed above.

Plugins (see above) can be used to drive product integrations from the
Gerrit side. Products that support Gerrit explicitly can use the REST
API or the SSH API to contact Gerrit.


== Privacy Considerations

Gerrit stores the following information per user account:

* Full Name
* Preferred Email Address

The full name and preferred email address fields are shown to any
site visitor viewing a page containing a change uploaded by the
account owner, or containing a published comment written by the
account owner.

Showing the full name and preferred email carries approximately the same
risk as the `From` header of an email posted to a public mailing
list that maintains archives, and Gerrit treats these fields in
much the same way that a mailing list archive might handle them.
Users who don't want to expose this information should either not
participate in a Gerrit-based online community, or open a new email
address dedicated to this use.

As the Gerrit UI data is only available through XSRF protected
JSON-RPC calls, "screen-scraping" for email addresses is difficult,
but not impossible. It is unlikely that a spammer will go through the
effort required to code the custom scraping application necessary
to cull email addresses from published Gerrit comments. In most
cases these same addresses would be more easily obtained from the
project's mailing list archives.

The user's name and email address are stored unencrypted in the
link:config-accounts.html#all-users[All-Users] repository.

== Spam and Abuse Considerations

There is no spam protection for the Git protocol upload path.
Uploading a change successfully requires a pre-existing account, and a
lot of up-front effort.

Gerrit makes no attempt to detect spam changes or comments in the web
UI. To post and publish a comment, a client must sign in and then use
the XSRF protected JSON-RPC interface to publish the draft on an
existing change record.

The absence of spam handling is based upon the idea that Gerrit caters to
a niche audience, and will therefore be unattractive to spammers. In
addition, spam is not a factor for corporate, on-premise deployments.


== Scalability

Gerrit supports the Git wire protocol, and an API (one API for HTTP,
and one for SSH).

The Git wire protocol does a client/server negotiation to avoid
sending too much data. This negotiation occupies a CPU, so the number
of concurrent push/fetch operations should be capped by the number of
CPUs.
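
One generic way to enforce such a cap is a counting semaphore sized to
the number of CPUs. The sketch below shows the technique in isolation;
the class name is hypothetical and this is not how Gerrit itself is
configured or implemented.

```python
import os
import threading


class NegotiationLimiter:
    """Allow at most `max_concurrent` CPU-heavy operations at a time."""

    def __init__(self, max_concurrent=None):
        # Default to one slot per CPU; fall back to 1 if the count is unknown.
        self.limit = max_concurrent or os.cpu_count() or 1
        self._slots = threading.BoundedSemaphore(self.limit)

    def run(self, operation):
        """Block until a slot is free, then run `operation` and return its result."""
        with self._slots:
            return operation()
```

Callers beyond the limit simply wait for a slot, which keeps the number
of simultaneous negotiations at or below the number of cores.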

Clients on slow network connections may be network bound rather than
server-side CPU bound, in which case a core may be effectively shared
with another user. Possible core sharing due to network bottlenecks
generally holds true for network connections running below 10 MiB/sec.

Deployments for large, distributed companies can replicate Git data to
read-only replicas to offload fetch traffic. The read-only replicas
should also serve this data using Gerrit to ensure that permissions
are obeyed.

The API serves requests of varying costs. Requests that originate in
the UI can block productivity, so care has been taken to optimize
these for latency, using the following techniques:

* Async calls: the UI becomes responsive before some UI elements
have finished loading.

* Caching: metadata is stored in Git, which is relatively expensive to
access. This is sped up by multiple caches. Metadata entities are
stored in Git, and can therefore be seen as immutable values keyed
by SHA-1, which is very amenable to caching. All SHA-1 keyed caches
can be persisted on local disk.

The size (memory, disk) of these caches should be adapted to the
instance size (number of users, size and quantity of repositories)
for optimal performance.
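
The reason SHA-1 keys cache so well is that a given key can only ever
map to one value, so entries never need to be invalidated, only evicted
for space. A minimal in-memory sketch of that idea follows; the class
and its loader callback are hypothetical, and eviction and disk
persistence are omitted.

```python
class Sha1KeyedCache:
    """Cache of values derived from immutable Git objects, keyed by SHA-1."""

    def __init__(self, loader):
        self._loader = loader  # function: sha1 -> value (the expensive Git read)
        self._entries = {}
        self.misses = 0

    def get(self, sha1):
        """Return the cached value, loading it on first use only."""
        if sha1 not in self._entries:
            self.misses += 1
            self._entries[sha1] = self._loader(sha1)
        # No staleness check: the same SHA-1 always names the same content.
        return self._entries[sha1]
```

When an entity changes, it gets a new SHA-1 and therefore a new cache
key; the old entry is never wrong, merely unused.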

Git does not impose fundamental limits (e.g. number of files per
change) on data. To ensure stability, Gerrit configures a number of
default limits for these.

// add a link to the default settings.

=== Scaling team size

A team of size N has N^2 possible interactions. As a result, features
that expose interactions with the activities of other team members have a
quadratic cost in aggregate. The following features scale poorly with
large team sizes:

* The change screen shows conflicting changes by default. This data is
cached, but updates to pending changes cause cache misses. For a
single change, the amount of work is proportional to the number of
pending changes, so in aggregate, the cost of this feature is
quadratic in the team size.

* The change screen shows if a change is mergeable to the target
branch. If the target branch moves quickly (large developer team),
this causes cache misses. In aggregate, the cost of this feature is
also quadratic.

Both features should be turned off for repositories that involve 1000s
of developers.
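
The quadratic growth can be made concrete with a small calculation. The
sketch assumes, purely for illustration, that each developer has a fixed
number of pending changes and that every pending change is checked
against every other one; the function and its parameters are
hypothetical.

```python
def conflict_checks(team_size, pending_per_developer=1):
    """Count pairwise checks when each pending change is compared with all others."""
    pending = team_size * pending_per_developer
    # Work per change is proportional to the other pending changes,
    # so the aggregate is pending * (pending - 1).
    return pending * (pending - 1)
```

Under these assumptions a team of 10 triggers 90 checks while a team of
1,000 triggers 999,000, so a 100-fold increase in team size costs about
11,000 times as much work.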

=== Browser performance

// say something about browser performance tuning.

=== Real life numbers

Gerrit is designed for very large projects, both open source and
proprietary commercial projects. For a single Gerrit process, the
following limits are known to work:

.Observed maximums
[options="header"]
|======================================================
|Parameter        | Maximum | Deployment
|Projects         |  50,000 | gerrithub.io
|Contributors     | 150,000 | eclipse.org
|Bytes/repo       |    100G | Qualcomm internal
|Changes/repo     |    300k | Qualcomm internal
|Revisions/Change |     300 | Qualcomm internal
|Reviewers/Change |      87 | Qualcomm internal
|======================================================


// find some numbers for these stats:
// |Files/repo       | ? |
// |Files/Change     | ? |
// |Comments/Change  | ? |
// |max QPS/CPU      | ? |


Google runs a horizontally scaled deployment. We have seen the
following per-JVM maximums:

.Observed maximums (googlesource.com)
[options="header"]
|======================================================
|Parameter        | Maximum | Deployment
|Files/repo       | 500,000 | chromium-review
|Bytes/repo       |     12G | chromium-review
|Changes/repo     |    500k | chromium-review
|Revisions/Change |    1900 | chromium-review
|Files/Change     |  10,000 | android-review
|Comments/Change  |   1,200 | chromium-review
|======================================================


== Redundancy & Reliability

Gerrit is structured as a single JVM process, reading and writing to a
single file system. If there are hardware failures in the machine
running the JVM, or the storage holding the repositories, there is no
recourse; on failure, errors will be returned to the client.

Deployments needing more stringent uptime guarantees can use a
replication or multi-master setup, which ensures availability and
geographical distribution, at the cost of slower write actions.

// TODO: link.

=== Backups

Using the standard replication plugin, Gerrit can be configured
to replicate changes made to the local Git repositories over any
standard Git transport. After the plugin is installed, remote
destinations can be configured in `'$site_path'/etc/replication.conf`
to send copies of all changes over SSH to other servers, or to the
Amazon S3 blob storage service.
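
A destination entry in `replication.conf` might look like the fragment
below. The remote name, host and path are placeholders, and the choice
to push every ref is an assumption suited to backups; the replication
plugin's own documentation is the authority on the available keys.

```
[remote "backup"]
  url = backup.example.com:/srv/git/${name}.git
  push = +refs/*:refs/*
```

Here `${name}` stands for the name of each replicated repository, so one
entry covers all projects on the server.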


== Logging Plan

Gerrit stores Apache style HTTPD logs, as well as ERROR/INFO messages
from the Java logger, under `$site_dir/logs/`.

Published comments contain a publication date, so users can judge
when the comment was posted and decide if it was "recent" or not.
Only the timestamp is stored in the database; the IP address of
the comment author is not stored.

Changes uploaded over the SSH daemon from `git push` have the
standard Git reflog updated with the date and time that the upload
occurred, and the Gerrit account identity of who did the upload.
Changes submitted and merged into a branch also update the
Git reflog. These logs are available only to the Gerrit site
administrator, and they are not replicated through the automatic
replication noted earlier. These logs are primarily recorded for an
"oh s**t" moment where the administrator has to rewind data. In most
installations they are a waste of disk space. Future versions of
JGit may allow disabling these logs, and Gerrit may take advantage
of that feature to stop writing these logs.

A web server positioned in front of Gerrit (such as a reverse proxy)
or the hosting servlet container may record access logs, and these
logs may be mined for usage information. This is outside of the
scope of Gerrit.


GERRIT
------
Part of link:index.html[Gerrit Code Review]

SEARCHBOX
---------