| --- |
| title: "Gerrit Code Review - System Design" |
| sidebar: gerritdoc_sidebar |
| permalink: dev-design.html |
| --- |
| ## Objective |
| |
| Gerrit is a web-based code review system, facilitating online code |
| reviews for projects using the Git version control system. |
| |
| Gerrit makes reviews easier by showing changes in a side-by-side |
| display, and allowing inline/file comments to be added by any reviewer. |
| |
| Gerrit simplifies Git based project maintainership by permitting any |
| authorized user to submit changes to the master Git repository, rather |
| than requiring all approved changes to be merged in by hand by the |
| project maintainer. This functionality enables a more centralized usage |
| of Git. |
| |
| ## Background |
| |
| Google developed Mondrian, a Perforce-based code review tool, to |
| facilitate peer review of changes prior to submission to the central |
| code repository. Mondrian is not open source, as it is tied to the use |
| of Perforce and to many Google-only services, such as Bigtable. Google |
| employees have often described how useful Mondrian and its peer-review |
| process are to their day-to-day work. |
| |
| Guido van Rossum open sourced portions of Mondrian within Rietveld, a |
| similar code review tool running on Google App Engine, but for use with |
| Subversion rather than Perforce. Rietveld is in common use by many open |
| source projects, facilitating their peer reviews much as Mondrian does |
| for Google employees. Unlike Mondrian and the Google Perforce triggers, |
| Rietveld is strictly advisory and does not enforce peer-review prior to |
| submission. |
| |
| Git is a distributed version control system, wherein each repository is |
| assumed to be owned/maintained by a single user. There are no inherent |
| security controls built into Git, so the ability to read from or write |
| to a repository is controlled entirely by the host’s filesystem access |
| controls. When multiple maintainers collaborate on a single shared |
| repository a high degree of trust is required, as any collaborator with |
| write access can alter the repository. |
| |
| Gitosis provides tools to secure centralized Git repositories, |
| permitting multiple maintainers to manage the same project at once by |
| restricting access to a secure network protocol, much as Perforce |
| secures a repository by only permitting access over its network |
| port. |
| |
| The Android Open Source Project (AOSP) was founded by Google with the |
| open source release of the Android operating system. AOSP has selected Git |
| as its primary version control tool. As many of the engineers have a |
| background of working with Mondrian at Google, there is a strong desire |
| to have the same (or better) feature set available for Git and AOSP. |
| |
| Gerrit Code Review started as a simple set of patches to Rietveld, and |
| was originally built to service AOSP. This quickly turned into a fork as |
| we added access control features that Guido van Rossum did not want to |
| see complicating the Rietveld code base. As the functionality and code |
| were starting to become drastically different, a different name was |
| needed. Gerrit calls back to the original namesake of Rietveld, Gerrit |
| Rietveld, a Dutch architect. |
| |
| Gerrit 2.x is a complete rewrite of the Gerrit fork, changing the |
| implementation from Python on Google App Engine to Java on a J2EE |
| servlet container and an SQL database. |
| |
| - [Mondrian Code Review On The |
| Web](http://video.google.com/videoplay?docid=-8502904076440714866) |
| |
| - [Rietveld - Code Review for |
| Subversion](https://github.com/rietveld-codereview/rietveld) |
| |
| - [Gitosis |
| README](http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD) |
| |
| - [Android Open Source Project](http://source.android.com/) |
| |
| ## Overview |
| |
| Developers create one or more changes on their local desktop system, |
| then upload them for review to Gerrit using the standard `git push` |
| command line program, or any GUI which can invoke `git push` on behalf |
| of the user. Authentication and data transfer are handled through SSH. |
| Users are authenticated by username and public/private key pair, and all |
| data transfer is protected by the SSH connection and Git’s own data |
| integrity checks. |
| |
| Each Git commit created on the client desktop system is converted into a |
| unique change record which can be reviewed independently. Change records |
| are stored in a database: PostgreSQL, MySQL, or the built-in H2, where |
| they can be queried to present customized user dashboards, enumerating |
| any pending changes. |
| |
| A summary of each newly uploaded change is automatically emailed to |
| reviewers, so they receive a direct hyperlink to review the change on |
| the web. Reviewer email addresses can be specified on the `git push` |
| command line, but typically reviewers are automatically selected by |
| Gerrit by identifying users who have change approval permissions in the |
| project. |
| |
| Reviewers use the web interface to read the side-by-side or unified diff |
| of a change, and insert draft inline/file comments where appropriate. A |
| draft comment is visible only to the reviewer, until they publish those |
| comments. Published comments are automatically emailed to the change |
| author by Gerrit, and are CC’d to all other reviewers who have already |
| commented on the change. |
| |
| When publishing comments reviewers are also given the opportunity to |
| score the change, indicating whether they feel the change is ready for |
| inclusion in the project, needs more work, or should be rejected |
| outright. These scores provide direct feedback to Gerrit’s change submit |
| function. |
| |
| After a change has been scored positively by reviewers, Gerrit enables a |
| submit button on the web interface. Authorized users can push the submit |
| button to have the change enter the project repository. The equivalent |
| in Subversion or Perforce would be that Gerrit is invoking `svn commit` |
| or `p4 submit` on behalf of the web user pressing the button. Due to the |
| way Git audit trails are maintained, the user pressing the submit button |
| does not need to be the author of the change. |
| |
| ## Infrastructure |
| |
| End-user web browsers make HTTP requests directly to Gerrit’s HTTP |
| server. As nearly all of the user interface is implemented through |
| Google Web Toolkit (GWT), the majority of these requests are |
| transmitting compressed JSON payloads, with all HTML being generated |
| within the browser. Most responses are under 1 KB. |
| |
| Gerrit’s HTTP server side component is implemented as a standard Java |
| servlet, and thus runs within any J2EE servlet container. Popular |
| choices for deployments would be Tomcat or Jetty, as these are |
| high-quality open-source servlet containers that are readily available |
| for download. |
| |
| End-user uploads are performed over SSH, so Gerrit’s servlets also start |
| up a background thread to receive SSH connections through an independent |
| SSH port. SSH clients communicate directly with this port, bypassing the |
| HTTP server used by browsers. |
| |
| Server side data storage for Gerrit is broken down into two different |
| categories: |
| |
| - Git repository data |
| |
| - Gerrit metadata |
| |
| The Git repository data is the Git object database used to store already |
| submitted revisions, as well as all uploaded (proposed) changes. Gerrit |
| uses the standard Git repository format, and therefore requires direct |
| filesystem access to the repositories. All repository data is stored in |
| the filesystem and accessed through the JGit library. Repository data |
| can be stored on remote servers accessible through NFS or SMB, but the |
| remote directory must be mounted on the Gerrit server as part of the |
| local filesystem namespace. Remote filesystems are likely to perform |
| worse than local ones, due to Git disk IO behavior not being optimized |
| for remote access. |
| |
| The Gerrit metadata contains a summary of the available changes, all |
| comments (published and drafts), and individual user account |
| information. The metadata is mostly housed in the database (\*1), which |
| can be located either on the same server as Gerrit, or on a different |
| (but nearby) server. Most installations would opt to install both Gerrit |
| and the metadata database on the same server, to reduce administration |
| overheads. |
| |
| User authentication is handled by OpenID, and therefore Gerrit requires |
| that the OpenID provider selected by a user must be online and operating |
| in order to authenticate that user. |
| |
| - [Google Web Toolkit (GWT)](http://www.gwtproject.org/) |
| |
| - [Git Repository |
| Format](http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html) |
| |
| - [About PostgreSQL](http://www.postgresql.org/about/) |
| |
| - [OpenID Specifications](http://openid.net/developers/specs/) |
| |
| \*1 An effort is underway to eliminate the use of the database |
| altogether, and to store all the metadata directly in the git |
| repositories themselves. As of Gerrit 2.2.1, only the project |
| configuration metadata has been migrated out of the database and into |
| the git repositories for each project. |
| |
| ## Project Information |
| |
| Gerrit is developed as a self-hosting open source project: |
| |
| - [Project Homepage](https://www.gerritcodereview.com/) |
| |
| - [Release |
| Versions](https://www.gerritcodereview.com/download/index.html) |
| |
| - [Source](https://gerrit.googlesource.com/gerrit) |
| |
| - [Issue Tracking](https://bugs.chromium.org/p/gerrit/issues/list) |
| |
| - [Change Review](https://review.source.android.com/) |
| |
| ## Internationalization and Localization |
| |
| As a source code review system for open source projects, where the |
| commonly preferred language for communication is typically English, |
| Gerrit does not make internationalization or localization a priority. |
| |
| The majority of Gerrit’s users will be writing change descriptions and |
| comments in English, and therefore an English user interface is usable |
| by the target user base. |
| |
| Gerrit uses GWT’s i18n support to externalize all constant strings and |
| messages shown to the user, so that in the future someone who really |
| needed a translated version of the UI could contribute new string files |
| for their locale(s). |
| |
| Right-to-left (RTL) support is only barely considered within the Gerrit |
| code base. Some portions of the code have tried to take RTL into |
| consideration, while others probably need to be modified before |
| translating the UI to an RTL language. |
| |
| - [Gerrit’s i18n Support](i18n-readme.html) |
| |
| ## Accessibility Considerations |
| |
| Whenever possible Gerrit displays raw text rather than image icons, so |
| screen readers should still be able to provide useful information to |
| blind persons accessing Gerrit sites. |
| |
| Standard HTML hyperlinks are used rather than HTML div or span tags with |
| click listeners. This provides two benefits to the end-user. The first |
| benefit is that screen readers are optimized for locating standard |
| hyperlink anchors and presenting them to the end-user as a navigation |
| action. The second benefit is that users can use the *open in new |
| tab/window* feature of their browser whenever they choose. |
| |
| When possible, Gerrit uses the ARIA properties on DOM widgets to provide |
| hints to screen readers. |
| |
| ## Browser Compatibility |
| |
| Supporting non-JavaScript enabled browsers is a non-goal for Gerrit. |
| |
| As Gerrit is a pure-GWT application with no server side rendering |
| fallbacks, the browser must support modern JavaScript semantics in order |
| to access the Gerrit web application. Dumb clients such as `lynx`, |
| `wget`, `curl`, or even many search engine spiders are not able to |
| access Gerrit content. |
| |
| As Google Web Toolkit (GWT) is used to generate the browser specific |
| versions of the client-side JavaScript code, Gerrit works on any |
| JavaScript enabled browser which GWT can produce code for. This covers |
| the majority of the popular browsers. |
| |
| The Gerrit project does not have the development resources necessary to |
| support two parallel UI implementations (GWT based JavaScript and |
| server-side rendering). Consequently only one is implemented. |
| |
| There are a number of web browsers available with full JavaScript support, |
| and nearly every operating system (including any PDA-like mobile phone) |
| comes with one standard. Users who are committed to developing changes |
| for a Gerrit managed project can be expected to be able to run a |
| JavaScript enabled browser, as they also would need to be running Git in |
| order to contribute. |
| |
| There are a number of open source browsers available, including Firefox |
| and Chromium. Users have some degree of choice in their browser |
| selection, including being able to build and audit their browser from |
| source. |
| |
| The majority of the content stored within Gerrit is also available |
| through other means, such as gitweb or the `git://` protocol. Any |
| existing search engine spider can crawl the server-side HTML produced by |
| gitweb, and thus can index the majority of the changes which might |
| appear in Gerrit. Some engines may even choose to crawl the native |
| version control database, such as ohloh.net does. Therefore the lack of |
| support for most search engine spiders is a non-issue for most Gerrit |
| deployments. |
| |
| ## Product Integration |
| |
| Gerrit integrates with an existing gitweb installation by optionally |
| creating hyperlinks to reference changes on the gitweb server. |
| |
| Gerrit integrates with an existing git-daemon installation by optionally |
| displaying `git://` URLs for users to download a change through the |
| native Git protocol. |
| |
| Gerrit integrates with any OpenID provider for user authentication, |
| making it easier for users to join a Gerrit site and manage their |
| authentication credentials to it. To make use of Google Accounts as an |
| OpenID provider easier, Gerrit has a shorthand "Sign in with a Google |
| Account" link on its sign-in screen. Gerrit also supports a shorthand |
| sign in link for Yahoo\!. Other providers may also be supported more |
| directly in the future. |
| |
| Site administrators may limit the range of OpenID providers to a subset |
| of "reliable providers". Users may continue to use any OpenID provider |
| to publish comments, but granted privileges are only available to a user |
| if the only entry point to their account is through the defined set of |
| "reliable OpenID providers". This permits site administrators to require |
| HTTPS for OpenID, and to use only large main-stream providers that are |
| trustworthy, or to require users to only use a custom OpenID provider |
| installed alongside Gerrit Code Review. |
| |
| Gerrit integrates with some types of corporate single-sign-on (SSO) |
| solutions, typically by having the SSO authentication be performed in a |
| reverse proxy web server and then blindly trusting that all incoming |
| connections have been authenticated by that reverse proxy. When |
| configured to use this form of authentication, Gerrit does not integrate |
| with OpenID providers. |
| |
| When installing Gerrit, administrators may optionally include an HTML |
| header or footer snippet which may include user tracking code, such as |
| that used by Google Analytics. This is a per-instance configuration that |
| must be done by hand, and is not supported out of the box. Other site |
| trackers instead of Google Analytics can be used, as the administrator |
| can supply any HTML/JavaScript they choose. |
| |
| Gerrit does not integrate with any Google service, or any other services |
| other than those listed above. |
| |
| ## Standards / Developer APIs |
| |
| Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate |
| between the browser client and the server. |
| |
| As the protocol is not the GWT-RPC protocol, but is instead a |
| self-describing standard JSON format it is easily implemented by any 3rd |
| party client application, provided the client has a JSON parser and HTTP |
| client library available. |
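As a rough illustration of how little a third-party client needs, the sketch below builds a JSON-RPC 1.1 request body using only the Python standard library. The `version`, `method`, `params`, and `id` members follow the JSON-RPC 1.1 working draft. The method name `exampleMethod`, the token value, and the `xsrfKey` field name are hypothetical placeholders: the source does not specify how Gerrit carries its XSRF token or which methods it exposes, so this is not Gerrit's actual wire format.

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched to requests.
_request_ids = count(1)


def build_jsonrpc_request(method, params, xsrf_key=None):
    """Serialize a JSON-RPC 1.1 style request body.

    `xsrf_key` is attached under a hypothetical "xsrfKey" member; the
    real field name and placement used by Gerrit are assumptions here.
    """
    request = {
        "version": "1.1",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }
    if xsrf_key is not None:
        request["xsrfKey"] = xsrf_key  # hypothetical field name
    return json.dumps(request)


# "exampleMethod" and the token are placeholders, not real Gerrit names.
body = build_jsonrpc_request("exampleMethod", ["arg"], xsrf_key="token-from-server")
print(body)
```

A real client would POST this body over HTTP together with the session cookie; the endpoint path is deliberately left out because the source does not name one.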
| |
| As the entire command set necessary for the standard web browser based |
| UI is exposed through JSON-RPC over HTTP, there are no other data feeds |
| or command interfaces to the server. |
| |
| Commands requiring user authentication may require the user agent to |
| complete a sign-in cycle through the user’s OpenID provider in order to |
| establish the HTTP cookie Gerrit uses to track user identity. Automating |
| this sign-in process for non-web browser agents is outside of the scope |
| of Gerrit, as each OpenID provider uses its own sign-in sequence. Use of |
| OpenID providers which have difficult to automate interfaces may make it |
| impossible for non-browser agents to be used with the JSON-RPC |
| interface. |
| |
| - [JSON-RPC 1.1](http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html) |
| |
| - [XSRF |
| JSON-RPC](http://code.google.com/p/gerrit/source/browse/README?repo=gwtjsonrpc&name=master) |
| |
| ## Privacy Considerations |
| |
| Gerrit stores the following information per user account: |
| |
| - Full Name |
| |
| - Preferred Email Address |
| |
| - Mailing Address *(Optional, Encrypted)* |
| |
| - Country *(Optional, Encrypted)* |
| |
| - Phone Number *(Optional, Encrypted)* |
| |
| - Fax Number *(Optional, Encrypted)* |
| |
| The full name and preferred email address fields are shown to any site |
| visitor viewing a page containing a change uploaded by the account |
| owner, or containing a published comment written by the account owner. |
| |
| Showing the full name and preferred email carries approximately the same risk |
| as the `From` header of an email posted to a public mailing list that |
| maintains archives, and Gerrit treats these fields in much the same way |
| that a mailing list archive might handle them. Users who don’t want to |
| expose this information should either not participate in a Gerrit based |
| online community, or open a new email address dedicated for this use. |
| |
| As the Gerrit UI data is only available through XSRF protected JSON-RPC |
| calls, "screen-scraping" for email addresses is difficult, but not |
| impossible. It is unlikely a spammer will go through the effort required |
| to code a custom scraping application necessary to cull email addresses |
| from published Gerrit comments. In most cases these same addresses would |
| be more easily obtained from the project’s mailing list archives. |
| |
| The user’s name and email address are stored unencrypted in the Gerrit |
| metadata store, typically a PostgreSQL database. |
| |
| The snail-mail mailing address, country, and phone and fax numbers are |
| gathered to help project leads contact the user should there be a legal |
| question regarding any change they have uploaded. |
| |
| These sensitive fields are immediately encrypted upon receipt with a |
| GnuPG public key, and stored "off site" in another data store, isolated |
| from the main Gerrit change data. Gerrit does not have access to the |
| matching private key, and as such cannot decrypt the information. |
| Therefore these fields are write-once in Gerrit, as not even the account |
| owner can recover the values they previously stored. |
| |
| It is expected that the address information would only need to be |
| decrypted and revealed with a valid court subpoena, but this is really |
| left to the discretion of the Gerrit site administrator as to when it is |
| reasonable to reveal this information to a 3rd party. |
| |
| ## Spam and Abuse Considerations |
| |
| Gerrit makes no attempt to detect spam changes or comments. The somewhat |
| high barrier to entry makes it unlikely that a spammer will target |
| Gerrit. |
| |
| To upload a change, the client must speak the native Git protocol |
| embedded in SSH, with some custom Gerrit semantics added on top. The |
| client must have their public key already stored in the Gerrit database, |
| which can only be done through the XSRF protected JSON-RPC interface. |
| The level of effort required to construct the necessary tools to upload |
| a well-formatted change that isn’t rejected outright by the Git and |
| Gerrit checksum validations is too high for a spammer to get any |
| meaningful return. |
| |
| To post and publish a comment a client must sign in with an OpenID |
| provider and then use the XSRF protected JSON-RPC interface to publish |
| the draft on an existing change record. Again, the level of effort |
| required to implement the Gerrit specific XSRF protections and the |
| JSON-RPC payload format necessary to post a draft and then publish that |
| draft is simply too high for a spammer to bother with. |
| |
| Both of these assumptions are also based upon the idea that Gerrit will |
| be a lot less popular than blog software, and thus will be running on a |
| lot fewer websites. Spammers therefore have very little returned benefit |
| for getting over the protocol hurdles. |
| |
| These assumptions may need to be revisited in the future if any public |
| Gerrit site actually notices spam. |
| |
| ## Latency |
| |
| Gerrit targets sub-250 ms per page request, mostly by using very |
| compact JSON payloads between client and server. However, as most of the |
| serving stack (network, hardware, metadata database) is out of control |
| of the Gerrit developers, no real guarantees can be made about latency. |
| |
| ## Scalability |
| |
| Gerrit is designed for a very large scale open source project, or large |
| commercial development project. Roughly this amounts to parameters such |
| as the following: |
| |
| <table> |
| <caption>Design Parameters</caption> |
| <colgroup> |
| <col width="33%" /> |
| <col width="33%" /> |
| <col width="33%" /> |
| </colgroup> |
| <thead> |
| <tr class="header"> |
| <th>Parameter</th> |
| <th>Default Maximum</th> |
| <th>Estimated Maximum</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr class="odd"> |
| <td><p>Projects</p></td> |
| <td><p>1,000</p></td> |
| <td><p>10,000</p></td> |
| </tr> |
| <tr class="even"> |
| <td><p>Contributors</p></td> |
| <td><p>1,000</p></td> |
| <td><p>50,000</p></td> |
| </tr> |
| <tr class="odd"> |
| <td><p>Changes/Day</p></td> |
| <td><p>100</p></td> |
| <td><p>2,000</p></td> |
| </tr> |
| <tr class="even"> |
| <td><p>Revisions/Change</p></td> |
| <td><p>20</p></td> |
| <td><p>20</p></td> |
| </tr> |
| <tr class="odd"> |
| <td><p>Files/Change</p></td> |
| <td><p>50</p></td> |
| <td><p>16,000</p></td> |
| </tr> |
| <tr class="even"> |
| <td><p>Comments/File</p></td> |
| <td><p>100</p></td> |
| <td><p>100</p></td> |
| </tr> |
| <tr class="odd"> |
| <td><p>Reviewers/Change</p></td> |
| <td><p>8</p></td> |
| <td><p>8</p></td> |
| </tr> |
| </tbody> |
| </table> |
| |
| Out of the box, Gerrit will handle the "Default Maximum". Site |
| administrators may reconfigure their servers by editing gerrit.config to |
| run closer to the estimated maximum if sufficient memory is made |
| available to the JVM and the relevant cache.\*.memoryLimit variables are |
| increased from their defaults. |
| |
| ### Discussion |
| |
| Very few, if any, open source projects have more than a handful of Git |
| repositories associated with them. Since Gerrit treats each Git |
| repository as a project, an upper limit of 10,000 projects is |
| reasonable. If a site has more than 1,000 projects, administrators |
| should increase |
| [`cache.projects.memoryLimit`](config-gerrit.html#cache.name.memoryLimit) |
| to match. |
| |
| Almost no open source project has 1,000 contributors over all time, let |
| alone on a daily basis. This default figure of 1,000 was WAG’d by |
| looking at PR statements published by cell phone companies picking up |
| the Android operating system. If all of the stated employees in those PR |
| statements were working on **only** the open source Android |
| repositories, we might reach the 1,000 estimate listed here. Knowing |
| these companies as being very closed-source minded in the past, it is |
| very unlikely all of their Android engineers will be working on the open |
| source repository, and thus 1,000 is a very high estimate. |
| |
| The upper maximum of 50,000 contributors is based on existing |
| installations that are already handling quite a bit more than the |
| default maximum of 1,000 contributors. Given how the user data is stored |
| and indexed, supporting 50,000 contributor accounts (or more) is easily |
| possible for a server. If a server has more than 1,000 **active** |
| contributors, |
| [`cache.accounts.memoryLimit`](config-gerrit.html#cache.name.memoryLimit) |
| should be increased by the site administrator, if sufficient RAM is |
| available to the host JVM. |
| |
| The estimate of 100 changes per day was WAG’d off some estimates |
| originally obtained from Android’s development history. Writing a good |
| change that will be accepted through a peer-review process takes time. |
| The average engineer may need 4-6 hours per change just to write the |
| code and unit tests. Proper design consideration and additional but |
| equally important tasks such as meetings, interviews, training, and |
| eating lunch will often pad the engineer’s day out such that suitable |
| changes are only posted once a day, or once every other day. For |
| reference, the entire Linux kernel has an average of only 79 |
| changes/day. If more than 100 changes are active per day, site |
| administrators should consider increasing the |
| [`cache.diff.memoryLimit`](config-gerrit.html#cache.name.memoryLimit) |
| and `cache.diff_intraline.memoryLimit`. |
| |
| On average any given change will need to be modified once to address |
| peer review comments before the final revision can be accepted by the |
| project. Executing these revisions also eats into the contributor’s |
| time, and is another factor limiting the number of changes/day accepted |
| by the Gerrit instance. However, even though this implies only 2 |
| revisions/change, many existing Gerrit installations have seen 20 or |
| more revisions/change, when new contributors are learning the project’s |
| style and conventions. |
| |
| On average, each change will have 2 reviewers: a human and an automated |
| test bed system. The human is usually the project lead, or someone who |
| is familiar with the code being modified. The time required to comment |
| further reduces the time available for writing one’s own changes. |
| However, existing Gerrit installations have seen 8 or more reviewers |
| frequently show up on changes that impact many functional areas, and |
| therefore it is reasonable to expect 8 or more reviewers to be able to |
| work together on a single change. |
| |
| Existing installations have successfully processed change reviews with |
| more than 16,000 files per change. However, since 16,000 modified/new |
| files is a massive amount of code to review, it is more typical to see |
| fewer than 10 files modified in any single change. Changes larger than 10 |
| files are typically merges, for example integrating the latest version |
| of an upstream library, where the reviewer has little to do beyond |
| verifying the project compiles and passes a test suite. |
| |
| ### CPU Usage - Web UI |
| |
| Gerrit’s web UI would require on average `4+F+F*C` HTTP requests to |
| review a change and post comments. Here `F` is the number of files |
| modified by the change, and `C` is the number of inline/file comments |
| left by the reviewer per file. The constant 4 accounts for the request |
| to load the reviewer’s dashboard, to load the change detail page, to |
| publish the review comments, and to reload the change detail page after |
| comments are published. |
| |
| This WAG’d estimate boils down to 216,000 HTTP requests per day (QPD). |
| Assuming these are evenly distributed over an 8 hour work day in a |
| single time zone, we are looking at approximately 7.5 queries per second |
| (QPS). |
| |
| ``` |
| QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C) |
| = 2,000 * 2 * 1 * (4 + 10 + 10 * 4) |
| = 216,000 |
| QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds |
| = 7.5 |
| ``` |
| |
| Gerrit serves most requests in under 60 ms when using the loopback |
| interface and a single processor. On a single CPU system there is |
| sufficient capacity for 16 QPS. A dual processor system should be more |
| than sufficient for a site with the estimated load described above. |
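The capacity figure can be checked with a short calculation. This sketch assumes each CPU serves one request at a time at the quoted 60 ms, and that capacity scales linearly with the number of CPUs; both are simplifications, and the function name is illustrative.

```python
def cpu_capacity_qps(service_time_ms: float, cpus: int = 1) -> float:
    """Requests per second if each CPU handles one request at a time."""
    return cpus * 1000.0 / service_time_ms


single = cpu_capacity_qps(60)         # about 16.7, quoted as 16 QPS in the text
dual = cpu_capacity_qps(60, cpus=2)   # about 33.3

# Headroom of a dual processor system over the 7.5 QPS estimate above.
headroom = dual / 7.5
print(f"single={single:.1f} QPS, dual={dual:.1f} QPS, headroom={headroom:.1f}x")
```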
| |
| A more realistic estimate of 79 changes per day (from the Linux |
| kernel) suggests only 8,532 queries per day, and a much lower rate of |
| about 0.3 QPS when spread out over an 8 hour work day. |
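The request estimates in this section can be reproduced with a few lines of Python. The function and parameter names are illustrative; the inputs (2 revisions per change, 1 reviewer, 10 files, 4 comments per file, an 8 hour day) are the ones used in the formula above.

```python
def requests_per_review(files: int, comments_per_file: int) -> int:
    """HTTP requests for one review: 4 + F + F*C."""
    return 4 + files + files * comments_per_file


def queries_per_day(changes_per_day: int, revisions_per_change: int,
                    reviewers_per_change: int, files: int,
                    comments_per_file: int) -> int:
    """QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F*C)."""
    return (changes_per_day * revisions_per_change * reviewers_per_change
            * requests_per_review(files, comments_per_file))


def queries_per_second(qpd: float, work_hours: int = 8) -> float:
    """Spread the daily total evenly over a single work day."""
    return qpd / (work_hours * 60 * 60)


estimated_max = queries_per_day(2000, 2, 1, 10, 4)
linux_kernel_rate = queries_per_day(79, 2, 1, 10, 4)

print(estimated_max, queries_per_second(estimated_max))                  # 216000 7.5
print(linux_kernel_rate, round(queries_per_second(linux_kernel_rate), 3))  # 8532 0.296
```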
| |
| ### CPU Usage - Git over SSH/HTTP |
| |
| A 24 core server is able to handle ~25 concurrent `git fetch` operations |
| per second. The issue here is each concurrent operation demands one full |
| core, as the computation is almost entirely server side CPU bound. 25 |
| concurrent operations is known to be sufficient to support hundreds of |
| active developers and 50 automated build servers polling for updates and |
| building every change. (This data was derived from an actual |
| installation’s performance.) |
| |
| Because of the distributed nature of Git, end-users don’t need to |
| contact the central Gerrit Code Review server very often. For `git |
| fetch` traffic, [slave mode](pgm-daemon.html) is known to be an |
| effective way to offload traffic from the main server, permitting it to |
| scale to a large user base without needing an excessive number of cores |
| in a single system. |
| |
| Clients on very slow network connections (for example home office users |
| on VPN over home DSL) may be network bound rather than server side CPU |
| bound, in which case a core may be effectively shared with another user. |
| Possible core sharing due to network bottlenecks generally holds true |
| for network connections running below 10 MiB/sec. |
| |
| If the server’s own network interface is 1 Gib/sec (Gigabit Ethernet), |
| the system can really only serve about 10 concurrent clients at the 10 |
| MiB/sec speed, no matter how many cores it has. |
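The network ceiling follows from dividing the link bandwidth by the per-client rate. The sketch below takes the units in the text literally (1 Gib/sec link, 10 MiB/sec per client) and ignores protocol overhead, which is an assumption; the raw quotient comes out slightly above the "about 10" quoted in the text.

```python
GIBIT = 2 ** 30   # bits in one gibibit
MIB = 2 ** 20     # bytes in one mebibyte


def max_full_speed_clients(link_bits_per_sec: float,
                           client_bytes_per_sec: float) -> float:
    """Number of clients the link can feed at the given per-client rate."""
    link_bytes_per_sec = link_bits_per_sec / 8
    return link_bytes_per_sec / client_bytes_per_sec


ceiling = max_full_speed_clients(1 * GIBIT, 10 * MIB)
print(f"{ceiling:.1f} clients before protocol overhead")
```

Adding cores does not raise this ceiling, since the limit is set by the network interface rather than by CPU.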
| |
| ### Disk Usage |
| |
| The average size of a revision in the Linux kernel once compressed by |
| Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a Gerrit |
| server running with the estimated maximum parameters above might see an |
| introduction of 1.4 GiB over the total set of 10,000 projects hosted in |
| that server. This figure assumes the majority of the content is human |
| written source code, and not large binary blobs such as disk images or |
| media files. |
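The yearly growth figure can be approximated as follows. This sketch assumes the estimated maximum of 2,000 changes per day, each contributing one revision of roughly 2 KiB, over 365 days; additional revisions per change would scale the result proportionally. The function name is illustrative.

```python
KIB = 1024
GIB = 1024 ** 3


def yearly_growth_gib(changes_per_day: int, bytes_per_revision: int,
                      days: int = 365) -> float:
    """Repository growth per year, in GiB, at one revision per change."""
    return changes_per_day * days * bytes_per_revision / GIB


rounded = yearly_growth_gib(2000, 2 * KIB)   # using the rounded 2 KiB figure
exact = yearly_growth_gib(2000, 2327)        # using the measured 2,327 bytes
print(f"{rounded:.2f} GiB (2 KiB/revision), {exact:.2f} GiB (2,327 bytes/revision)")
```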
| |
| Production Gerrit installations have been tested, and are known to |
| handle Git repositories in the multigigabyte range, storing binary |
| files, ranging in size from a few kilobytes (for example compressed |
| icons) to 800+ megabytes (firmware images, large uncompressed original |
| artwork files). Best practices encourage breaking very large binary |
| files into their own Git repositories based on access, to prevent desktop |
| clients from needing to clone unnecessary materials (for example a C |
| developer does not need every 800+ megabyte firmware image created by |
| the product’s quality assurance team). |
| |
| ## Redundancy & Reliability |
| |
| Gerrit largely assumes that the local filesystem where Git repository |
| data is stored is always available. Important data written to disk is |
| also forced to the platter with an `fsync()` once it has been fully |
| written. If the local filesystem fails to respond to reads or becomes |
| corrupt, Gerrit has no provisions to fall back or retry, and errors will |
| be returned to clients. |
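| |
| The write-then-`fsync()` pattern mentioned above can be illustrated with |
| a short Python sketch. This is not code from Gerrit or JGit (both are |
| written in Java); the function name and file name are hypothetical. |
| |
```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and return only after the OS has been asked to
    flush it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device


# Usage: write a small file into a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example.dat")
    write_durably(target, b"important data")
    print(os.path.getsize(target))  # prints 14
```
| |
| As in the behavior described above, the sketch makes no attempt to |
| retry: if the write or the `fsync()` fails, the exception simply |
| propagates to the caller. |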
| |
| Gerrit largely assumes that the metadata database is online and |
| answering both read and write queries. Query failures immediately result |
| in the operation aborting and errors being returned to the client, with |
| no retry or fallback provisions. |
| |
| Due to the relatively small scale described above, it is very likely |
| that the Git filesystem and metadata database are all housed on the same |
| server that is running Gerrit. If any failure arises in one of these |
| components, it is likely to manifest in the others too. It is also |
| likely that the administrator cannot be bothered to deploy a cluster of |
| load-balanced server hardware, as the scale and expected load do not |
| justify the hardware or management costs. |
| |
| Most deployments caring about reliability will set up a warm-spare |
| standby system and use a manual fail-over process to switch from the |
| failed system to the warm-spare. |
| |
| As Git is a distributed version control system, and open source projects |
| tend to have contributors from all over the world, most contributors |
| will be able to tolerate a Gerrit downtime of several hours while the |
| administrator is notified, signs on, and brings the warm-spare up. |
| Pending changes are likely to need at least 24 hours of time on the |
| Gerrit site anyway in order to ensure any interested parties around the |
| world have had a chance to comment. This expected lag largely allows for |
| some downtime in a disaster scenario. |
| |
| ### Backups |
| |
| PostgreSQL and MySQL can be configured to replicate their data to other |
| systems, where it is applied to a warm-standby backup in real time. |
| Gerrit instances that care about redundancy will set up this feature of |
| PostgreSQL or MySQL to ensure the warm-standby is reasonably current |
| should the master go offline. |
| |
| Using the standard replication plugin, Gerrit can be configured to |
| replicate changes made to the local Git repositories over any standard |
| Git transport. After the plugin is installed, remote destinations can |
| be configured in `'$site_path'/etc/replication.conf` to send copies of |
| all changes over SSH to other servers, or to the Amazon S3 blob storage |
| service. |
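| |
| A remote destination in `replication.conf` is a `[remote "..."]` section |
| in Git config syntax, with one or more `url` entries in which the |
| plugin substitutes `${name}` with the project name. The Python sketch |
| below only renders such a section as text. The remote name, host, and |
| path are hypothetical, and the helper is not part of Gerrit or the |
| plugin. |
| |
```python
def render_remote(remote_name, urls):
    """Render one [remote "..."] section in Git config syntax."""
    lines = [f'[remote "{remote_name}"]']
    lines.extend(f"  url = {url}" for url in urls)
    return "\n".join(lines) + "\n"


# Hypothetical SSH destination; ${name} is left for the plugin to expand.
section = render_remote(
    "backup-mirror",
    ["gerrit2@backup.example.com:/srv/git/${name}.git"],
)
print(section, end="")
```
| |
| The rendered text would be placed in |
| `'$site_path'/etc/replication.conf`; the plugin's own documentation |
| lists the other settings a remote section accepts. |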
| |
| ## Logging Plan |
| |
| Gerrit does not maintain logs on its own. |
| |
| Published comments contain a publication date, so users can judge when |
| the comment was posted and decide if it was "recent" or not. Only the |
| timestamp is stored in the database; the IP address of the comment |
| author is not stored. |
| |
| Changes uploaded over the SSH daemon from `git push` have the standard |
| Git reflog updated with the date and time that the upload occurred, and |
| the Gerrit account identity of who did the upload. Changes submitted and |
| merged into a branch also update the Git reflog. These logs are |
| available only to the Gerrit site administrator, and they are not |
| replicated through the automatic replication noted earlier. These logs |
| are primarily recorded for an "oh s\*\*t" moment where the administrator |
| has to rewind data. In most installations they are a waste of disk |
| space. Future versions of JGit may allow disabling these logs, and |
| Gerrit may take advantage of that feature to stop writing these logs. |
| |
| A web server positioned in front of Gerrit (such as a reverse proxy) or |
| the hosting servlet container may record access logs, and these logs may |
| be mined for usage information. This is outside of the scope of Gerrit. |
| |
| ## Testing Plan |
| |
| Gerrit is currently manually tested through its web UI. |
| |
| JGit has a fairly extensive automated unit test suite. Most new changes |
| to JGit are rejected unless corresponding automated unit tests are |
| included. |
| |
| ## Caveats |
| |
| Rietveld can’t be used as it does not provide the "submit over the web" |
| feature that Gerrit provides for Git. |
| |
| Gitosis can’t be used as it does not provide any code review features, |
| but it does provide basic access controls. |
| |
| Email based code review does not scale to a project as large and complex |
| as Android. Most contributors at least need some sort of dashboard to |
| keep track of any pending reviews, and some way to correlate updated |
| revisions back to the comments written on prior revisions of the same |
| logical change. |
| |
| ## GERRIT |
| |
| Part of [Gerrit Code Review](index.html) |
| |