| Gerrit Code Review - System Design |
| ================================== |
| |
| Objective |
| --------- |
| |
| Gerrit is a web based code review system, facilitating online code |
| reviews for projects using the Git version control system. |
| |
| Gerrit makes reviews easier by showing changes in a side-by-side |
| display, and allowing inline comments to be added by any reviewer. |
| |
| Gerrit simplifies Git based project maintainership by permitting |
| any authorized user to submit changes to the master Git repository, |
| rather than requiring all approved changes to be merged in by |
| hand by the project maintainer. This functionality enables a more |
| centralized usage of Git. |
| |
| |
| Background |
| ---------- |
| |
| Google developed Mondrian, a Perforce based code review tool, to |
| facilitate peer-review of changes prior to submission to the central |
| code repository. Mondrian is not open source, as it is tied to the |
| use of Perforce and to many Google-only services, such as Bigtable. |
| Google employees have often described how useful Mondrian and its |
| peer-review process are to their day-to-day work. |
| |
| Guido van Rossum open sourced portions of Mondrian within Rietveld, |
| a similar code review tool running on Google App Engine, but for |
| use with Subversion rather than Perforce. Rietveld is in common |
| use by many open source projects, facilitating their peer reviews |
| much as Mondrian does for Google employees. Unlike Mondrian and |
| the Google Perforce triggers, Rietveld is strictly advisory and |
| does not enforce peer-review prior to submission. |
| |
| Git is a distributed version control system, wherein each repository |
| is assumed to be owned/maintained by a single user. There are no |
| inherent security controls built into Git, so the ability to read |
| from or write to a repository is controlled entirely by the host's |
| filesystem access controls. When multiple maintainers collaborate |
| on a single shared repository a high degree of trust is required, |
| as any collaborator with write access can alter the repository. |
| |
| Gitosis provides tools to secure centralized Git repositories, |
| permitting multiple maintainers to manage the same project at once, |
| by restricting access to a secure network protocol, much as |
| Perforce secures a repository by only permitting access over its |
| network port. |
| |
| The Android Open Source Project (AOSP) was founded by Google with |
| the open source release of the Android operating system. AOSP has |
| selected Git as its primary version control tool. As many of the |
| engineers have a background of working with Mondrian at Google, |
| there is a strong desire to have the same (or better) feature set |
| available for Git and AOSP. |
| |
| Gerrit Code Review started as a simple set of patches to Rietveld, |
| and was originally built to service AOSP. This quickly turned |
| into a fork as we added access control features that Guido van |
| Rossum did not want to see complicating the Rietveld code base. As |
| the functionality and code were starting to become drastically |
| different, a different name was needed. Gerrit calls back to the |
| original namesake of Rietveld, Gerrit Rietveld, a Dutch architect. |
| |
| Gerrit 2.x is a complete rewrite of the Gerrit fork, completely |
| changing the implementation from Python on Google App Engine, to Java |
| on a J2EE servlet container and a SQL database. |
| |
| * link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web] |
| * link:http://code.google.com/p/rietveld/[Rietveld - Code Review for Subversion] |
| * link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README] |
| * link:http://source.android.com/[Android Open Source Project] |
| |
| |
| Overview |
| -------- |
| |
| Developers create one or more changes on their local desktop system, |
| then upload them for review to Gerrit using the standard `git push` |
| command line program, or any GUI which can invoke `git push` on |
| behalf of the user. Authentication and data transfer are handled |
| through SSH. Users are authenticated by username and public/private |
| key pair, and all data transfer is protected by the SSH connection |
| and Git's own data integrity checks. |
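| |
| As a minimal sketch of the upload step described above, the Python |
| helper below only assembles the `git push` argument list and does not |
| run it. The `refs/for/<branch>` destination and port 29418 are |
| assumptions drawn from common Gerrit setups rather than from this |
| document, and the user, host, and project names are hypothetical. |
| |
```python
def build_push_command(user, host, project, branch="master", port=29418):
    """Return the argv list for uploading HEAD for review over SSH.

    Assumptions (not stated in this document): Gerrit's SSH daemon
    listens on `port`, and pushing to refs/for/<branch> creates a change.
    """
    url = f"ssh://{user}@{host}:{port}/{project}"
    return ["git", "push", url, f"HEAD:refs/for/{branch}"]


# Hypothetical values, for illustration only.
print(" ".join(build_push_command("alice", "review.example.com", "tools/demo")))
```
| |
| Returning a list rather than a shell string lets a GUI or script hand |
| the command to a process launcher without any quoting concerns. |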
| |
| Each Git commit created on the client desktop system is converted |
| into a unique change record which can be reviewed independently. |
| Change records are stored in PostgreSQL, where they can be queried to |
| present customized user dashboards, enumerating any pending changes. |
| |
| A summary of each newly uploaded change is automatically emailed |
| to reviewers, so they receive a direct hyperlink to review the |
| change on the web. Reviewer email addresses can be specified on the |
| `git push` command line, but typically reviewers are automatically |
| selected by Gerrit by identifying users who have change approval |
| permissions in the project. |
| |
| Reviewers use the web interface to read the side-by-side or unified |
| diff of a change, and insert draft inline comments where appropriate. |
| A draft comment is visible only to the reviewer, until they publish |
| those comments. Published comments are automatically emailed to |
| the change author by Gerrit, and are CC'd to all other reviewers |
| who have already commented on the change. |
| |
| When publishing comments, reviewers are also given the opportunity |
| to score the change, indicating whether they feel the change is |
| ready for inclusion in the project, needs more work, or should be |
| rejected outright. These scores provide direct feedback to Gerrit's |
| change submit function. |
| |
| After a change has been scored positively by reviewers, Gerrit |
| enables a submit button on the web interface. Authorized users |
| can push the submit button to have the change enter the project |
| repository. The equivalent in Subversion or Perforce would be |
| that Gerrit is invoking `svn commit` or `p4 submit` on behalf of |
| the web user pressing the button. Due to the way Git audit trails |
| are maintained, the user pressing the submit button does not need |
| to be the author of the change. |
| |
| |
| Infrastructure |
| -------------- |
| |
| End-user web browsers make HTTP requests directly to Gerrit's |
| HTTP server. As nearly all of the user interface is implemented |
| through Google Web Toolkit (GWT), the majority of these requests |
| are transmitting compressed JSON payloads, with all HTML being |
| generated within the browser. Most responses are under 1 KB. |
| |
| Gerrit's HTTP server side component is implemented as a standard |
| Java servlet, and thus runs within any J2EE servlet container. |
| Popular choices for deployments would be Tomcat or Jetty, as these |
| are high-quality open-source servlet containers that are readily |
| available for download. |
| |
| End-user uploads are performed over SSH, so Gerrit's servlets also |
| start up a background thread to receive SSH connections through |
| an independent SSH port. SSH clients communicate directly with |
| this port, bypassing the HTTP server used by browsers. |
| |
| Server side data storage for Gerrit is broken down into two different |
| categories: |
| |
| * Git repository data |
| * Gerrit metadata |
| |
| The Git repository data is the Git object database used to store |
| already submitted revisions, as well as all uploaded (proposed) |
| changes. Gerrit uses the standard Git repository format, and |
| therefore requires direct filesystem access to the repositories. |
| All repository data is stored in the filesystem and accessed through |
| the JGit library. Repository data can be stored on remote servers |
| accessible through NFS or SMB, but the remote directory must |
| be mounted on the Gerrit server as part of the local filesystem |
| namespace. Remote filesystems are likely to perform worse than |
| local ones, due to Git disk IO behavior not being optimized for |
| remote access. |
| |
| The Gerrit metadata contains a summary of the available changes, |
| all comments (published and drafts), and individual user account |
| information. The metadata is housed in a PostgreSQL database, |
| which can be located either on the same server as Gerrit, or on |
| a different (but nearby) server. Most installations would opt to |
| install both Gerrit and PostgreSQL on the same server, to reduce |
| administration overheads. |
| |
| User authentication is handled by OpenID, and therefore Gerrit |
| requires that the OpenID provider selected by a user must be |
| online and operating in order to authenticate that user. |
| |
| * link:http://code.google.com/webtoolkit/[Google Web Toolkit (GWT)] |
| * link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format] |
| * link:http://www.postgresql.org/about/[About PostgreSQL] |
| * link:http://openid.net/developers/specs/[OpenID Specifications] |
| |
| |
| Project Information |
| ------------------- |
| |
| Gerrit is developed as a self-hosting open source project: |
| |
| * link:http://code.google.com/p/gerrit/[Project Homepage] |
| * link:http://code.google.com/p/gerrit/downloads/list[Release Versions] |
| * link:http://code.google.com/p/gerrit/wiki/Source?tm=4[Source] |
| * link:http://code.google.com/p/gerrit/issues/list[Issue Tracking] |
| * link:https://review.source.android.com/[Change Review] |
| |
| |
| Internationalization and Localization |
| ------------------------------------- |
| |
| As a source code review system for open source projects, where the |
| commonly preferred language for communication is typically English, |
| Gerrit does not make internationalization or localization a priority. |
| |
| The majority of Gerrit's users will be writing change descriptions |
| and comments in English, and therefore an English user interface |
| is usable by the target user base. |
| |
| Gerrit uses GWT's i18n support to externalize all constant strings |
| and messages shown to the user, so that in the future someone who |
| really needed a translated version of the UI could contribute new |
| string files for their locale(s). |
| |
| Right-to-left (RTL) support is only barely considered within the |
| Gerrit code base. Some portions of the code have tried to take |
| RTL into consideration, while others probably need to be modified |
| before translating the UI to an RTL language. |
| |
| * link:i18n-readme.html[Gerrit's i18n Support] |
| |
| |
| Accessibility Considerations |
| ---------------------------- |
| |
| Whenever possible Gerrit displays raw text rather than image icons, |
| so screen readers should still be able to provide useful information |
| to blind persons accessing Gerrit sites. |
| |
| Standard HTML hyperlinks are used rather than HTML div or span tags |
| with click listeners. This provides two benefits to the end-user. |
| The first benefit is that screen readers are optimized for locating |
| standard hyperlink anchors and presenting them to the end-user as |
| a navigation action. The second benefit is that users can use |
| the 'open in new tab/window' feature of their browser whenever |
| they choose. |
| |
| When possible, Gerrit uses the ARIA properties on DOM widgets to |
| provide hints to screen readers. |
| |
| |
| Browser Compatibility |
| --------------------- |
| |
| Supporting non-JavaScript enabled browsers is a non-goal for Gerrit. |
| |
| As Gerrit is a pure-GWT application with no server side rendering |
| fallbacks, the browser must support modern JavaScript semantics in |
| order to access the Gerrit web application. Dumb clients such as |
| `lynx`, `wget`, `curl`, or even many search engine spiders are not |
| able to access Gerrit content. |
| |
| As Google Web Toolkit (GWT) is used to generate the browser |
| specific versions of the client-side JavaScript code, Gerrit works |
| on any JavaScript enabled browser which GWT can produce code for. |
| This covers the majority of the popular browsers. |
| |
| The Gerrit project does not have the development resources necessary |
| to support two parallel UI implementations (GWT based JavaScript |
| and server-side rendering). Consequently only one is implemented. |
| |
| There are a number of web browsers available with full JavaScript |
| support, and nearly every operating system (including any PDA-like |
| mobile phone) comes with one standard. Users who are committed |
| to developing changes for a Gerrit managed project can be expected |
| to be able to run a JavaScript enabled browser, as they also would |
| need to be running Git in order to contribute. |
| |
| There are a number of open source browsers available, including |
| Firefox and Chromium. Users have some degree of choice in their |
| browser selection, including being able to build and audit their |
| browser from source. |
| |
| The majority of the content stored within Gerrit is also available |
| through other means, such as gitweb or the `git://` protocol. |
| Any existing search engine spider can crawl the server-side HTML |
| produced by gitweb, and thus can index the majority of the changes |
| which might appear in Gerrit. Some engines may even choose to |
| crawl the native version control database, such as ohloh.net does. |
| Therefore the lack of support for most search engine spiders is a |
| non-issue for most Gerrit deployments. |
| |
| |
| Product Integration |
| ------------------- |
| |
| Gerrit integrates with an existing gitweb installation by optionally |
| creating hyperlinks to reference changes on the gitweb server. |
| |
| Gerrit integrates with an existing git-daemon installation by |
| optionally displaying `git://` URLs for users to download a |
| change through the native Git protocol. |
| |
| Gerrit integrates with any OpenID provider for user authentication, |
| making it easier for users to join a Gerrit site and manage their |
| authentication credentials to it. To make use of Google Accounts |
| as an OpenID provider easier, Gerrit has a shorthand "Sign in with |
| a Google Account" link on its sign-in screen. Gerrit also supports |
| a shorthand sign in link for Yahoo!. Other providers may also be |
| supported more directly in the future. |
| |
| Site administrators may limit the range of OpenID providers to |
| a subset of "reliable providers". Users may continue to use |
| any OpenID provider to publish comments, but granted privileges |
| are only available to a user if the only entry point to their |
| account is through the defined set of "reliable OpenID providers". |
| This permits site administrators to require HTTPS for OpenID, |
| and to use only large main-stream providers that are trustworthy, |
| or to require users to only use a custom OpenID provider installed |
| alongside Gerrit Code Review. |
| |
| Gerrit integrates with some types of corporate single-sign-on (SSO) |
| solutions, typically by having the SSO authentication be performed |
| in a reverse proxy web server and then blindly trusting that all |
| incoming connections have been authenticated by that reverse proxy. |
| When configured to use this form of authentication, Gerrit does |
| not integrate with OpenID providers. |
| |
| When installing Gerrit, administrators may optionally include an |
| HTML header or footer snippet which may include user tracking code, |
| such as that used by Google Analytics. This is a per-instance |
| configuration that must be done by hand, and is not supported |
| out of the box. Other site trackers instead of Google Analytics |
| can be used, as the administrator can supply any HTML/JavaScript |
| they choose. |
| |
| Gerrit does not integrate with any Google service, or any other |
| services other than those listed above. |
| |
| |
| Standards / Developer APIs |
| -------------------------- |
| |
| Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate |
| between the browser client and the server. |
| |
| As the protocol is not the GWT-RPC protocol, but is instead a |
| self-describing standard JSON format, it is easily implemented by |
| any 3rd party client application, provided the client has a JSON |
| parser and HTTP client library available. |
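| |
| To illustrate how little a third-party client needs, the sketch below |
| frames a JSON-RPC 1.1 call using only Python's standard JSON encoder |
| and HTTP library. It builds the request without sending it. The |
| endpoint URL, method name, and parameters are hypothetical, and the |
| XSRF token handling of Gerrit's protocol variant is deliberately left |
| out because its details are not described in this document. |
| |
```python
import json
import urllib.request


def build_jsonrpc_request(endpoint, method, params, request_id=1):
    """Build (but do not send) an HTTP POST carrying a JSON-RPC 1.1 call."""
    body = json.dumps({
        "version": "1.1",  # version marker used by the JSON-RPC 1.1 draft
        "method": method,
        "params": params,
        "id": request_id,
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8",
                 "Accept": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and method name, for illustration only.
req = build_jsonrpc_request(
    "http://gerrit.example.com/rpc/ExampleService", "exampleMethod", [42])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```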
| |
| As the entire command set necessary for the standard web browser |
| based UI is exposed through JSON-RPC over HTTP, there are no other |
| data feeds or command interfaces to the server. |
| |
| Commands requiring user authentication may require the user agent to |
| complete a sign-in cycle through the user's OpenID provider in order |
| to establish the HTTP cookie Gerrit uses to track user identity. |
| Automating this sign-in process for non-web browser agents is |
| outside of the scope of Gerrit, as each OpenID provider uses its own |
| sign-in sequence. Use of OpenID providers which have difficult to |
| automate interfaces may make it impossible for non-browser agents |
| to be used with the JSON-RPC interface. |
| |
| * link:http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html[JSON-RPC 1.1] |
| * link:http://code.google.com/p/gerrit/source/browse/README?repo=gwtjsonrpc&name=master[XSRF JSON-RPC] |
| |
| |
| Privacy Considerations |
| ---------------------- |
| |
| Gerrit stores the following information per user account: |
| |
| * Full Name |
| * Preferred Email Address |
| * Mailing Address '(Optional, Encrypted)' |
| * Country '(Optional, Encrypted)' |
| * Phone Number '(Optional, Encrypted)' |
| * Fax Number '(Optional, Encrypted)' |
| |
| The full name and preferred email address fields are shown to any |
| site visitor viewing a page containing a change uploaded by the |
| account owner, or containing a published comment written by the |
| account owner. |
| |
| Showing the full name and preferred email is approximately the same |
| risk as the `From` header of an email posted to a public mailing |
| list that maintains archives, and Gerrit treats these fields in |
| much the same way that a mailing list archive might handle them. |
| Users who don't want to expose this information should either not |
| participate in a Gerrit based online community, or open a new email |
| address dedicated for this use. |
| |
| As the Gerrit UI data is only available through XSRF protected |
| JSON-RPC calls, "screen-scraping" for email addresses is difficult, |
| but not impossible. It is unlikely a spammer will go through the |
| effort required to code a custom scraping application necessary |
| to cull email addresses from published Gerrit comments. In most |
| cases these same addresses would be more easily obtained from the |
| project's mailing list archives. |
| |
| The user's name and email address are stored unencrypted in the |
| Gerrit metadata store, typically a PostgreSQL database. |
| |
| The snail-mail mailing address, country, and phone and fax numbers |
| are gathered to help project leads contact the user should there |
| be a legal question regarding any change they have uploaded. |
| |
| These sensitive fields are immediately encrypted upon receipt with |
| a GnuPG public key, and stored "off site" in another data store, |
| isolated from the main Gerrit change data. Gerrit does not have |
| access to the matching private key, and as such cannot decrypt the |
| information. Therefore these fields are write-once in Gerrit, as not |
| even the account owner can recover the values they previously stored. |
| |
| It is expected that the address information would only need to be |
| decrypted and revealed with a valid court subpoena, but this is |
| really left to the discretion of the Gerrit site administrator as |
| to when it is reasonable to reveal this information to a 3rd party. |
| |
| |
| Spam and Abuse Considerations |
| ----------------------------- |
| |
| Gerrit makes no attempt to detect spam changes or comments. The |
| somewhat high barrier to entry makes it unlikely that a spammer |
| will target Gerrit. |
| |
| To upload a change, the client must speak the native Git protocol |
| embedded in SSH, with some custom Gerrit semantics added on top. |
| The client must have their public key already stored in the Gerrit |
| database, which can only be done through the XSRF protected |
| JSON-RPC interface. The level of effort required to construct |
| the necessary tools to upload a well-formatted change that isn't |
| rejected outright by the Git and Gerrit checksum validations is |
| too high for a spammer to get any meaningful return. |
| |
| To post and publish a comment a client must sign in with an OpenID |
| provider and then use the XSRF protected JSON-RPC interface to |
| publish the draft on an existing change record. Again, the level of |
| effort required to implement the Gerrit specific XSRF protections |
| and the JSON-RPC payload format necessary to post a draft and then |
| publish that draft is simply too high for a spammer to bother with. |
| |
| Both of these assumptions are also based upon the idea that Gerrit |
| will be a lot less popular than blog software, and thus will be |
| running on far fewer websites. Spammers therefore gain very little |
| benefit from getting over the protocol hurdles. |
| |
| These assumptions may need to be revisited in the future if any |
| public Gerrit site actually notices spam. |
| |
| |
| Latency |
| ------- |
| |
| Gerrit targets sub-250 ms latency per page request, mostly by using |
| very compact JSON payloads between client and server. However, as |
| most of the serving stack (network, hardware, PostgreSQL metadata |
| database) is outside the control of the Gerrit developers, no real |
| guarantees can be made about latency. |
| |
| |
| Scalability |
| ----------- |
| |
| Gerrit is designed for a very large scale open source project, or |
| large commercial development project. Roughly this amounts to |
| parameters such as the following: |
| |
| .Design Parameters |
| [options="header"] |
| |====================================================== |
| |Parameter | Default Maximum | Estimated Maximum |
| |Projects | 1,000 | 10,000 |
| |Contributors | 1,000 | 50,000 |
| |Changes/Day | 100 | 2,000 |
| |Revisions/Change | 20 | 20 |
| |Files/Change | 50 | 16,000 |
| |Comments/File | 100 | 100 |
| |Reviewers/Change | 8 | 8 |
| |====================================================== |
| |
| Out of the box, Gerrit will handle the "Default Maximum". Site |
| administrators may reconfigure their servers by editing `gerrit.config` |
| to run closer to the estimated maximum if sufficient memory is made |
| available to the JVM and the relevant `cache.*.memoryLimit` variables |
| are increased from their defaults. |
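| |
| As a rough sketch of what such an edit could look like, the Python |
| helper below renders `cache.<name>.memoryLimit` entries as config |
| text. It assumes `gerrit.config` uses Git-style sections such as |
| `[cache "projects"]`, which this document does not state, and the |
| limit values shown are hypothetical rather than recommendations. |
| |
```python
def render_cache_limits(limits):
    """Render cache.<name>.memoryLimit settings as config sections.

    Assumption: gerrit.config uses Git-style sections, such as
    [cache "projects"], with one `key = value` entry per line.
    """
    lines = []
    for name, limit in sorted(limits.items()):
        lines.append(f'[cache "{name}"]')
        lines.append(f"\tmemoryLimit = {limit}")
    return "\n".join(lines) + "\n"


# Hypothetical limits for illustration; size them to the available JVM heap.
print(render_cache_limits({"projects": 10000, "accounts": 50000}), end="")
```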
| |
| Discussion |
| ~~~~~~~~~~ |
| |
| Very few, if any, open source projects have more than a handful of |
| Git repositories associated with them. Since Gerrit treats each |
| Git repository as a project, an upper limit of 10,000 projects |
| is reasonable. If a site has more than 1,000 projects, administrators |
| should increase |
| link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`] |
| to match. |
| |
| Almost no open source project has 1,000 contributors over all time, |
| let alone on a daily basis. This default figure of 1,000 was WAG'd by |
| looking at PR statements published by cell phone companies picking |
| up the Android operating system. If all of the stated employees in |
| those PR statements were working on *only* the open source Android |
| repositories, we might reach the 1,000 estimate listed here. Knowing |
| these companies as being very closed-source minded in the past, it |
| is very unlikely all of their Android engineers will be working on |
| the open source repository, and thus 1,000 is a very high estimate. |
| |
| The upper maximum of 50,000 contributors is based on existing |
| installations that are already handling quite a bit more than the |
| default maximum of 1,000 contributors. Given how the user data is |
| stored and indexed, supporting 50,000 contributor accounts (or more) |
| is easily possible for a server. If a server has more than 1,000 |
| *active* contributors, |
| link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`] |
| should be increased by the site administrator, if sufficient RAM |
| is available to the host JVM. |
| |
| The estimate of 100 changes per day was WAG'd off some estimates |
| originally obtained from Android's development history. Writing a |
| good change that will be accepted through a peer-review process |
| takes time. The average engineer may need 4-6 hours per change just |
| to write the code and unit tests. Proper design consideration and |
| additional but equally important tasks such as meetings, interviews, |
| training, and eating lunch will often pad the engineer's day out |
| such that suitable changes are only posted once a day, or once |
| every other day. For reference, the entire Linux kernel has an |
| average of only 79 changes/day. If more than 100 changes are active |
| per day, site administrators should consider increasing the |
| link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`] |
| and `cache.diff_intraline.memoryLimit`. |
| |
| On average any given change will need to be modified once to address |
| peer review comments before the final revision can be accepted by the |
| project. Executing these revisions also eats into the contributor's |
| time, and is another factor limiting the number of changes/day |
| accepted by the Gerrit instance. However, even though this implies |
| only 2 revisions/change, many existing Gerrit installations have seen |
| 20 or more revisions/change, when new contributors are learning the |
| project's style and conventions. |
| |
| On average, each change will have 2 reviewers: a human and an |
| automated test bed system. The human reviewer is usually the project |
| lead, or someone who is familiar with the code being modified. The time |
| required to comment further reduces the time available for writing |
| one's own changes. However, existing Gerrit installations have seen 8 |
| or more reviewers frequently show up on changes that impact many |
| functional areas, and therefore it is reasonable to expect 8 or more |
| reviewers to be able to work together on a single change. |
| |
| Existing installations have successfully processed change reviews with |
| more than 16,000 files per change. However, since 16,000 modified/new |
| files is a massive amount of code to review, it is more typical to see |
| fewer than 10 files modified in any single change. Changes larger than |
| 10 files are typically merges, for example integrating the latest |
| version of an upstream library, where the reviewer has little to do |
| beyond verifying the project compiles and passes a test suite. |
| |
| CPU Usage - Web UI |
| ~~~~~~~~~~~~~~~~~~ |
| |
| Gerrit's web UI would require on average `4+F+F*C` HTTP requests to |
| review a change and post comments. Here `F` is the number of files |
| modified by the change, and `C` is the number of inline comments left |
| by the reviewer per file. The constant 4 accounts for the request |
| to load the reviewer's dashboard, to load the change detail page, |
| to publish the review comments, and to reload the change detail |
| page after comments are published. |
| |
| This WAG'd estimate boils down to 216,000 HTTP requests per day |
| (QPD). Assuming these are evenly distributed over an 8 hour work day |
| in a single time zone, we are looking at approximately 7.5 queries |
| per second (QPS). |
| |
| ---- |
| QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C) |
| = 2,000 * 2 * 1 * (4 + 10 + 10 * 4) |
| = 216,000 |
| QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds |
| = 7.5 |
| ---- |
| |
| Gerrit serves most requests in under 60 ms when using the loopback |
| interface and a single processor. On a single CPU system there is |
| sufficient capacity for 16 QPS. A dual processor system should be |
| more than sufficient for a site with the estimated load described above. |
| |
| A more realistic estimate of 79 changes per day (from the |
| Linux kernel) suggests only 8,532 queries per day, and a much lower |
| rate of roughly 0.3 QPS when spread out over an 8 hour work day. |
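| |
| The arithmetic in this section can be reproduced with a short Python |
| sketch. The function names are illustrative only; the inputs (2 |
| revisions per change, 1 reviewer, 10 files, 4 comments per file, and |
| an 8 hour day) are the ones used in the estimate above. |
| |
```python
def requests_per_review(files, comments_per_file):
    """HTTP requests to review one change: 4 + F + F*C."""
    return 4 + files + files * comments_per_file


def queries_per_day(changes_day, revisions_change, reviewers_change,
                    files, comments_per_file):
    """Total HTTP requests per day across all reviews."""
    return (changes_day * revisions_change * reviewers_change
            * requests_per_review(files, comments_per_file))


def queries_per_second(qpd, work_hours=8):
    """Spread a daily request count evenly over one work day."""
    return qpd / (work_hours * 60 * 60)


estimated_max = queries_per_day(2000, 2, 1, 10, 4)
linux_like = queries_per_day(79, 2, 1, 10, 4)
print(estimated_max, queries_per_second(estimated_max))  # 216000 7.5
print(linux_like, round(queries_per_second(linux_like), 3))  # 8532 0.296

# Single-CPU capacity at 60 ms per request, in whole requests per second.
print(int(1000 / 60))  # 16
```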
| |
| CPU Usage - Git over SSH/HTTP |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| A 24 core server is able to handle ~25 concurrent `git fetch` |
| operations per second. The issue here is each concurrent operation |
| demands one full core, as the computation is almost entirely server |
| side CPU bound. 25 concurrent operations is known to be sufficient to |
| support hundreds of active developers and 50 automated build servers |
| polling for updates and building every change. (This data was derived |
| from an actual installation's performance.) |
| |
| Because of the distributed nature of Git, end-users don't need to |
| contact the central Gerrit Code Review server very often. For `git |
| fetch` traffic, link:pgm-daemon.html[slave mode] is known to be an |
| effective way to offload traffic from the main server, permitting it |
| to scale to a large user base without needing an excessive number of |
| cores in a single system. |
| |
| Clients on very slow network connections (for example home office |
| users on VPN over home DSL) may be network bound rather than server |
| side CPU bound, in which case a core may be effectively shared with |
| another user. Possible core sharing due to network bottlenecks |
| generally holds true for network connections running below 10 MiB/sec. |
| |
| If the server's own network interface is 1 Gib/sec (Gigabit Ethernet), |
| the system can really only serve about 10 concurrent clients at the |
| 10 MiB/sec speed, no matter how many cores it has. |
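| |
| The interaction between the core limit and the network limit can be |
| sketched as taking the smaller of the two caps. This is an |
| illustrative model only: it ignores protocol overhead, which is one |
| reason the raw result lands slightly above the "about 10" quoted |
| above. |
| |
```python
MIB = 2 ** 20       # bytes per MiB
GIB_BITS = 2 ** 30  # bits per Gib


def max_concurrent_fetches(cores, nic_bits_per_sec, client_bytes_per_sec):
    """Concurrent fetches are capped by cores and by NIC bandwidth."""
    network_cap = (nic_bits_per_sec // 8) // client_bytes_per_sec
    return min(cores, network_cap)


# 24 cores, 1 Gib/sec interface, clients pulling 10 MiB/sec each.
print(max_concurrent_fetches(24, 1 * GIB_BITS, 10 * MIB))  # 12
```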
| |
| Disk Usage |
| ~~~~~~~~~~ |
| |
| The average size of a revision in the Linux kernel once compressed by |
| Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a |
| Gerrit server running with the estimated maximum parameters above might |
| see an introduction of 1.4 GiB over the total set of 10,000 projects |
| hosted in that server. This figure assumes the majority of the content |
| is human written source code, and not large binary blobs such as disk |
| images or media files. |
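| |
| The 1.4 GiB figure appears consistent with 2,000 changes per day at |
| roughly 2 KiB each, accumulated over 365 days. The sketch below |
| reproduces that arithmetic; counting one stored revision per change |
| and every calendar day are assumptions made here to match the quoted |
| number, not statements from the original estimate. |
| |
```python
KIB = 2 ** 10
GIB = 2 ** 30


def yearly_growth_gib(changes_per_day, bytes_per_revision, days=365):
    """Estimated repository growth per year, in GiB."""
    return changes_per_day * bytes_per_revision * days / GIB


# 2,000 changes/day at roughly 2 KiB each.
print(round(yearly_growth_gib(2000, 2 * KIB), 1))  # 1.4
```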
| |
| Production Gerrit installations have been tested, and are known to |
| handle Git repositories in the multigigabyte range, storing binary |
| files, ranging in size from a few kilobytes (for example compressed |
| icons) to 800+ megabytes (firmware images, large uncompressed original |
| artwork files). Best practices encourage breaking very large binary |
| files into their own Git repositories based on access, to prevent desktop |
| clients from needing to clone unnecessary materials (for example a C |
| developer does not need every 800+ megabyte firmware image created by |
| the product's quality assurance team). |
| |
| Redundancy & Reliability |
| ------------------------ |
| |
| Gerrit largely assumes that the local filesystem where Git repository |
| data is stored is always available. Important data written to disk |
| is also forced to the platter with an `fsync()` once it has been |
| fully written. If the local filesystem fails to respond to reads |
| or becomes corrupt, Gerrit has no provisions to fall back or retry, |
| and errors will be returned to clients. |
| |
| Gerrit largely assumes that the metadata PostgreSQL database is |
| online and answering both read and write queries. Query failures |
| immediately result in the operation aborting and errors being |
| returned to the client, with no retry or fallback provisions. |
| |
| Due to the relatively small scale described above, it is very likely |
| that the Git filesystem and PostgreSQL based metadata database |
| are all housed on the same server that is running Gerrit. If any |
| failure arises in one of these components, it is likely to manifest |
| in the others too. It is also likely that the administrator cannot |
| be bothered to deploy a cluster of load-balanced server hardware, |
| as the scale and expected load does not justify the hardware or |
| management costs. |
| |
| Most deployments caring about reliability will set up a warm-spare |
| standby system and use a manual fail-over process to switch from the |
| failed system to the warm-spare. |
| |
| As Git is a distributed version control system, and open source |
| projects tend to have contributors from all over the world, most |
| contributors will be able to tolerate a Gerrit down time of several |
| hours while the administrator is notified, signs on, and brings the |
| warm-spare up. Pending changes are likely to need at least 24 hours |
| of time on the Gerrit site anyway in order to ensure any interested |
| parties around the world have had a chance to comment. This expected |
| lag largely allows for some downtime in a disaster scenario. |
| |
| Backups |
| ~~~~~~~ |
| |
| PostgreSQL can be configured to save its write-ahead-log (WAL) |
| and ship these logs to other systems, where they are applied to |
| a warm-standby backup in real time. Gerrit instances which care |
| about redundancy will set up this feature of PostgreSQL to ensure |
| the warm-standby is reasonably current should the master go offline. |
| |
| Gerrit can be configured to replicate changes made to the local |
| Git repositories over any standard Git transports. This can be |
| configured in `'$site_path'/etc/replication.conf` to send copies |
| of all changes over SSH to other servers, or to the Amazon S3 blob |
| storage service. |
| |
| |
| Logging Plan |
| ------------ |
| |
| Gerrit does not maintain logs on its own. |
| |
| Published comments contain a publication date, so users can judge |
| when the comment was posted and decide if it was "recent" or not. |
| Only the timestamp is stored in the database; the IP address of |
| the comment author is not stored. |
| |
| Changes uploaded over the SSH daemon from `git push` have the |
| standard Git reflog updated with the date and time that the upload |
| occurred, and the Gerrit account identity of who did the upload. |
| Changes submitted and merged into a branch also update the |
| Git reflog. These logs are available only to the Gerrit site |
| administrator, and they are not replicated through the automatic |
| replication noted earlier. These logs are primarily recorded for an |
| "oh s**t" moment where the administrator has to rewind data. In most |
| installations they are a waste of disk space. Future versions of |
| JGit may allow disabling these logs, and Gerrit may take advantage |
| of that feature to stop writing these logs. |
| |
| A web server positioned in front of Gerrit (such as a reverse proxy) |
| or the hosting servlet container may record access logs, and these |
| logs may be mined for usage information. This is outside of the |
| scope of Gerrit. |
| |
| |
| Testing Plan |
| ------------ |
| |
| Gerrit is currently manually tested through its web UI. |
| |
| JGit has a fairly extensive automated unit test suite. Most new |
| changes to JGit are rejected unless corresponding automated unit |
| tests are included. |
| |
| |
| Caveats |
| ------- |
| |
| Rietveld can't be used as it does not provide the "submit over the |
| web" feature that Gerrit provides for Git. |
| |
| Gitosis can't be used as it does not provide any code review |
| features, but it does provide basic access controls. |
| |
| Email based code review does not scale to a project as large and |
| complex as Android. Most contributors at least need some sort of |
| dashboard to keep track of any pending reviews, and some way to |
| correlate updated revisions back to the comments written on prior |
| revisions of the same logical change. |
| |
| GERRIT |
| ------ |
| Part of link:index.html[Gerrit Code Review] |