---
title: Summary of the Gerrit User Summit & Hackathon 2019 in Sunnyvale
tags: news summit hackathon
keywords: news summit hackathon
permalink: 2020-04-08-user-summit-sunnyvale-summary.html
summary: "Summary of the Gerrit User Summit & Hackathon 2019 in Sunnyvale"
hide_sidebar: true
hide_navtoggle: true
toc: true
---
## High-performance Summit in numbers
The Gerrit User Summit 2019 has ended with the highest number of achievements
in the 11-year history of the Gerrit open-source project:
1. Two dates and locations in a 12-month period: Gothenburg (Sweden) and
Sunnyvale (California).
2. Four Gerrit releases delivered: v2.15.16, v2.16.11, v3.0.2, v3.1.0
3. 127 people registered across the two locations,
87 people attended on-site (roughly 70% turnout) and 38 people followed the event
remotely at different times using the live streaming coverage
provided by [GerritForge](https://gerritforge.com).
4. 373 changes merged (204 in Gothenburg, 169 in Sunnyvale).
5. 32 developers attended the Hackathons, 8 of whom had never contributed or
attended an event before.
6. Gerrit v3.1.0 released as the highest-performing version of Gerrit so far, with over
[2x git and REST-API performance compared to v3.0.x](https://gitenterprise.me/2019/12/20/stress-your-gerrit-with-gatling/).
7. 22 talks presented across Gothenburg and Sunnyvale, with 6 new speakers
who had never presented at the Summit before.
The performance of the Summit is yet more evidence of the continuous
growth of the community and the increased synergies with the JGit, OpenStack/Zuul
and Tuleap open-source projects.
## Sunnyvale Hackathon summary
### Gerrit v3.1.0 preparation, load testing and release
During the Hackathon, David Pursehouse worked on the release of
Gerrit v3.1.0, with the help and support of all the other developers at the
hackathon.
Following the experience of the previous releases, this year the major focus
was on the stability, end-to-end and load testing of the release. Matthias Sohn (SAP),
Fabio Ponciroli (GerritForge) and Antonio Barone (GerritForge) worked on improving
the Gerrit E2E test suite to perform A/B testing of Gerrit v3.0 vs. v3.1.
GerritForge upgraded the GerritHub.io multi-site setup early on, keeping one
data-centre (Canada) on v3.0 and upgrading the second data-centre (Germany) to v3.1.
GerritHub.io was thus the target of the Gerrit v3.1 validation tests, which were
successfully completed and showed a 2x performance improvement between the
two releases.
> NOTE: The E2E tests for Gerrit are based on the [Gatling open-source framework](https://gatling.io/)
> with the [Git protocol support](https://github.com/gerritforge/gatling-git) implemented
> by GerritForge.
### Support for large repositories in Gerrit
Luca Milanesio (GerritForge), Matthias Sohn (SAP) and Martin Fick (Qualcomm)
discussed the issues associated with very large repositories:
1. JVM heap utilisation and associated GC cycles
Luca shared the problems and investigations associated with
long *stop-the-world* (STW) GC pauses observed when running Git operations on large
repositories. Creating a large in-memory packfile requires the JVM to allocate
a very large contiguous area of heap memory. That allocation
can, in some cases, trigger a STW GC cycle that makes the Gerrit server unavailable
for a few seconds.
2. Git in-memory cache of Packfiles and BLOBs
Matthias shared his experience at SAP in dealing with large repositories. The JVM
heap allocated is huge, up to 500 GBytes. A big part of the heap is dedicated to in-memory
packfile caching, which is meant to avoid the continuous allocation/release of large areas of memory.
However, it appears that, even though the cache is still needed, the JVM at times releases
part of it, which brings back the continuous memory allocation/release that may cause STW GC cycles.
3. Quotas support for expensive operations
Martin proposed a change to the Gerrit quotas to block or delay incoming operations
in the execution queue. It would make it possible to identify operations that could potentially
trigger a STW GC and reschedule them for a later time. Whilst this would not completely solve
the problem, it would give the Gerrit instance some "breathing space" to recover
heap before serving the expensive operations.
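
The deferral idea can be illustrated with a minimal Python sketch. It is not the Gerrit quota API: the `Operation` type, the heap estimates and the threshold are all hypothetical and only show how expensive operations could be held back while cheaper ones keep flowing.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    estimated_heap_mb: int  # hypothetical estimate of the heap the operation needs


def schedule(operations, free_heap_mb, expensive_threshold_mb):
    """Split operations into those served now and those deferred.

    An operation is deferred when it is "expensive" (at or above the
    threshold) and the currently free heap cannot accommodate it.
    """
    run_now, deferred = [], []
    for op in operations:
        expensive = op.estimated_heap_mb >= expensive_threshold_mb
        if expensive and op.estimated_heap_mb > free_heap_mb:
            deferred.append(op)  # reschedule once heap has been recovered
        else:
            run_now.append(op)
            free_heap_mb -= op.estimated_heap_mb  # account for the heap in use
    return run_now, deferred


ops = [
    Operation("fetch small", 50),
    Operation("clone huge", 4000),
    Operation("push", 100),
]
run_now, deferred = schedule(ops, free_heap_mb=1000, expensive_threshold_mb=1000)
print("serve now:", [op.name for op in run_now])
print("deferred: ", [op.name for op in deferred])
```

In a real server the deferred operations would be retried after the heap has recovered; how to estimate the cost of an operation up front is the hard part and is left open here.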
### Review and merge of the ref-table support in JGit and Gerrit
Han-Wen Nienhuys (Google) and Matthias Sohn (SAP) worked on the final review and submission
of the JGit implementation of reftable, which was initially designed by Shawn Pearce but never applied
to the open-source code base. Han-Wen redesigned the feature to make it compatible with the
filesystem-based implementation of JGit.
The ["implement FileReftableDatabase"](https://git.eclipse.org/r/#/c/146568/) change has been merged
into JGit and later [included in Gerrit v3.1.2](https://gerrit-review.googlesource.com/c/gerrit/+/247498).
The [Git reftable](https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md) is
an alternative storage format for keeping the list of Git refs on the filesystem. The formats currently implemented
in Git are loose refs and packed refs, neither of which scales for repositories with a large number
of refs (e.g. 500k or more).
With regard to reftable performance, the following table speaks louder than a thousand words:
format | cache | scan | by name | by SHA-1
------------|-------|--------------|----------------|------------------
packed-refs | cold | 402 ms | 409,660.1 usec | 412,535.8 usec
packed-refs | hot | | 6,844.6 usec | 20,110.1 usec
reftable | cold | 112.0 ms | 33.9 usec | 323.2 usec
reftable | hot | | 20.2 usec | 320.8 usec
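
To make the gap easier to read, the lookup timings above can be turned into speed-up ratios. The short Python sketch below only re-uses the figures from the table; the dictionary layout and the `speedup` helper are illustrative names, not part of JGit.

```python
# Lookup times in microseconds, copied from the table above.
timings_usec = {
    ("packed-refs", "cold"): {"by name": 409_660.1, "by SHA-1": 412_535.8},
    ("packed-refs", "hot"): {"by name": 6_844.6, "by SHA-1": 20_110.1},
    ("reftable", "cold"): {"by name": 33.9, "by SHA-1": 323.2},
    ("reftable", "hot"): {"by name": 20.2, "by SHA-1": 320.8},
}


def speedup(cache, lookup):
    """How many times faster reftable is than packed-refs for one lookup."""
    packed = timings_usec[("packed-refs", cache)][lookup]
    reftable = timings_usec[("reftable", cache)][lookup]
    return packed / reftable


for cache in ("cold", "hot"):
    for lookup in ("by name", "by SHA-1"):
        print(f"{cache:4} {lookup:8}: {speedup(cache, lookup):,.0f}x faster")
```

By this measure, a by-name lookup with a cold cache is on the order of 10,000 times faster with reftable, and even the smallest gain (by SHA-1 with a hot cache) is well above an order of magnitude.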
### Merge of the two forks of the high-availability plugin
The [high-availability plugin](https://gerrit.googlesource.com/plugins/high-availability)
was created in 2016 by Ericsson with the aim of allowing an active failover of their
Gerrit master setup.
Over the years, the plugin has received many contributions by different companies, including
CollabNet, SAP and GerritForge.
Starting from 2018, GerritForge began to fork the plugin because of the need to have
urgent fixes merged, which also made their way into the mainstream repository. However, as we all know,
forking is easy but merging is a lot more complicated and painful, and the fork continued for over
two years, with duplication of effort and a disparity of fix levels between the two forks.
Marco Miller (Ericsson), David Ostrovsky and Luca Milanesio (GerritForge) worked hard to merge
the two forks and align them in terms of functionality and fixes. In the few weeks
following the hackathon, GerritForge's fork was successfully merged into the main
repository.
The only active version of the high-availability plugin is now the mainstream repository.
David Ostrovsky and Luca Milanesio have been officially granted the role of maintainers, together with
the current Ericsson and CollabNet members.
### Multi-site plugin decoupled from Kafka and Zookeeper
The [multi-site plugin](https://gerrit.googlesource.com/plugins/multi-site) was originally released
in April 2019, fully based on a Kafka/Zookeeper infrastructure for the alignment of indexes, caches
and events across sites.
During the hackathon, Marcin Czech (GerritForge) worked on abstracting the Kafka/Zookeeper layer
out of the multi-site plugin. That allows Gerrit multi-site to be deployed in the future on a different
infrastructure, possibly more cloud-native and integrated with the services of the major cloud providers.
The Kafka broker interface has been put into the [kafka events plugin](https://gerrit.googlesource.com/plugins/kafka-events),
which was previously used only for stream events and now also for indexing/cache consistency.
With regard to Zookeeper, an initial request to include generic support for a global-refdb was
[presented](https://gerrit-review.googlesource.com/c/homepage/+/237980) but then abandoned because of
its unanimous rejection by the Gerrit community.
While waiting for a different solution to be presented, the support for Zookeeper was moved to
a GerritForge-owned [repository on GitHub](https://github.com/GerritForge/plugins_zookeeper).
## Summit summary
The talks have been mainly centred on the new features introduced in Gerrit v3.1:
- The porting of PolyGerrit to Polymer 2 and the new development team in Germany
- Performance improvements in v3.1
- Support for Git protocol v2
Some of the talks presented in Gothenburg have been replayed in Sunnyvale as well,
with the addition of brand-new talks about the new features and developments completed
in the past three months.
This year the Summit was hosted in the new home of GerritForge in the USA, downtown
Sunnyvale, at The Satellite in the historic Del Monte building.
### What's new in Gerrit v3.0/v3.1
David Ostrovsky, Luca Milanesio (GerritForge) and Patrick Hiesel (Google) presented
the new features and improvements introduced in Gerrit v3.0/v3.1, two closely
related versions.
Gerrit v3.0 and v3.1 include 1,589 and 1,443 commits respectively, which together make over
3k changes compared to the latest v2.16.x releases. The Gerrit major release number was
incremented because of the breaking changes introduced:
- Removal of the GWT UI
- Removal of ReviewDb (deprecated from v2.16)
- Removal of pushes to refs/drafts/* and refs/changes/*
New and noteworthy features include:
- Re-introduction of Git protocol v2
- Significant speed-up of the Gerrit frontend and backend, showing up to 2x performance
improvement (Gatling automated tests)
Any upgrade to Gerrit v3.0/v3.1 requires a stop at v2.16 to convert the changes from
ReviewDb to NoteDb.
### Road-map and migration path to Gerrit v3
Luca Milanesio (GerritForge) presented a deep-dive into the high-level process of
migrating Gerrit from old releases to the latest v3.1.
Migrating is always difficult, and Gerrit migrations before the advent of NoteDb were
always cursed by the schema upgrades needed by ReviewDb. However, migrating to the latest
version is not optional and *must* be planned and executed systematically.
Luca classified Gerrit migrations in four quadrants, based on their version distance
and installation size.
1. Trivial
Small upgrade step (e.g. v2.15 to v2.16) for a small-sized Gerrit setup.
It is typically resolved by a WAR upgrade and a Gerrit restart.
2. Complex
Small upgrade step for a large-scale Gerrit setup.
It typically requires more coordination and communication with the teams about
the planning and execution of the cutover plan. The outage window needs to be
tested and reduced to a minimum.
3. Risky
Big upgrade step (e.g. v2.11 to v3.1) for a small-sized Gerrit setup.
The large gap between releases introduces functional differences and gaps in
the different features (e.g. draft changes migrated to WIP/Private).
4. Ultrahazardous
Big upgrade step (e.g. v2.11 to v3.1) for a large-scale Gerrit setup.
The big functional gap combined with a large setup involving potentially
hundreds or thousands of people may lead to a very delicate and hazardous
upgrade.
Luca gave an overview of how to plan and execute migrations of type
1, 2 and 3, while advising against type 4 migrations, as they may lead to
expensive and unnecessary risks.
Any upgrade of type 4 can be broken down into a series of upgrades of type 2, which
lowers the risk and increases the confidence in and understanding of the new
Gerrit features.
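
The four quadrants and the idea of splitting a type 4 upgrade into type 2 steps can be sketched in Python as follows. The `big_step` threshold and the choice of moving one release at a time are assumptions made for illustration only; they are not rules stated in the talk.

```python
# Gerrit release lines from v2.11 to v3.1, in order.
RELEASES = ["2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "3.0", "3.1"]


def classify(source, target, large_setup, big_step=2):
    """Place a migration in one of the four quadrants.

    `big_step` is a hypothetical threshold: an upgrade spanning at least
    that many releases counts as a "big" upgrade step.
    """
    distance = RELEASES.index(target) - RELEASES.index(source)
    if distance >= big_step:
        return "Ultrahazardous" if large_setup else "Risky"
    return "Complex" if large_setup else "Trivial"


def stepwise_plan(source, target):
    """Break one big upgrade into consecutive single-release steps."""
    start, end = RELEASES.index(source), RELEASES.index(target)
    path = RELEASES[start:end + 1]
    return list(zip(path, path[1:]))


print(classify("2.11", "3.1", large_setup=True))
for step_from, step_to in stepwise_plan("2.11", "3.1"):
    # Each individual step is a small upgrade on the same large setup.
    print(f"v{step_from} -> v{step_to}: {classify(step_from, step_to, large_setup=True)}")
```

With these assumptions, every step of the plan falls into the "Complex" quadrant, and the path naturally includes the stop at v2.16 that is needed for the ReviewDb to NoteDb conversion.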
Gareth Bowles (Apple) presented his experience of managing Gerrit and automating
each phase of its lifecycle using Ansible. Apple's installation has 1k projects with
over 670k patch-sets and is used by 800+ users worldwide.
Cesare San Martino (GerritForge) explained how the adoption of the Gerrit high-availability
plugin and architecture can help in lowering the risks associated with migrations and
reduce the outage window to a minimum, or even to zero in certain cases.
### Gerrit Q&A with the maintainers
For the very first time the Q&A was a global event, allowing people on-site in Sunnyvale
and remote attendees around the globe, connected via streaming, to interact and ask questions
directly to the Gerrit maintainers.
The questions ranged widely, covering multiple topics:
- Status of the Gerrit plugins
- Onboarding of new contributors to the Gerrit project
- New organisation of the Gerrit Open-Source community with ESC and CMs
- Gerrit vs. GitHub vs. GitLab: competition or integration
- Pull-request workflow for Gerrit
### What's cooking in JGit
Ivan Frade and Han-Wen Nienhuys (Google) presented the new
features that are coming in the forthcoming versions of JGit.
This was the first time since the last [GitTogether in 2011](https://opensource.googleblog.com/2011/12/gittogether-2011.html)
that core Git contributors participated with a mixed Git/Gerrit audience.
Ivan presented what's new on the JGit server side, which is the backend
that serves the Git protocol for the Chromium and Android Open-Source projects.
The new features introduced in JGit from v5.2/3/4/5 and master are focussed on:
- Exposing the server options mechanism, made possible by the introduction of
Git protocol v2. That made it possible to enable valuable features like Git-protocol-level
tracing from a server-side perspective.
- Consistency on demand and update indexes, which allow Git servers on multiple
sites to pass a consistency version token and detect when commits are replicated
to remote servers and thus ready to be fetched.
- Reachability checker optimisation, which allows large repositories to reduce
the execution time and CPU utilisation of the validation of the "WANT SHA1"
commands received from the Git client.
- Sideband-all, which means that at any point in the communication the client
and server can pass parallel information via the normal Git protocol client/server
communication. That allows new use-cases like packfile off-loading, a new
capability in which the server communicates a series of mirrors from which the packfiles
can be fetched concurrently.
- Local reftables, presented by Han-Wen, are an innovative storage format that
allows repositories to scale to millions of refs without significantly impacting
the access time on the filesystem, while reducing lock contention in case of concurrent updates.
### New developments and team structure in the PolyGerrit Team
Google's Gerrit frontend team has been successfully re-staffed with four new
hires over the summer. Today it consists of Ben, Dhruv, Dmitrii, Milutin, Ole and
Tao, all working from the Munich office alongside Google's backend team.
For the 3.1 release the frontend infrastructure has been changed to use Polymer
2 instead of 1, which among other things means that all UI components are
encapsulated using the Shadow DOM. The team's focus is on further infrastructure
projects (Polymer 3, stronger typing, npm, content-security-policy, ...),
performance, checks and a new feature for tracking whose turn it is for all your
code reviews.
### Status of the Gerrit Code-Review Analytics for the Android open-source project
David Ostrovsky and Luca Milanesio (GerritForge) presented the work done
to extend the Gerrit DevOps Analytics Open-Source platform (GDA) to cover also
the use-case of the Android Open-Source Project (AOSP).
The GDA platform collects information about Git commits, reviews and logs and correlates
them to build dashboards of KPIs that are relevant to the people involved
with the project.
Luca described the challenges of applying the platform to AOSP:
- Mirroring of AOSP repositories to GerritHub.io in order to minimise network
traffic during the Big-Data processing.
- Scaling up the current performance of the analytics extractors and ELT so that
AOSP branches are resolved quickly and without impact on the JVM utilisation.
- David presented the challenge of parsing foreign NoteDb change data using
the Gerrit internal API.
### What's new in the Bazel tool-chain for Gerrit
David Ostrovsky (GerritForge) presented the progress made in adopting the latest Bazel
versions in the build tool-chain of Gerrit and its plugins.
The Gerrit build process is complex, and involves Java, JavaScript, 160+ dependencies,
150+ plugins built in two modes (standalone and in-tree). All of that needs to be
orchestrated, automated and executed in a fast, correct and reproducible way.
Gerrit started as a Maven build project (until v2.7), later moved to Buck
(v2.8-v2.13) and eventually adopted Bazel (v2.14 onwards). Bazel is an industry standard
for large, distributed and fast builds executed on a build server. It is used
by large companies around the globe, including Spotify, Uber, Stripe, NVIDIA, Volvo
and many others.
David explained how the overall build process works in Gerrit and highlighted the
versions where the build is actively supported by the community (v2.16 onwards). Bazel builds
are orchestrated by the [Gerrit CI](https://gerrit-ci.gerritforge.com), initially created
by GerritForge and now actively supported by the whole Gerrit community.
Last but not least, David explained some tips and tricks on how to perform
integration tests in Gerrit using the TestContainers library, which allows the automatic
testing and validation of more complex scenarios like ElasticSearch indexes and the Gerrit
multi-site plugin.
### Racy JGit
Matthias Sohn (SAP) presented the history of how time is used in JGit and the struggle
to improve the resiliency to the [git racy-reads problem](https://git-scm.com/docs/racy-git/en).
It all started with [bug #544199](https://bugs.eclipse.org/bugs/show_bug.cgi?id=544199),
reported by Luca Milanesio (GerritForge), which was later fixed by adjusting the way the packfile cache
[consistency is checked](https://git.eclipse.org/r/#/c/138521/)
against the filesystem.
The fix opened the Pandora's box of the 2.5s hard-coded resolution used in JGit for dealing
with racy-reads. The additional checks to make sure that a packfile has not been changed
after being cached in memory led to
[bug #546891](https://bugs.eclipse.org/bugs/show_bug.cgi?id=546891), related to the performance
regression observed.
The reason why JGit historically used a hard-coded resolution of 2.5s was the FAT filesystem
storing timestamps with 2s resolution (the extra .5s is a safety margin), which can still be
found in some Windows systems running Eclipse.
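
The role of the timestamp resolution can be illustrated with a simplified Python sketch. It is not the JGit code: `is_racy` and its parameters are illustrative names, and the check is reduced to the bare comparison between the time of the last modification and the time of the last read.

```python
def is_racy(last_modified, last_read, resolution=2.5):
    """True when a file read at `last_read` may have changed unnoticed.

    If the file was modified less than `resolution` seconds before it was
    read, a later write could leave the same timestamp on a coarse
    filesystem, so the timestamp alone cannot prove the file is unchanged.
    All values are in seconds.
    """
    return last_read - last_modified <= resolution


# With the hard-coded 2.5s resolution, a read 1s after the write is suspect
# and forces an extra check of the file content.
print(is_racy(last_modified=100.0, last_read=101.0))
# With a finer resolution (here a hypothetical 1ms), a read 0.5s after the
# write is already considered safe and the extra check is skipped.
print(is_racy(last_modified=100.0, last_read=100.5, resolution=0.001))
```

The larger the assumed resolution, the more reads fall into the suspect window and pay for additional checks, which is why detecting the real filesystem resolution matters for performance.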
Matthias and the other folks at SAP have been working hard to improve the way the filesystem
resolution is detected and make all of that available in JGit transparently, without having
to configure anything special in Gerrit or JGit.
The problem was not easy to resolve, as the complex combination of JVM versions, OSes and filesystems
created a series of conditions that made the calculation of the resolution a lot harder
than initially thought.
The challenge was eventually completed after seven months of work by six different authors
(Chris, Han-Wen, Luca, Matthias, Marc, Thomas) and 82 commits across 22 different service releases. JGit
with the racy-reads problem resolved and optimised is now included in all the latest active versions
of Gerrit.
### OSSUM with Gerrit
Miikka Andersson from CollabNet gave a presentation about the company’s latest product
initiative: ossum. Ossum is a developer-focused SaaS solution for software engineering
needs with a strong focus on planning, version control, and Continuous Integration.
Gerrit is one of ossum’s key components on top of which the entire version control service
was built. The decision to choose Gerrit for the backend wasn’t a coincidence: CollabNet
has a long history with Gerrit and ossum is already the company’s second product providing
a Gerrit-powered Git service.
In his [presentation](https://storage.cloud.google.com/gerrit-talks/summit/2019/ossum-GUS_2019.pdf),
Miikka went through some of the key factors contributing to the decision to choose
Gerrit to be used as Git backend for the new product initiative. In addition to that, some
of the key takeaways and lessons learnt from earlier Gerrit-based product initiatives
were shared with the audience.
### Revert submission
Gal Paikin (paiking@) from Google showed a new feature he was working on.
In his [presentation](https://docs.google.com/presentation/d/e/2PACX-1vTkbE5AIWEFcEyUnQ6ZlfglClgsX9h5fjB6dkSsCvXuL75Jd0DdsZfarvKswYtyCKUN0_QJQDdJ8Qzw/pub?start=false&loop=false&delayms=10000&slide=id.g6c93d79dc5_0_29)
he described RevertSubmission, a new endpoint that allows reverting multiple changes simultaneously.
This endpoint is meant to ease the workflow of the many engineers who submit multiple changes together.
### Gerrit metrics and dashboards
We all know metrics are important to monitor the status of our systems and to avoid
having our users tell us Gerrit is not working before we realise it.
Gerrit logs are an undervalued gold mine of metrics.
In this [presentation](https://docs.google.com/presentation/d/1EeJdCngQaVBxJPQaGC2DYtTS9gzM9II-uoi7DtPQkw0/edit?usp=sharing)
Fabio Ponciroli (aka Ponch, GerritForge) showed his five favourite metrics, which
help in the daily job of a Gerrit admin.
### Stress your Gerrit with Gatling
Fabio Ponciroli (GerritForge), aka Ponch, showed the work on implementing a consistent
end-to-end test scenario for Gerrit by leveraging the Gatling tool.
Testing Gerrit involves invoking the REST API, simulating the PolyGerrit UI, and
also using the Git/HTTP and Git/SSH protocols. Gatling, however, does not support
the Git protocol out-of-the-box. Ponch introduced the gatling-git project,
which extends Gatling to include the Git protocol.
The definition of end-to-end tests is further simplified by using the Gatling “feeders”.
Those are sample data in JSON format, which can also be generated from existing
Gerrit production logs.
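
As a rough illustration of how feeder data could be derived from logs, the Python sketch below extracts the request method and URL from access-log lines and serialises them as a JSON array of records. The log lines, the regular expression and the `method`/`url` keys are hypothetical; the real Gerrit logs and the feeders used by the E2E tests carry more fields and may be structured differently.

```python
import json
import re

# Hypothetical, simplified access-log lines; real Gerrit logs carry more fields.
LOG_LINES = [
    '10.0.0.1 - alice [08/Apr/2020:10:00:00 +0000] "GET /a/changes/?q=status:open HTTP/1.1" 200',
    '10.0.0.2 - bob [08/Apr/2020:10:00:01 +0000] "GET /a/projects/ HTTP/1.1" 200',
    'not a request line',
]

# Matches the quoted request part of a line, e.g. "GET /a/projects/ HTTP/1.1".
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+"')


def to_feeder_records(lines):
    """Turn log lines into a list of records, skipping lines that do not match."""
    records = []
    for line in lines:
        match = REQUEST.search(line)
        if match:
            records.append({"method": match.group("method"), "url": match.group("url")})
    return records


# A JSON array of objects, one object per simulated request.
feeder_json = json.dumps(to_feeder_records(LOG_LINES), indent=2)
print(feeder_json)
```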
Ponch then showcased, to Luca’s surprise, a real use-case of running load tests
against GerritHub.io, which generated the expected spike of incoming traffic.
A [post](https://gitenterprise.me/2019/12/20/stress-your-gerrit-with-gatling/)
on the topic, including the video of the presentation, is published on the GerritForge blog.
## Feedback and proposals of improvements for the next Summits
For the very first time, the Q&A with the maintainers was held with a vast audience,
including the people from Europe in Gothenburg, the users in Silicon Valley and the
remote attendees using the [GerritForge Live streaming](https://live.gerritforge.com).
The audience was very active and asked many questions related to Gerrit release
management (can Gerrit have a more stable and well-defined release plan?), to plugin
lifecycle management and to the new processes introduced in the community, like
design-driven contribution.
The recurring question about the "competition" between Gerrit, GitLab and GitHub also came
back, with different feedback from various users and companies. There are also people still
happily using IBM ClearCase! That means there is no gold standard, no single
platform that would resolve all the use-cases.
The main reason people and companies have adopted Gerrit is the need for scalability and
for managing a large number of users at different sites across the globe.
Another topic that emerged again is how to smooth the learning curve for new adopters
of Gerrit Code Review, possibly by allowing them to contribute using a branch or pull-request
review model, in conjunction with the typical change-based code review.
All the discussions and hints were captured in the
[Gerrit Code Review Issue Tracker](https://bugs.chromium.org/p/gerrit/issues/list?can=2&q=label%3ARetrospective)
associated with the label `retrospective` for easier discovery and tracking.
----
Thank you again to all the attendees of the Gerrit User Summit 2019 at Volvo (Sweden)
and GerritForge Inc. (California). We look forward to another exciting year of innovation
and development of the Gerrit Code Review platform and community.
Luca Milanesio (Gerrit Maintainer, Release Manager, ESC member) with contributions and
reviews by David Pursehouse (CollabNet), Fabio Ponciroli (GerritForge), Matthias Sohn (SAP),
David Ostrovsky, Gal Paikin (Google), Douglas Luedtke (Garmin), Nasser Grainawi (Qualcomm).