Rework external ID cache loader

The current system uses an ExternalId cache that is serialized and
written to disk (if configured). The cache holds an entire generation
of all external IDs, keyed by the SHA1 of `refs/meta/external-ids`.
This is roughly `Cache<ObjectId, List<ExternalId>>`.

Prior to this commit, on servers where an update does not originate
(on other masters or slaves), the cache loader would re-read all
external IDs from Git when it was called. On googlesource.com, these
regenerations can take up to 60 seconds. During this time, the Gerrit
instance comes to a grinding halt because many code paths,
authentication among them, depend on a value of this cache being present.

This commit rewrites the loader to compute the new state differentially:
it starts from a previously cached state and applies the modifications
reported by a Git diff.
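
As a rough illustration of the differential approach, the sketch below
(plain Java, no JGit) derives a new state from an old one by applying a
list of changes. The class name DifferentialLoadSketch, the Change type,
and the use of plain strings for note keys and external IDs are
assumptions made for this example; they are not the actual loader code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Gerrit's loader: new state = old state + diff. */
final class DifferentialLoadSketch {
  enum ChangeType { ADD, MODIFY, DELETE }

  /** One changed note: its key and, unless it was deleted, its new external ID. */
  static final class Change {
    final ChangeType type;
    final String noteKey;
    final String newExternalId; // null for DELETE

    Change(ChangeType type, String noteKey, String newExternalId) {
      this.type = type;
      this.noteKey = noteKey;
      this.newExternalId = newExternalId;
    }
  }

  /** Copies the old state and applies each change; the old state is not modified. */
  static Map<String, String> apply(Map<String, String> oldState, List<Change> changes) {
    Map<String, String> newState = new HashMap<>(oldState);
    for (Change c : changes) {
      if (c.type == ChangeType.DELETE) {
        newState.remove(c.noteKey);
      } else {
        // ADD and MODIFY both end with the note holding the new content.
        newState.put(c.noteKey, c.newExternalId);
      }
    }
    return newState;
  }

  public static void main(String[] args) {
    Map<String, String> oldState = new HashMap<>();
    oldState.put("note-a", "username:alice");
    oldState.put("note-b", "username:bob");
    List<Change> changes = Arrays.asList(
        new Change(ChangeType.DELETE, "note-b", null),
        new Change(ChangeType.ADD, "note-c", "mailto:carol@example.com"));
    System.out.println(apply(oldState, changes));
  }
}
```

The cost of apply() scales with the number of changes plus one copy of
the old map, rather than with a full re-read of every note from Git.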

Given the SHA1 (tip in refs/meta/external-ids) that is requested, the
logic first tries to find a state that we have cached by walking Git
history. This is best-effort and we allow at most 10 commits to be
walked.
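
The bounded lookup can be sketched as follows. This is a simplified
model, not the real loader: history is assumed to be linear and is
represented by a hypothetical parentOf map, the cache by a plain set of
SHA1 strings, and the requested tip is assumed to count as the first of
the 10 walked commits.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: best-effort search for a cached ancestor state. */
final class PriorStateFinderSketch {
  static final int MAX_COMMITS_WALKED = 10;

  /**
   * Walks from the tip towards the root and returns the first commit that
   * has a cached state, giving up after MAX_COMMITS_WALKED commits.
   */
  static Optional<String> findCachedState(
      String tip, Map<String, String> parentOf, Set<String> cachedStates) {
    String current = tip;
    for (int walked = 0; walked < MAX_COMMITS_WALKED && current != null; walked++) {
      if (cachedStates.contains(current)) {
        return Optional.of(current);
      }
      current = parentOf.get(current); // null once the root is passed
    }
    // Nothing found within the budget; a full reload is the assumed fallback.
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Linear history c12 -> c11 -> ... -> c0.
    Map<String, String> parentOf = new HashMap<>();
    for (int i = 1; i <= 12; i++) {
      parentOf.put("c" + i, "c" + (i - 1));
    }
    Set<String> nearby = new HashSet<>();
    nearby.add("c5");
    Set<String> tooOld = new HashSet<>();
    tooOld.add("c1");

    System.out.println(findCachedState("c12", parentOf, nearby)); // Optional[c5]
    System.out.println(findCachedState("c12", parentOf, tooOld)); // Optional.empty
  }
}
```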

Once a prior state is found, we use that state's SHA1 to do a tree diff
between it and the requested state. The new state is then generated by
applying the mutations from that diff to the prior state.
JGit's DiffFormatter only traverses trees that have changed and doesn't
load any file content, so we perform only the minimal number of Git
operations. This matters because NoteMap (the storage format of external
IDs on disk) shards pretty aggressively and we don't want to load all
trees when applying only deltas.
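
To show why comparing tree IDs keeps the diff cheap on a sharded
layout, here is a toy model in plain Java. It does not use JGit and is
not how DiffFormatter is implemented; the Node type, the string IDs,
and the shard names are invented for the example. The only property it
relies on is that two trees with the same ID have the same content, so
an unchanged shard can be skipped without being opened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: diff two content-addressed trees, skipping equal subtrees. */
final class ShardedTreeDiffSketch {
  /** A tree (children != null) or a blob (children == null), identified by its ID. */
  static final class Node {
    final String id;
    final Map<String, Node> children;

    Node(String id, Map<String, Node> children) {
      this.id = id;
      this.children = children;
    }
  }

  static Node blob(String id) {
    return new Node(id, null);
  }

  /** Builds a tree from alternating child name / child node arguments. */
  static Node tree(String id, Object... namesAndNodes) {
    Map<String, Node> children = new TreeMap<>();
    for (int i = 0; i < namesAndNodes.length; i += 2) {
      children.put((String) namesAndNodes[i], (Node) namesAndNodes[i + 1]);
    }
    return new Node(id, children);
  }

  static List<String> diff(Node oldRoot, Node newRoot) {
    List<String> out = new ArrayList<>();
    walk(oldRoot, newRoot, "", out);
    return out;
  }

  private static void walk(Node oldNode, Node newNode, String path, List<String> out) {
    // Same ID means same content: skip the whole subtree without descending.
    if (oldNode != null && newNode != null && oldNode.id.equals(newNode.id)) {
      return;
    }
    boolean oldIsTree = oldNode != null && oldNode.children != null;
    boolean newIsTree = newNode != null && newNode.children != null;
    if (!oldIsTree && !newIsTree) {
      // Blob level: only the path and change type are recorded, no content is read.
      String type = oldNode == null ? "ADD" : newNode == null ? "DELETE" : "MODIFY";
      out.add(type + " " + path);
      return;
    }
    // Simplification: a blob that is replaced by a tree (or the reverse) is not
    // reported on its own; only the entries of the tree side are visited.
    Map<String, Node> oldChildren =
        oldIsTree ? oldNode.children : Collections.<String, Node>emptyMap();
    Map<String, Node> newChildren =
        newIsTree ? newNode.children : Collections.<String, Node>emptyMap();
    TreeSet<String> names = new TreeSet<>(oldChildren.keySet());
    names.addAll(newChildren.keySet());
    for (String name : names) {
      String childPath = path.isEmpty() ? name : path + "/" + name;
      walk(oldChildren.get(name), newChildren.get(name), childPath, out);
    }
  }

  public static void main(String[] args) {
    Node shardAa = tree("t1", "x", blob("b1"));
    Node oldRoot = tree("r1", "aa", shardAa, "bb", tree("t2", "y", blob("b2")));
    Node newRoot =
        tree("r2", "aa", shardAa, "bb", tree("t3", "y", blob("b2"), "z", blob("b3")));
    System.out.println(diff(oldRoot, newRoot)); // [ADD bb/z]
  }
}
```

In the example, shard aa has the same ID in both roots and is never
descended into; only shard bb is walked.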

Once the (tree) diff is computed, we read the newly added external IDs
using an ObjectReader.

There is currently a broader discussion about whether the primary
storage format of external IDs should be changed (I87119506ec04).
This commit doesn't answer or interfere with that discussion. However,
whether that redesign is required will depend, among other things, on
the impact of this commit on the problems that I87119506ec04 outlines.

We hope that this commit already mitigates a large part of the slow
loading issues. If more is needed after this commit, we will use
microbenchmarks and look closer at how the collections are handled.

Change-Id: I0e67d3538e2ad17812598a1523e78fd71a7bd88a
14 files changed
tree: 3394c5a6a0e7b17e1196ba3229fea7592fcc6dbe
  1. .settings/
  2. antlr3/
  3. contrib/
  4. Documentation/
  5. java/
  6. javatests/
  7. lib/
  8. plugins/
  9. polygerrit-ui/
  10. prolog/
  11. prologtests/
  12. proto/
  13. resources/
  14. tools/
  15. webapp/
  16. .bazelproject
  17. .bazelrc
  18. .bazelversion
  19. .editorconfig
  20. .git-blame-ignore-revs
  21. .gitignore
  22. .gitmodules
  23. .mailmap
  24. .pydevproject
  25. BUILD
  26. COPYING
  27. INSTALL
  28. package.json
  29. README.md
  30. SUBMITTING_PATCHES
  31. version.bzl
  32. WORKSPACE
README.md

Gerrit Code Review

Gerrit is a code review and project management tool for Git-based projects.

Objective

Gerrit makes reviews easier by showing changes in a side-by-side display, and allowing inline comments to be added by any reviewer.

Gerrit simplifies Git-based project maintainership by permitting any authorized user to submit changes to the master Git repository, rather than requiring all approved changes to be merged in by hand by the project maintainer.

Documentation

For information about how to install and use Gerrit, refer to the documentation.

Source

Our canonical Git repository is located on googlesource.com. There is a mirror of the repository on GitHub.

Reporting bugs

Please report bugs on the issue tracker.

Contribute

Gerrit is the work of hundreds of contributors. We appreciate your help!

Please read the contribution guidelines.

Note that we do not accept pull requests via the GitHub mirror.

Getting in contact

The Developer Mailing list is repo-discuss on Google Groups.

License

Gerrit is provided under the Apache License 2.0.

Build

Install Bazel and run the following:

    git clone --recurse-submodules https://gerrit.googlesource.com/gerrit
    cd gerrit && bazel build release

Install binary packages (Deb/Rpm)

Instructions for configuring the GerritForge/BinTray repositories are here.

On Debian/Ubuntu run:

    apt-get update && apt-get install gerrit=<version>-<release>

NOTE: release is a counter that starts with 1 and indicates the number of packages that have been released with the same version of the software.

On CentOS/RedHat run:

    yum clean all && yum install gerrit-<version>[-<release>]

On Fedora run:

    dnf clean all && dnf install gerrit-<version>[-<release>]

Use pre-built Gerrit images on Docker

Docker images of Gerrit are available on DockerHub.

To run a CentOS 7 based Gerrit image:

    docker run -p 8080:8080 gerritforge/gerrit-centos7[:version]

To run an Ubuntu 15.04 based Gerrit image:

    docker run -p 8080:8080 gerritforge/gerrit-ubuntu15.04[:version]

NOTE: release is optional. If the release number is omitted, the last released package of the version is installed.