Index account asynchronously and flush upon replication

Account reindexing on remote nodes must not happen immediately: it
must be queued for asynchronous execution once replication has completed.

Triggering the reindex of the account *before* it arrives on the
All-Users repo would generate two issues:

1. The account would be reindexed with stale data and, when the new
   data arrives, it would not be reindexed again: we need to wait
   for the data to arrive through replication.

2. The account index would also populate the cache with stale data
   and, even after the new data arrives, it would not be visible to
   the end user until the cache TTL expires.

NOTE: there is not yet a reliable way to know *when* the target
      has received the new account data. That is why it is
      necessary to wait for *ALL* the nodes to complete
      replication before processing the account reindexing.

Bug: Issue 12278
Change-Id: Ie442b03f36c1238f6b977e7f5456beff078c9657
5 files changed
README.md

Gerrit multi-site plugin

This plugin allows deploying a distributed cluster of multiple Gerrit masters, each using a separate site without sharing any storage. The masters are kept aligned through the replication plugin and an external message broker.

Requirements for the Gerrit masters are:

  • Gerrit v2.16.5 or later
  • Migrated to NoteDb
  • Connected to the same message broker
  • Accessible via a load balancer (e.g. HAProxy)

NOTE: The multi-site plugin will not start if Gerrit is not yet migrated to NoteDb.

Currently, the only supported mode is one primary read/write master and multiple read-only masters; the eventual plan is to support multiple read/write masters. The read/write master can handle any traffic, while the read-only masters serve only the Gerrit GUI assets, the HTTP GET REST API, and git fetch requests (git-upload-pack). The read-only masters are kept synchronized with the read/write master so that they are always ready to become a read/write master.

For more details on the overall multi-site design and roadmap, please refer to the multi-site plugin DESIGN.md document.

License

This plugin is released under the same Apache 2.0 license and by the same copyright holders as the Gerrit Code Review project.

How to build

The multi-site plugin can only be built in in-tree mode, by cloning Gerrit and the multi-site plugin code and checking them out on the desired branch.

Example of cloning Gerrit and multi-site for a stable-2.16 build:

git clone -b stable-2.16 https://gerrit.googlesource.com/gerrit
git clone -b stable-2.16 https://gerrit.googlesource.com/plugins/multi-site

cd gerrit/plugins
ln -s ../../multi-site .
rm external_plugin_deps.bzl
ln -s multi-site/external_plugin_deps.bzl .

Example of building the multi-site plugin:

cd gerrit
bazel build plugins/multi-site

The multi-site.jar plugin is generated at bazel-bin/plugins/multi-site/multi-site.jar.

Example of testing the multi-site plugin:

cd gerrit
bazel test plugins/multi-site:multi_site_tests

NOTE: The multi-site tests also use Docker containers to instantiate and use a Kafka/Zookeeper broker. Make sure you have a Docker daemon running (/var/run/docker.sock accessible) or a DOCKER_HOST pointing to a Docker server.
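
As a rough pre-flight check before running the tests, the two conditions in the note above can be sketched in shell. The function name `docker_reachable` and the `DOCKER_SOCK` override are hypothetical and exist only for illustration; the check confirms that a Docker endpoint is configured, not that the daemon actually answers.

```shell
# Hypothetical helper: succeeds if a Docker endpoint appears to be configured.
docker_reachable() {
  # A DOCKER_HOST pointing to a Docker server is accepted as-is.
  if [ -n "${DOCKER_HOST:-}" ]; then
    return 0
  fi
  # Otherwise look for the local daemon socket.
  # DOCKER_SOCK is an assumed override, not a variable Docker itself reads.
  [ -S "${DOCKER_SOCK:-/var/run/docker.sock}" ]
}
```

Running `docker info` afterwards is a more thorough check, since it contacts the daemon.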

How to configure

Install the multi-site plugin into the $GERRIT_SITE/lib directory of all the Gerrit servers that are part of the multi-site cluster. Create a symbolic link from $GERRIT_SITE/lib/multi-site.jar in the $GERRIT_SITE/plugins directory.
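
The installation step could be scripted along these lines. This is a minimal sketch: the function name `install_multi_site` and its two arguments are hypothetical, and it assumes the jar has already been built and that the script may create the lib and plugins directories if they are missing.

```shell
# Hypothetical helper: install_multi_site <path-to-multi-site.jar> <gerrit-site>
install_multi_site() {
  jar="$1"
  site="$2"
  # No-op on an existing Gerrit site, where both directories already exist.
  mkdir -p "$site/lib" "$site/plugins" || return 1
  # Copy the plugin jar into lib/.
  cp "$jar" "$site/lib/multi-site.jar" || return 1
  # Link the same jar into plugins/ (-f replaces a stale link).
  ln -sf "$site/lib/multi-site.jar" "$site/plugins/multi-site.jar"
}
```

For example, from the Gerrit source tree: `install_multi_site bazel-bin/plugins/multi-site/multi-site.jar "$GERRIT_SITE"`, repeated on every server in the cluster.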

Add the multi-site module to $GERRIT_SITE/etc/gerrit.config as follows:

[gerrit]
  installModule = com.googlesource.gerrit.plugins.multisite.Module
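
Since gerrit.config uses the git-config file format, the same entry can also be added from a script with `git config -f` instead of editing the file by hand. The sketch below assumes this approach; the function name `enable_multi_site_module` is hypothetical, and the `etc` directory must already exist.

```shell
# Hypothetical helper: enable_multi_site_module <gerrit-site>
enable_multi_site_module() {
  cfg="$1/etc/gerrit.config"
  module="com.googlesource.gerrit.plugins.multisite.Module"
  # Add the entry only if it is not already listed, so reruns do not duplicate it
  # and any other installModule values are left untouched.
  if ! git config -f "$cfg" --get-all gerrit.installModule 2>/dev/null \
      | grep -qxF "$module"; then
    git config -f "$cfg" --add gerrit.installModule "$module"
  fi
}
```

Using `--add` rather than a plain set avoids overwriting a different module that may already be configured under the same key.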

Create the $GERRIT_SITE/etc/multi-site.config on all Gerrit servers with the following basic settings:

[kafka]
  bootstrapServers = <kafka-host>:<kafka-port>

[kafka "publisher"]
  enabled = true

[kafka "subscriber"]
  enabled = true

[ref-database]
  enabled = true

[ref-database "zookeeper"]
  connectString = "localhost:2181"

For more details on the configuration settings, please refer to the multi-site configuration documentation.

You also need to set up Git-level replication between the nodes. For more details, please refer to the replication plugin documentation.