@PLUGIN@ health checks

The @PLUGIN@ plugin registers the pull-replication-outstanding-tasks healthcheck. This check will mark a gerrit instance as healthy upon startup only when the node has caught up with all the outstanding pull-replication tasks. The goal is to mark the node as healthy when it is ready to receive write traffic. “Caught up” means:

  • All pending & in-flight replication tasks across all sources (or across a configurable set of repos) have completed
  • There are no queued replication tasks pending and the above condition lasts for at least N seconds (configurable)

See Healthcheck based on replication tasks for more details.

It is worth noting that once the healthcheck eventually succeeds and the instance is marked healthy, the check is then skipped (ie any subsequent invocations will always mark the instance as healthy irrespective of any pending or inflight tasks being present).

Health check configuration

The configuration of the health check is split across two files.

  • The “standard” properties commonly available to all other checks of the healthcheck plugin. These are set in the healthcheck plugin's config file.
  • Settings specific to the check are set in the plugin's config file.

The health check can be configured as follows:

  • healthcheck.@PLUGIN@-outstanding-tasks.projects: The repo(s) that the health check will track outstanding replication tasks against. Multiple entries are supported. If not specified, all the outstanding replication tasks are tracked.
  • healthcheck.@PLUGIN@-outstanding-tasks.periodOfTime: The time for which the check needs to be successful, in order for the instance to be marked healthy. If the time unit is omitted it defaults to milliseconds. Values should use common unit suffixes to express their setting:
  • ms, milliseconds
  • s, sec, second, seconds
  • m, min, minute, minutes
  • h, hr, hour, hours

Default is 10s.

This example config will report the node healthy when there are no pending tasks for the foo and bar/baz repos continuously for a period of 5 seconds after the plugin startup.

[healthcheck "pull-replication-tasks"]
    projects = foo
    projects = bar/baz
    periodOfTime = 5 sec

Useful information

  • The health check is registered only when the healthcheck plugin is installed. If the healthcheck plugin is not installed, then the check registration is skipped during load of the pull-replication plugin.

Note when the @@PLUGIN@@ plugin is installed as a lib (see extension-point.md), then the healthcheck plugin jar should also be present in the lib directory, for the check registration to work.

  • Because the pull-replication healthcheck depends on the healthcheck plugin, renaming/removing the healthcheck jar file is not supported during runtime. Doing so can lead to unpredictable behaviour of your gerrit instance.