@PLUGIN@ health checks
==============
The @PLUGIN@ plugin registers the `pull-replication-outstanding-tasks`
healthcheck. This check marks a Gerrit instance as healthy upon
startup only when the node has caught up with all the outstanding
pull-replication tasks. The goal is to mark the node as healthy when it
is ready to receive write traffic. "Caught up" means:
- All pending and in-flight replication tasks across all sources (or
across a configurable set of repos) have completed.
- There are no queued replication tasks pending, and the above condition
holds for at least N seconds (configurable).
See [Healthcheck based on replication tasks](https://issues.gerritcodereview.com/issues/312895374) for more details.
**Note that once the healthcheck succeeds and the instance is marked
healthy, the check is skipped from then on (i.e., any subsequent
invocation will always mark the instance as healthy, irrespective of
any pending or in-flight tasks being present).**
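
The behaviour described above can be modelled with a short Python sketch. It is an illustration only, not the plugin's implementation: the class name `OutstandingTasksCheck`, its parameters, and the injectable `clock` are all hypothetical, and the plugin itself obtains outstanding tasks from its own replication queues.

```python
import time
from typing import Callable, Iterable, Optional, Set


class OutstandingTasksCheck:
    """Illustrative model of the check; not the plugin's actual code."""

    def __init__(
        self,
        period_seconds: float = 10.0,
        projects: Optional[Iterable[str]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.period_seconds = period_seconds
        # An empty set means "track every project".
        self.projects: Set[str] = set(projects or ())
        self.clock = clock
        self.idle_since: Optional[float] = None
        self.caught_up = False

    def check(self, outstanding_projects: Iterable[str]) -> bool:
        """Return True when the node should be reported healthy.

        outstanding_projects holds the project name of each pending or
        in-flight replication task.
        """
        if self.caught_up:
            # Once healthy, later invocations skip the check entirely.
            return True
        tracked = [
            p for p in outstanding_projects
            if not self.projects or p in self.projects
        ]
        now = self.clock()
        if tracked:
            # Outstanding work resets the quiet period.
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now
        if now - self.idle_since >= self.period_seconds:
            self.caught_up = True
        return self.caught_up
```

In this model, a task for a tracked project resets the quiet period, tasks for untracked projects are ignored, and the result latches to healthy once the quiet period has elapsed.
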
Health check configuration
--------------------------
The configuration of the health check is split across two files:
- The "standard" properties commonly available to all other checks
of the `healthcheck` plugin. These are set in the `healthcheck` plugin's
[config file](https://gerrit.googlesource.com/plugins/healthcheck/+/refs/heads/master/src/main/resources/Documentation/config.md#settings).
- Settings specific to the check are set in the plugin's [config file](./config.md#file-pluginconfig).
The health check can be configured as follows:
- `healthcheck.@PLUGIN@-outstanding-tasks.projects`: The repo(s) that
the health check will track outstanding replication tasks against.
Multiple entries are supported. If not specified, all the outstanding
replication tasks are tracked.
- `healthcheck.@PLUGIN@-outstanding-tasks.periodOfTime`: How long the
check must keep passing, without interruption, before the instance is
marked healthy. If the time unit is omitted, it defaults to milliseconds.
Values should use common unit suffixes to express their setting:
* ms, milliseconds
* s, sec, second, seconds
* m, min, minute, minutes
* h, hr, hour, hours
Default is 10s.
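
As an illustration of how the suffixes above map to durations, here is a minimal Python sketch. The function `parse_period_ms` is hypothetical and only mirrors the units listed in this document; the plugin relies on Gerrit's own configuration parsing, which may accept additional forms.

```python
import re

# Unit suffixes listed in this document, mapped to milliseconds.
# The empty suffix covers the "unit omitted" case.
_UNIT_MS = {
    "": 1, "ms": 1, "milliseconds": 1,
    "s": 1000, "sec": 1000, "second": 1000, "seconds": 1000,
    "m": 60_000, "min": 60_000, "minute": 60_000, "minutes": 60_000,
    "h": 3_600_000, "hr": 3_600_000, "hour": 3_600_000, "hours": 3_600_000,
}


def parse_period_ms(value: str) -> int:
    """Convert a value such as '5 sec' or '10s' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", value)
    if match is None:
        raise ValueError(f"not a valid period: {value!r}")
    amount, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return int(amount) * _UNIT_MS[unit]
```

Under these assumptions, `parse_period_ms("5 sec")` yields 5000 and a bare `"250"` is read as 250 milliseconds.
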
This example config reports the node as healthy once there have been no
pending tasks for the `foo` and `bar/baz` repos for 5 consecutive
seconds after plugin startup.
```
[healthcheck "pull-replication-outstanding-tasks"]
    projects = foo
    projects = bar/baz
    periodOfTime = 5 sec
```
Useful information
------------------
- The health check is registered only when the [healthcheck](https://gerrit.googlesource.com/plugins/healthcheck) plugin
is installed. If the `healthcheck` plugin is not installed, then the
check registration is skipped during load of the pull-replication
plugin.
> **Note:** When the @PLUGIN@ plugin is installed as a lib (see [extension-point.md](extension-point.md)),
> the healthcheck plugin jar must also be present in the `lib` directory
> for the check registration to work.
- Because the pull-replication healthcheck depends on the `healthcheck` plugin, renaming or removing the `healthcheck`
jar file at runtime is not supported. Doing so can lead to unpredictable behaviour of your Gerrit instance.