# Tuning
Tuning chronicle-map correctly might be a daunting task:
How many entries does a particular cache instance need?
What are the average key and value sizes for it?
Rather than leaving you only with the trial and error (or the guesswork)
approach, this module provides utilities to help you get started in the right
direction.
If you have not migrated to chronicle-map yet, then follow instructions on how
to analyze your existing H2 caches [here](#analyze-h2-caches).
In case you have already migrated to chronicle-map please follow instructions on
how to further tune existing .dat caches [here](#auto-adjust-chronicle-map-caches).
## Analyze H2 caches
Since chronicle-map is one of the first open-source alternatives to the H2
implementation, it is very likely that your Gerrit instance has been running
with the default H2 cache backend.
The idea is to read from the _actual_ H2 persisted files and output the
information that will be required to configure chronicle-map as an alternative.
The Gerrit/SSH command to analyze H2 caches requires the user to have
`Administrate Caches` or `Administrate Server` capabilities.
You can do this _before_ installing cache-chroniclemap as a lib module, so that
your Gerrit server will not need downtime, as follows:
* Drop `cache-chroniclemap.jar` file in the `plugins/` directory.
* Wait for the pluginLoader to acknowledge and load the new plugin. You will
see an entry in the `error_log`:
```
INFO com.google.gerrit.server.plugins.PluginLoader : Loaded plugin cache-chroniclemap
```
* You can now run an analysis on the current status of your H2 caches:
```bash
ssh -p 29418 admin@<gerrit-server> cache-chroniclemap analyze-h2-caches
```
The result is printed to standard output in git config format.
This is an example (the values are made up):
```
****************************
** Chronicle-map template **
****************************
[cache "diff_summary"]
maxEntries = 101
avgKeySize = 192
avgValueSize = 1350
[cache "web_sessions"]
maxEntries = 1
avgKeySize = 68
avgValueSize = 332
[cache "pure_revert"]
maxEntries = 1
avgKeySize = 112
avgValueSize = 8
[cache "mergeability"]
maxEntries = 101
avgKeySize = 150
avgValueSize = 8
[cache "diff"]
maxEntries = 101
avgKeySize = 188
avgValueSize = 5035
[cache "persisted_projects"]
maxEntries = 2
avgKeySize = 88
avgValueSize = 4489
[cache "accounts"]
maxEntries = 5
avgKeySize = 52
avgValueSize = 505
```
Empty caches (if any) will not generate config stanzas; instead, a warning
will be displayed on standard output.
For example:
```
WARN: Cache diff_intraline is empty, skipping
```
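To keep the generated stanzas for later editing, the command output can be filtered and saved. The following is a minimal sketch, assuming that banner lines start with `*` and warnings start with `WARN:`, as in the examples above. The function name `template_only` and the output file name are hypothetical.

```bash
#!/usr/bin/env bash
# Keep only the config stanzas from the analysis output.
# Assumption: banner lines start with '*' and warnings start with 'WARN:',
# as in the examples shown in this document.
template_only() {
  grep -v -e '^\*' -e '^WARN:'
}

# Usage (hypothetical output file name):
#   ssh -p 29418 admin@<gerrit-server> cache-chroniclemap analyze-h2-caches \
#     | template_only > chroniclemap-template.config
```

The filter reads from standard input, so it can also be applied to a previously saved copy of the output.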
Please note that the generated configuration is not necessarily final and it
might still need adjustments:
* Since chronicle-map file size is pre-allocated, you might want to allow for
more entries.
* You might want to account for uncertainty by specifying a `maxBloatFactor` greater
than 1.
* Any other reason.
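The first two adjustments can be applied to a saved template mechanically. The sketch below is only an illustration: the function name `add_headroom` is hypothetical, and the scaling factor and `maxBloatFactor` value are placeholders, not recommendations. It assumes a template that does not yet contain `maxBloatFactor` entries, such as the `analyze-h2-caches` output.

```bash
#!/usr/bin/env bash
# Multiply every maxEntries by a factor and add a maxBloatFactor next to it.
# Arguments: $1 = scaling factor (default 2), $2 = maxBloatFactor (default 2).
# Both defaults are illustrative placeholders.
add_headroom() {
  local factor="${1:-2}" bloat="${2:-2}"
  awk -v factor="$factor" -v bloat="$bloat" '
    /^[ \t]*maxEntries[ \t]*=/ {
      # Preserve whatever indentation the line had.
      indent = $0
      sub(/maxEntries.*/, "", indent)
      # Extract the numeric value after the "=" sign.
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)
      printf "%smaxEntries = %d\n", indent, value * factor
      printf "%smaxBloatFactor = %s\n", indent, bloat
      next
    }
    { print }
  '
}

# Usage (hypothetical file names):
#   add_headroom 2 3 < chroniclemap-template.config > chroniclemap-adjusted.config
```

All other lines pass through unchanged, so the result can still be reviewed by hand before it is merged into `gerrit.config`.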
Once you have gathered the information you wanted, you might consider removing the
plugin:
* Remove the jar from the `plugins` directory
```bash
rm plugins/cache-chroniclemap.jar
```
* Wait for the pluginLoader to acknowledge and unload the plugin. You will
see an entry in the `error_log`:
```
INFO com.google.gerrit.server.plugins.PluginLoader : Unloading plugin cache-chroniclemap
```
## Auto-adjust Chronicle-map caches
If you have already migrated to chronicle-map, then you already have `.dat` caches
available under the `cache` directory, and you have provided suitable
configuration for the existing caches as explained in the [configuration](./config.md)
documentation.
However, new caches might be created for which no configuration has yet been
provided: new persistent caches might be introduced in new versions of Gerrit,
or you might end up using a plugin that makes use of an additional cache, for
example.
When this happens, you might have little or no idea of what values should be
provided for those caches, such as average key size and average value size, and
you have to rely on default values.
This plugin provides an SSH command and a REST-API that will help you analyze
the current, suboptimal, chronicle-map caches and migrate into new ones for
which a more realistic configuration is generated based on data.
The tuning of the caches requires the user to have
`Administrate Caches` or `Administrate Server` capabilities.
* Symlink the `cache-chroniclemap.jar` file in the `plugins/` directory (from
the `lib/` directory).
* Wait for the pluginLoader to acknowledge and load the new plugin. You will see
an entry in the `error_log`:
```
INFO com.google.gerrit.server.plugins.PluginLoader : Loaded plugin cache-chroniclemap
```
* You can now run the tuning command:
```bash
ssh -p 29418 admin@<gerrit-server> cache-chroniclemap auto-adjust-caches [--dry-run] [cache-name]
```
* You can also use the REST-API:
```
PUT /plugins/cache-chroniclemap/auto-adjust-caches
```
* `--dry-run` or `-d` (SSH), `?dry-run` or `?d` (REST-API) optional parameter.
Calculates the average key and value sizes, but does not migrate current cache
data into new files.
* `--max-entries` or `-m` (SSH), `?max-entries` or `?m` (REST-API) optional parameter
The number of entries the tuned cache file is going to hold. This is typically
useful when the auto-tuning is executed with the intent to increase the number
of entries that the current cache can hold. When not specified, the
`auto-adjust-caches` command checks the percentage utilization of the current
cache.
If the current utilization of the cache is higher than 50%, then `maxEntries`
for the tuned cache will be increased by a factor of *2*.
To _decrease_ the number of max entries during auto-tuning, the `max-entries`
value should be passed _explicitly_.
Note that this parameter will be used globally across all caches, so if you want
to increase the size of a particular cache only you should be using this
together with the `cache-name` parameter.
* `cache-name` (SSH), `?CACHE_NAME=cache-name` (REST-API) optional restriction of the caches
to analyze and auto-tune. The parameter can be repeated multiple times to analyze
multiple caches. By default, all persistent caches are analyzed and adjusted.
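The REST-API parameters above can be combined in a single request. The following sketch builds the request URL from the parameters described in this section. It assumes that authenticated Gerrit REST calls use the `/a/` prefix; the host name, the credentials and the function name `auto_adjust_url` are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Build the auto-adjust-caches REST URL.
# Arguments: $1 = Gerrit base URL, $2 = "true" to request a dry run,
#            remaining arguments = cache names (optional, repeatable).
# Assumption: authenticated REST calls go through the "/a/" prefix.
auto_adjust_url() {
  local base="$1" dry_run="$2"
  shift 2
  local url="${base%/}/a/plugins/cache-chroniclemap/auto-adjust-caches"
  local sep='?' name
  if [ "$dry_run" = "true" ]; then
    url="${url}${sep}dry-run"
    sep='&'
  fi
  for name in "$@"; do
    url="${url}${sep}CACHE_NAME=${name}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# Usage (placeholder host and credentials):
#   curl -X PUT --user admin:secret \
#     "$(auto_adjust_url https://gerrit.example.com true diff accounts)"
```

Starting with a dry run restricted to one or two caches is a cheap way to inspect the calculated sizes before any data is migrated.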
For each chronicle-map cache that needs tuning (e.g. a `foo_1.dat` file) in
the `cache` directory, a new one will be created (e.g. `foo_1_tuned_<timestamp>.dat`).
The new cache will have these characteristics:
- Will have the same entries as the original cache.
- Will be configured with the *actual* average key and value sizes, calculated by
looking at the content of the original cache.
An output will also be generated with the configuration changes that should
be included into `gerrit.config`, should you decide to use the new caches.
An example of the output is the following:
```bash
ssh -p 29418 admin@localhost cache-chroniclemap auto-adjust-caches
[mergeability] calculate average key/value size: 100% (849601/849601)
[diff_summary] calculate average key/value size: 100% (410894/410894)
[diff_intraline] calculate average key/value size: 100% (101868/101868)
[web_sessions] calculate average key/value size: 100% (1/1)
[conflicts] calculate average key/value size: 100% (364722/364722)
[diff] calculate average key/value size: 100% (72613/72613)
[accounts] calculate average key/value size: 100% (22614/22614)
[change_kind] calculate average key/value size: 100% (838009/838009)
[persisted_projects] calculate average key/value size: 100% (47385/47385)
[persisted_projects] migrate content: 100% (47385/47385)
****************************
** Chronicle-map template **
****************************
__CONFIG__
[cache "mergeability"]
avgKeySize = 76
avgValueSize = 5
maxEntries = 3398404
maxBloatFactor = 4
[cache "diff_summary"]
avgKeySize = 96
avgValueSize = 241
maxEntries = 1643576
maxBloatFactor = 4
[cache "diff_intraline"]
avgKeySize = 503
avgValueSize = 370
maxEntries = 407472
maxBloatFactor = 4
[cache "web_sessions"]
avgKeySize = 41
avgValueSize = 166
maxEntries = 94852
maxBloatFactor = 4
[cache "conflicts"]
avgKeySize = 61
avgValueSize = 5
maxEntries = 1458888
maxBloatFactor = 4
[cache "diff"]
avgKeySize = 94
avgValueSize = 571
maxEntries = 290452
maxBloatFactor = 4
[cache "accounts"]
avgKeySize = 26
avgValueSize = 90
maxEntries = 90456
maxBloatFactor = 4
[cache "change_kind"]
avgKeySize = 55
avgValueSize = 6
maxEntries = 3352036
maxBloatFactor = 4
[cache "persisted_projects"]
avgKeySize = 49
avgValueSize = 1770
maxEntries = 189536
maxBloatFactor = 4
```
The operation might take from seconds to minutes, depending on the size of the
caches, and it can be performed periodically to assess how the cache data
evolves with respect to its current configuration.
Running the command against GerritHub data, for an overall number of entries
of circa 3M, took ~2 minutes (on a 2.6 GHz 6-Core Intel Core i7 with 16 GB of RAM).
Depending on the results you might find that the newly generated caches have
average key/value configurations that are substantially different from the
current ones. This might be just a by-product of how the Gerrit instance is
used, and of the different data that it generates (think about how the average
size of your diffs might change over time, for example).
You should consider replacing only those caches that have drifted away
considerably from the actual profile of the data they store (i.e. the values
currently in `gerrit.config` are substantially different from the output of
the `auto-adjust-caches` command).
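One way to make "drifted away considerably" concrete is to compute the relative difference between a configured value and the tuned one. This is only a sketch: the function name `drift_pct` is hypothetical, and any threshold you compare the result against is your own choice, not something defined by the plugin.

```bash
#!/usr/bin/env bash
# Print the absolute difference between a configured value and a tuned value,
# as an integer percentage of the configured value (rounded down).
# Assumption: the configured value is a positive integer.
drift_pct() {
  local current="$1" tuned="$2" diff
  diff=$(( tuned - current ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  echo $(( diff * 100 / current ))
}

# Usage (illustrative numbers): an avgValueSize configured as 1000 and
# reported as 1770 by the tuning command has drifted by 77%.
#   drift_pct 1000 1770
```

Comparing `avgKeySize` and `avgValueSize` this way for each cache gives a short list of candidates worth replacing.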
Using the new caches requires three things:
* Update the `gerrit.config` with the output produced by the command.
* Replace the existing caches with the new caches.
* Restart Gerrit.
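The file replacement can be scripted. The sketch below assumes that Gerrit is stopped, that file names follow the `<name>_1.dat` and `<name>_1_tuned_<timestamp>.dat` patterns used in the examples of this document, and that timestamps have a fixed width so that lexicographic order matches chronological order. The function name `install_tuned` and the `.bak` backup are additions for illustration, not something the plugin does.

```bash
#!/usr/bin/env bash
# Replace <name>_1.dat with the most recent <name>_1_tuned_<timestamp>.dat.
# Arguments: $1 = cache directory, $2 = cache name.
# Assumptions: Gerrit is stopped; timestamps sort lexicographically.
install_tuned() {
  local cache_dir="$1" name="$2" latest="" f
  # Globs expand in sorted order, so the last match is the newest file.
  for f in "$cache_dir/${name}_1_tuned_"*.dat; do
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  if [ -z "$latest" ]; then
    echo "no tuned file found for cache '$name'" >&2
    return 1
  fi
  # Keep a backup of the current file, if any, before replacing it.
  if [ -e "$cache_dir/${name}_1.dat" ]; then
    mv "$cache_dir/${name}_1.dat" "$cache_dir/${name}_1.dat.bak"
  fi
  mv "$latest" "$cache_dir/${name}_1.dat"
}

# Usage (hypothetical site path):
#   install_tuned /var/gerrit/cache persisted_projects
```

Keeping the backup makes it possible to roll back by restoring the `.bak` file and the previous `gerrit.config` stanza.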
*Note*:
The `auto-adjust-caches` command can be run online without any disruption of the Gerrit
server. However, since the migration performs many sequential reads
from the cache, it adds to the overall load of the system, so
you should test accordingly.
In an HA environment the tuning of the cache can be done on a single node and
then the caches can be copied over to other nodes.
For example, in a two nodes installation (gerrit-1 and gerrit-2):
- Run the `auto-adjust-caches` command on gerrit-2
- Copy the `tuned` cache files to gerrit-1
For each cache `foo` you want to install/replace, do the following:
1. Stop `gerrit-2`
2. Replace the existing caches with the `tuned` ones:
```bash
mv foo_1_tuned_<timestamp>.dat foo_1.dat
```
3. Replace/add the `[cache "foo"]` stanza in the `gerrit.config`:
```
[cache "persisted_projects"]
avgKeySize = 49
avgValueSize = 1770
maxEntries = 189536
maxBloatFactor = 4
```
4. Restart gerrit-2
Once you have tested gerrit-2 and you are happy with the results you can perform
steps *1.* to *4.* for `gerrit-1`.