# Roadmap
## General
### Planned features
- **Automated verification process**: Run tests automatically to verify changes. \
\
Most tests in the project require a Kubernetes cluster and some additional
prerequisites, e.g. Istio. Currently, the Gerrit open-source community does not
have these resources. At SAP, we plan to run verification on our internal systems,
whose results won't be publicly viewable, but which could already vote. Builds would
only be triggered if a maintainer votes `+1` on the `Build-Approved` label. \
\
Builds can be moved to a public CI at a later point in time.
- **Automated publishing of container images**: Publishing container images will
happen automatically on `ref-updated` events using a CI system.
- **Support for multiple Gerrit versions**: All currently supported Gerrit versions
will also be supported in k8s-gerrit. \
\
Currently, container images used by this project are only published for a single
Gerrit version, which is updated on an irregular schedule. Introducing stable
branches for each Gerrit version will make it possible to maintain container images
for multiple Gerrit versions. Gerrit binaries will be updated with each official
release and more frequently on `master`. This will be (at least partially)
automated.
- **Integration test suite**: A test suite that can be used to test a GerritCluster. \
\
A GerritCluster running in a Kubernetes cluster consists of multiple components.
Having a suite of automated tests would greatly help to verify deployments in
development landscapes before going into production.
## Gerrit Operator
### Version 1.0
#### Implemented features
- **High-availability**: Primary Gerrit StatefulSets have limited support for
horizontal scaling. \
\
Scaling has been enabled using the [high-availability plugin](https://gerrit.googlesource.com/plugins/high-availability/).
Primary Gerrits run in Active/Active configuration. Currently, two primary
Gerrit instances, i.e. 2 pods in a StatefulSet, are supported (see the sketch below).
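
  A minimal sketch of what scaling a primary Gerrit could look like in a
  CustomResource; the `apiVersion`, kind and field names below are assumptions for
  illustration, not necessarily the operator's verbatim API:

  ```yaml
  # Hypothetical CustomResource sketch -- field names are assumptions,
  # not necessarily the operator's verbatim API.
  apiVersion: gerritoperator.google.com/v1beta1
  kind: Gerrit
  metadata:
    name: gerrit-primary
  spec:
    mode: PRIMARY
    replicas: 2  # two Active/Active primary pods in one StatefulSet
  ```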
- **Global RefDB support**: Global RefDB is required for Active/Active configurations
of multiple primary Gerrits. \
\
The [Global RefDB](https://gerrit.googlesource.com/modules/global-refdb) support
is required for high-availability as described in the previous point. The
Gerrit Operator automatically sets up Gerrit to use a Global RefDB
implementation. The following implementations are supported:
- [spanner-refdb](https://gerrit.googlesource.com/plugins/spanner-refdb)
- [zookeeper-refdb](https://gerrit.googlesource.com/plugins/zookeeper-refdb)
\
The Gerrit Operator does not set up the database used by the Global RefDB. It
does, however, manage plugin/module installation and configuration in Gerrit (see
the sketch below).
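
  A hedged sketch of how a ZooKeeper-backed Global RefDB might be declared in a
  GerritCluster CustomResource; the field names are illustrative assumptions, not
  necessarily the operator's verbatim API:

  ```yaml
  # Hypothetical sketch -- field names are assumptions for illustration.
  apiVersion: gerritoperator.google.com/v1beta1
  kind: GerritCluster
  metadata:
    name: gerrit
  spec:
    refdb:
      database: ZOOKEEPER
      zookeeper:
        connectString: zookeeper.zookeeper.svc.cluster.local:2181
  ```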
- **Full support for Nginx**: The integration of Ingresses managed by the Nginx
ingress controller now supports automated routing. \
\
Instead of requiring users to use different subdomains for the different Gerrit
deployments in the GerritCluster, requests are now automatically routed to the
respective deployments. SSH still has to be set up manually, since this requires
configuring the routing in the Nginx ingress controller itself.
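
  Conceptually, the generated HTTP routing resembles a path-based Ingress like the
  following sketch, which sends read traffic (`git-upload-pack`) to the replicas and
  everything else to the primary; service names and paths are placeholders, since
  the operator generates the actual resource:

  ```yaml
  # Conceptual sketch of path-based routing -- names are placeholders.
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: gerrit-ingress
    annotations:
      nginx.ingress.kubernetes.io/use-regex: "true"
  spec:
    ingressClassName: nginx
    rules:
      - host: gerrit.example.com
        http:
          paths:
            - path: /.*/git-upload-pack  # fetches can be served by replicas
              pathType: ImplementationSpecific
              backend:
                service:
                  name: gerrit-replica
                  port:
                    number: 8080
            - path: /  # everything else goes to the primary
              pathType: Prefix
              backend:
                service:
                  name: gerrit-primary
                  port:
                    number: 8080
  ```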
#### Planned features
- **Versioning of CRDs**: Provide migration paths between API changes in CRDs. \
\
At the moment, updates to the CRDs are done without providing a migration path.
This means a complete reinstallation of the CRDs, operator, CRs and dependent
resources is required, which is not acceptable in a production environment. Thus,
the operator will always support the last two versions of each CRD, where applicable,
and provide a migration path between those versions.
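
  For reference, Kubernetes can serve two CRD versions side by side and convert
  between them with a conversion webhook, roughly like this sketch (names and the
  webhook path are placeholders; schemas are reduced to a permissive stub):

  ```yaml
  # Generic Kubernetes CRD versioning sketch -- names are placeholders.
  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: gerrits.gerritoperator.google.com
  spec:
    group: gerritoperator.google.com
    names:
      kind: Gerrit
      plural: gerrits
    scope: Namespaced
    versions:
      - name: v1beta1
        served: true  # old version stays readable during migration
        storage: false
        schema:
          openAPIV3Schema:
            type: object
            x-kubernetes-preserve-unknown-fields: true
      - name: v1beta2
        served: true
        storage: true  # new version is used for persistence
        schema:
          openAPIV3Schema:
            type: object
            x-kubernetes-preserve-unknown-fields: true
    conversion:
      strategy: Webhook
      webhook:
        conversionReviewVersions: ["v1"]
        clientConfig:
          service:
            name: gerrit-operator
            namespace: gerrit-operator
            path: /convert
  ```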
- **Log collection**: Support adding a sidecar that runs a log collection agent
to ship the logs of all components to a logging stack (see the sketch after this
list). \
\
Planned supported log collectors:
- [OpenTelemetry agent](https://opentelemetry.io/docs/collector/deployment/agent/)
- Option to add a custom sidecar
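
  As a rough illustration, such a sidecar could look like the following snippet in
  the Gerrit pod template; the image tag, volume and config names are placeholders:

  ```yaml
  # Illustrative sidecar snippet for a pod template -- names are placeholders.
  containers:
    - name: otel-agent
      image: otel/opentelemetry-collector-contrib:latest
      args: ["--config=/etc/otel/config.yaml"]
      volumeMounts:
        - name: gerrit-logs  # volume shared with the Gerrit container
          mountPath: /var/gerrit/logs
          readOnly: true
        - name: otel-config  # ConfigMap with receivers/exporters for the logging stack
          mountPath: /etc/otel
  ```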
- **Support for additional Ingress controllers**: Add support for setting up routing
configurations for additional Ingress controllers. \
\
Additional ingress controllers might include:
- [Ambassador](https://www.getambassador.io/products/edge-stack/api-gateway)
### Version 1.x
#### Potential features
- **Support for additional log collection agents**: \
\
Additional log collection agents might include:
- [Fluent Bit](https://fluentbit.io/)
- Option to add a custom sidecar
- **Additional ValidationWebhooks**: Proactively avoid unsupported configurations. \
\
ValidationWebhooks are already used to reject unsupported configurations,
e.g. deploying more than one primary Gerrit CustomResource per GerritCluster.
So far, not all such cases are covered; thus, the set of validations will be
further expanded.
- **Better test coverage**: More tests are required to find bugs earlier.
- **Automated reload of plugins**: Reload plugins on configuration change. \
\
Configuration changes in plugins typically don't require a restart of Gerrit,
but just a reload of the plugin. To avoid unnecessary downtime of pods, the
Gerrit Operator will only reload the affected plugins instead of restarting all
pods if only a plugin's configuration changed.
- **Externalized (re-)indexing**: Alleviate load caused by online reindexing. \
\
On large Gerrit sites, online reindexing due to schema migrations `a)` or the
initialization of a new site `b)` might take weeks and use a lot of resources,
which might cause performance issues. This is not acceptable in production. The
current plan to solve this issue is to implement a separate Gerrit deployment
(GerritIndexer) that is not exposed to clients and that takes over the task of
online reindexing. The GerritIndexer will mount the same repositories and will
share events via the high-availability plugin. However, it will access
repositories in read-only mode. \
This solves the above-named scenarios as follows: \
\
a) **Schema migrations**: If a Gerrit update including a schema migration for
an index is applied, the Gerrit instances serving clients will be configured
to continue to use the old schema. Online reindexing will be disabled in
those instances. The GerritIndexer will have online reindexing enabled and
will start to build the new index version. As soon as it is finished, i.e.
it could start to use the new index version as the read index, it will make a
copy of the new index and publish it, e.g. using a shared filesystem. A
restart of the Gerrit instances serving clients will then be triggered.
During this restart the new index will be copied into the site. Since index
entries may have been updated after the new index version was published,
reindexing of the entries changed in the meantime will be triggered. \
\
b) **Initialization of a new site**: If Gerrit is horizontally scaled, the new
instance starts with an empty index, i.e. it would have to build the complete
index from scratch. To avoid this, the GerritIndexer deployment will continuously
keep a copy of the indexes up-to-date. It will regularly be stopped so that a
copy of the index can be stored in a shared volume. This copy can be used as a
base for new instances, which then only have to update index entries that were
changed in the meantime.
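
  A loudly hypothetical sketch of what such a deployment might look like as a
  CustomResource; neither the kind nor any of the fields exist yet:

  ```yaml
  # Purely hypothetical -- the GerritIndexer CRD does not exist yet.
  apiVersion: gerritoperator.google.com/v1beta1
  kind: GerritIndexer
  metadata:
    name: gerrit-indexer
  spec:
    cluster: gerrit  # GerritCluster whose repositories are mounted read-only
    indexCopy:
      schedule: "0 * * * *"  # how often to snapshot the index to the shared volume
      destination: shared/indexes
  ```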
- **Autoscaling**: Automatically scale Gerrit deployments based on usage. \
\
Metrics like the number of available workers in the thread pools could be used
to decide when to scale the Gerrit deployment horizontally. This would allow the
deployment to adapt dynamically to the current load, which helps to save costs
and resources (see the sketch below).
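
  For illustration, scaling on such a metric could be wired up with a standard
  HorizontalPodAutoscaler; the custom metric name below is an assumption and would
  have to be exposed via a metrics adapter:

  ```yaml
  # Illustrative HPA sketch -- the custom metric name is an assumption.
  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: gerrit-replica
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gerrit-replica
    minReplicas: 2
    maxReplicas: 6
    metrics:
      - type: Pods
        pods:
          metric:
            name: gerrit_threadpool_free_workers  # hypothetical metric
          target:
            type: AverageValue
            averageValue: "10"
  ```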
### Version 2.0
#### Potential features
- **Multi region support**: Support setups that are distributed over multiple regions. \
\
Supporting Gerrit installations that are distributed over multiple regions would
make it possible to serve clients all over the world without large differences in
latency and would also improve availability and reduce the risk of data loss. \
Such a setup could be achieved by using the [multi-site setup](https://gerrit.googlesource.com/plugins/multi-site/).
- **Remove the dependency on shared storage**: Use completely independent sites
instead of sharing a filesystem for some site components. \
\
NFS and other shared filesystems might cause performance issues on larger
Gerrit installations due to latency. A potential solution might be
to use the [multi-site setup](https://gerrit.googlesource.com/plugins/multi-site/)
to separate the sites of all instances and to use events and replication to
share state.
- **Shared index**: Use an external centralized index, e.g. OpenSearch, instead
of x copies of a Lucene index. \
\
Maintaining x copies of an index, where x is the number of Gerrit instances in
a GerritCluster, is unnecessarily expensive, since the same write transactions
potentially have to be performed x times. Using a single centralized index would
resolve this issue.
- **Shared cache**: Use an external centralized cache for all Gerrit instances. \
\
Using a single cache for all Gerrit instances will reduce the number of
computations for each Gerrit instance, since not every instance will have to
keep its own copy up-to-date.
- **Sharding**: Shard a site based on repositories. \
\
Repositories served by a single GerritCluster might be quite diverse, ranging
from a few kilobytes to several gigabytes in size, or from seeing high traffic
to barely being fetched. It is not trivial to configure Gerrit to work
optimally for all repositories. Being able to shard at least the Gerrit replicas
would help to serve all repositories optimally.
## Helm charts
Only limited support is planned for the `gerrit` and `gerrit-replica` helm charts
once the Gerrit Operator reaches version 1.0, since maintaining all features in
both places would not be feasible with the current number of contributors. The
Gerrit Operator will support all features that are provided by the helm charts.
If community members would like to take over maintainership of the helm charts,
this would be very much appreciated, and the helm charts could then continue to
be supported.