Update prometheus chart to 15.10.1 (Prometheus 2.34.0)

Required to run on Kubernetes 1.22+.

Change-Id: I87f808c1b6b34c844fdc257fdbdf87813c543315
README.md

Monitoring setup for Gerrit

This project provides a setup for monitoring Gerrit instances. The setup is based on Prometheus and Grafana running in Kubernetes. In addition, logging will be provided by Grafana Loki.

The setup is provided as a Helm chart. It can be installed using Helm (this README expects Helm version 3.0 or higher).

The charts used in this setup are open-source charts that can be found on GitHub:

This project just provides values.yaml-files that are already configured to work with the metrics-reporter-prometheus-plugin of Gerrit to make the setup easier.

Dependencies

Software

  • Gerrit
    Gerrit requires the following plugin to be installed:

  • Promtail
    Promtail has to be installed with access to the logs-directory in the Gerrit site. A configuration-file for Promtail will be provided in this setup. Find the documentation for Promtail here.

  • Helm
    To install and configure Helm, follow the official guide.

  • ytt
    ytt is a templating tool for yaml-files. It is required for some last moment configuration. Installation instructions can be found here.

  • Pipenv
    Pipenv sets up a virtual Python environment and installs the required Python packages based on a lock-file, ensuring a deterministic Python environment. Instructions on how to install Pipenv can be found here.

  • Jsonnet
    Jsonnet is used to create the JSON-files describing the Grafana dashboards. Instructions on how to install Jsonnet can be found here.

  • Grafonnet
    Grafonnet should be installed using jsonnet-bundler and the jsonnetfile.json provided by this project. Install jsonnet-bundler as described here. Then run jb install from this project's root directory.

Infrastructure

  • Kubernetes Cluster
    A cluster with at least 3 free CPUs and 4 GB of free memory is required. In addition, persistent storage of about 30 GB will be used.

  • Ingress Controller
    The charts currently expect an Nginx ingress controller to be installed in the cluster.

  • Object store
    Loki will store the data chunks in an object store. This store has to be callable via the S3 API.

Add dashboards

There are two ways to have dashboards deployed automatically during installation:

Using JSON

One way is to export the dashboards to a JSON-file in the UI or create JSON-files describing the dashboards in another way. Put these dashboards into the ./dashboards-directory of this repository.

Using Jsonnet + Grafonnet

The other way is to use Jsonnet/Grafonnet to create dashboards programmatically. Install Grafonnet into the project as described above and put your dashboard jsonnet files into the dashboards-directory or one of its subdirectories. The jsonnet-based dashboards can be converted to JSON manually using the following command:

jsonnet -J grafonnet-lib --ext-code publish=false dashboards/<dashboard>.jsonnet

The external variable publish should be set to false if the dashboard is imported via the API, and to true if it is published to the Grafana homepage or imported via the UI.
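To render several dashboards at once, the command above can be assembled programmatically. The following is a minimal sketch and not part of this project; the function name `jsonnet_command` and its parameters are hypothetical, and only the jsonnet flags shown in the command above are used.

```python
def jsonnet_command(dashboard, publish=False, lib_dir="grafonnet-lib"):
    """Return the argument list for rendering one jsonnet dashboard to JSON.

    `publish` maps to the external variable described above:
    False for import via the API, True for the homepage or a UI import.
    """
    return [
        "jsonnet",
        "-J", lib_dir,
        "--ext-code", "publish=" + ("true" if publish else "false"),
        str(dashboard),
    ]


if __name__ == "__main__":
    # Hypothetical dashboard path, printed as a shell-style command line.
    print(" ".join(jsonnet_command("dashboards/example.jsonnet")))
```

The returned list can be passed to `subprocess.run(..., check=True, capture_output=True)`, in which case the rendered JSON is available on the captured standard output.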

Configuration

While this project is supposed to provide a specialized and opinionated monitoring setup, some configuration is highly dependent on the specific installation. These options have to be configured in the ./config.yaml before installing and are listed here:

| option | description |
|---|---|
| gerritServers | List of Gerrit servers to scrape. For details refer to section below |
| namespace | The namespace the charts are installed to |
| tls.skipVerify | Whether to skip TLS certificate verification |
| tls.caCert | CA certificate used for TLS certificate verification |
| istio.enabled | Whether to use istio |
| istio.crt | TLS cert for Ingress gateway (should have alternative names for URLs of all components) |
| istio.key | TLS key for Ingress gateway |
| istio.jwt.cert | RSA certificate to be used to create JWT tokens |
| istio.jwt.key | RSA key to be used to create JWT tokens |
| istio.jwt.issuer | Issuer to be used for tokens (e.g. an email address) |
| monitoring.prometheus.server.host | Prometheus server ingress hostname |
| monitoring.prometheus.server.username | Username for Prometheus (only required if not using istio) |
| monitoring.prometheus.server.password | Password for Prometheus (only required if not using istio) |
| monitoring.prometheus.server.tls.cert | TLS certificate |
| monitoring.prometheus.server.tls.key | TLS key |
| monitoring.prometheus.alertmanager.slack.apiUrl | API URL of the Slack Webhook |
| monitoring.prometheus.alertmanager.slack.channel | Channel to which the alerts should be posted |
| monitoring.grafana.host | Grafana ingress hostname |
| monitoring.grafana.tls.cert | TLS certificate (only required if not using istio) |
| monitoring.grafana.tls.key | TLS key (only required if not using istio) |
| monitoring.grafana.admin.username | Username for the admin user |
| monitoring.grafana.admin.password | Password for the admin user |
| monitoring.grafana.ldap.enabled | Whether to enable LDAP |
| monitoring.grafana.ldap.host | Hostname of LDAP server |
| monitoring.grafana.ldap.port | Port of LDAP server (Has to be quoted!) |
| monitoring.grafana.ldap.password | Password of LDAP server |
| monitoring.grafana.ldap.bind_dn | Bind DN (username) of the LDAP server |
| monitoring.grafana.ldap.accountBases | List of base DNs to discover accounts (Has to have the format "['a', 'b']") |
| monitoring.grafana.ldap.groupBases | List of base DNs to discover groups (Has to have the format "['a', 'b']") |
| monitoring.grafana.dashboards.editable | Whether dashboards can be edited manually in the UI |
| logging.loki.host | Loki ingress hostname |
| logging.loki.username | Username for Loki (only required if not using istio) |
| logging.loki.password | Password for Loki (only required if not using istio) |
| logging.loki.s3.protocol | Protocol used for communicating with S3 |
| logging.loki.s3.host | Hostname of the S3 object store |
| logging.loki.s3.accessToken | The EC2 accessToken used for authentication with S3 |
| logging.loki.s3.secret | The secret associated with the accessToken |
| logging.loki.s3.bucket | The name of the S3 bucket |
| logging.loki.s3.region | The region in which the S3 bucket is hosted |
| logging.loki.tls.cert | TLS certificate (only required if not using istio) |
| logging.loki.tls.key | TLS key (only required if not using istio) |
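The dotted option names above describe paths into the nested structure of config.yaml. As an illustration, the sketch below looks up such a path in an already parsed configuration and checks the two LDAP formatting rules noted in the table. It assumes the file has been loaded into a plain dict (for example with a YAML parser); the helper names and all sample values are hypothetical and not taken from this project.

```python
def get_option(config, dotted_name):
    """Look up an option such as 'monitoring.grafana.host' in a nested dict."""
    value = config
    for key in dotted_name.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(dotted_name)
        value = value[key]
    return value


def ldap_problems(config):
    """Return messages for LDAP values that do not match the documented format."""
    prefix = "monitoring.grafana.ldap."
    if not get_option(config, prefix + "enabled"):
        return []
    problems = []
    # The port has to be quoted, so it must arrive as a string, not an integer.
    if not isinstance(get_option(config, prefix + "port"), str):
        problems.append(prefix + "port has to be quoted")
    # The base DN lists have to be strings of the form "['a', 'b']".
    for name in ("accountBases", "groupBases"):
        value = get_option(config, prefix + name)
        if not (isinstance(value, str) and value.startswith("[") and value.endswith("]")):
            problems.append(prefix + name + " has to have the format \"['a', 'b']\"")
    return problems


# Hypothetical excerpt of a parsed config.yaml.
SAMPLE = {
    "monitoring": {
        "grafana": {
            "host": "grafana.example.com",
            "ldap": {
                "enabled": True,
                "port": "389",
                "accountBases": "['ou=users,dc=example,dc=com']",
                "groupBases": "['ou=groups,dc=example,dc=com']",
            },
        }
    }
}

if __name__ == "__main__":
    print(get_option(SAMPLE, "monitoring.grafana.host"))
    print(ldap_problems(SAMPLE))
```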

gerritServers

Three types of Gerrit servers are currently supported, which require different configuration parameters:

  • Kubernetes
    Gerrit installations running in the same Kubernetes cluster as the monitoring setup. Multiple replicas are supported and automatically discovered.
    | option | description |
    |---|---|
    | gerritServers.kubernetes.[*].namespace | Namespace into which Gerrit was deployed |
    | gerritServers.kubernetes.[*].label.name | Label name used to select deployments |
    | gerritServers.kubernetes.[*].label.value | Label value to select deployments |
    | gerritServers.kubernetes.[*].containerName | Name of container in the pod that runs Gerrit |
    | gerritServers.kubernetes.[*].port | Container port to be used when scraping |
    | gerritServers.kubernetes.[*].username | Username of Gerrit user with ‘View Metrics’ capabilities |
    | gerritServers.kubernetes.[*].password | Password of Gerrit user with ‘View Metrics’ capabilities |
  • Federated Prometheus
    Load-balanced Gerrit instances can't be scraped through the load balancer. For this use case, a local Prometheus is typically installed next to each instance and then scraped by the central Prometheus in a federated setup.
    | option | description |
    |---|---|
    | gerritServers.federatedPrometheus.[*].host | Host running Gerrit and the Prometheus instance being scraped |
    | gerritServers.federatedPrometheus.[*].port | Port used by Prometheus |
    | gerritServers.federatedPrometheus.[*].username | Username for authenticating with Prometheus |
    | gerritServers.federatedPrometheus.[*].password | Password for authenticating with Prometheus |
  • Other
    Gerrit installations with just one replica that can run anywhere, as long as they are reachable via HTTP.
    | option | description |
    |---|---|
    | gerritServers.other.[*].host | Hostname (incl. port, if required) of the Gerrit server to monitor |
    | gerritServers.other.[*].username | Username of Gerrit user with ‘View Metrics’ capabilities |
    | gerritServers.other.[*].password | Password of Gerrit user with ‘View Metrics’ capabilities |
    | gerritServers.other.[*].healthcheck | Whether to deploy a container that regularly pings the healthcheck plugin endpoint in Gerrit |
    | gerritServers.other.[*].promtail.storagePath | Path to directory, where Promtail is allowed to save files (e.g. positions.yaml) |
    | gerritServers.other.[*].promtail.logPath | Path to directory containing the Gerrit logs (e.g. /var/gerrit/logs) |
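The `[*]` in the option names indicates that each server type holds a list of entries. The sketch below shows how such a structure might look once parsed, together with a small check that reports documented keys missing from an entry. It treats every documented key as required, which is an assumption; all names and values are hypothetical.

```python
# Keys documented for each server type in the option lists above.
DOCUMENTED_KEYS = {
    "kubernetes": {"namespace", "label", "containerName", "port", "username", "password"},
    "federatedPrometheus": {"host", "port", "username", "password"},
    "other": {"host", "username", "password", "healthcheck", "promtail"},
}

# Hypothetical parsed value of the gerritServers option.
GERRIT_SERVERS = {
    "kubernetes": [
        {
            "namespace": "gerrit",
            "label": {"name": "app", "value": "gerrit"},
            "containerName": "gerrit",
            "port": 8080,
            "username": "metrics",
            "password": "secret",
        }
    ],
    "federatedPrometheus": [
        {"host": "prometheus.example.com", "port": 9090, "username": "admin", "password": "secret"}
    ],
    "other": [
        {
            "host": "gerrit.example.com",
            "username": "metrics",
            "password": "secret",
            "healthcheck": True,
            "promtail": {"storagePath": "/var/promtail", "logPath": "/var/gerrit/logs"},
        }
    ],
}


def missing_keys(gerrit_servers):
    """Return (server type, entry index, missing keys) for incomplete entries."""
    findings = []
    for server_type, servers in gerrit_servers.items():
        documented = DOCUMENTED_KEYS.get(server_type)
        if documented is None:
            findings.append((server_type, None, ["unknown server type"]))
            continue
        for index, server in enumerate(servers):
            missing = sorted(documented - set(server))
            if missing:
                findings.append((server_type, index, missing))
    return findings


if __name__ == "__main__":
    print(missing_keys(GERRIT_SERVERS))
```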

Encryption

The configuration file contains secrets. Thus, to be able to share the configuration, e.g. with the CI-system, it is meant to be encrypted. The encryption is explained here.

The gerrit_monitoring.py install-command will decrypt the file before templating, if it was encrypted with sops.
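For illustration, a file encrypted with sops can be recognised by the top-level `sops` metadata key that sops adds to it. The sketch below uses that property and builds a decryption command; it is not necessarily how gerrit_monitoring.py detects or decrypts the file, and both function names are hypothetical.

```python
def is_sops_encrypted(config):
    """Return True if a parsed configuration carries sops metadata."""
    return isinstance(config, dict) and "sops" in config


def decrypt_command(path):
    """Return the argument list that prints the decrypted file to stdout."""
    return ["sops", "--decrypt", str(path)]


if __name__ == "__main__":
    # Hypothetical parsed configurations: one plain, one with sops metadata.
    print(is_sops_encrypted({"namespace": "monitoring"}))
    print(is_sops_encrypted({"namespace": "ENC[...]", "sops": {}}))
    print(" ".join(decrypt_command("config.yaml")))
```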

Using Istio

The easiest way of using the monitoring setup is to use an Ingress Controller, but it is also possible to use the setup within an Istio service mesh. To do this, Istio already has to be installed in the cluster and the istio-ingressgateway has to open the ports 80 and 443. Authentication and authorization for Prometheus and Loki for users outside of the cluster will be done by JWT tokens. Promtail configurations created by the installer will automatically get a token configured during the installation. Should another token be needed, the following command can be used to create one:

pipenv run python ./gerrit_monitoring.py \
  --config config.yaml \
  jwt

Installation

Before using the script, set up a Python environment using pipenv install.

The installation will use the environment of the current shell. Thus, make sure that ytt, kubectl and helm are available on the PATH. In addition, the KUBECONFIG variable has to point to the kubeconfig of the target Kubernetes cluster.
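These prerequisites can be checked before running the installer. The following is a minimal sketch, not part of gerrit_monitoring.py; the function name `preflight_problems` and its parameters are hypothetical.

```python
import os
import shutil

REQUIRED_TOOLS = ("ytt", "kubectl", "helm")


def preflight_problems(environ=None, which=shutil.which):
    """Return a list of problems with the current shell environment.

    `environ` and `which` can be overridden, e.g. for testing.
    """
    environ = os.environ if environ is None else environ
    problems = [
        tool + " not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    if not environ.get("KUBECONFIG"):
        problems.append("KUBECONFIG is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print(problem)
```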

This project provides a script to quickly install the monitoring setup. To use it, run:

pipenv run ./gerrit_monitoring.py \
  --config config.yaml \
  install \
  [--output ./dist] \
  [--dryrun] \
  [--update-repo]

The command will use the given configuration (--config/-c) to create the final files in the directory given by --output/-o (default ./dist) and install/update the Kubernetes resources and charts, unless the --dryrun/-d flag is set. If the --update-repo flag is used, the helm repository will be updated before installing the helm charts. This is required, for example, if a chart version was updated.

Configure Promtail

Promtail has to be installed with access to the directory containing the Gerrit logs, e.g. on the same host. The installation as described above will create a configuration file for Promtail, which can be found in ./dist/promtail.yaml. Use it to configure Promtail by passing the -config.file=./dist/promtail.yaml parameter when starting Promtail. Using the Promtail binary directly, this would result in the following command:

$PATH_TO_PROMTAIL/promtail \
  -config.file=./dist/promtail.yaml

If TLS-verification is activated, the CA-certificate used for verification (usually the one configured for tls.caCert) has to be present in the directory configured for promtail.storagePath in the config.yaml and has to be called promtail.ca.crt.

The Promtail configuration provided here expects the logs to be available in JSON-format. This can be configured by setting log.jsonLogging = true in the gerrit.config.
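Both requirements can be verified on the Gerrit host before starting Promtail. The sketch below derives the expected location of the CA certificate from a storage path and tests whether a log line is a JSON object. The function names and the sample values are hypothetical, and the sample log line does not reflect Gerrit's actual log fields.

```python
import json
from pathlib import Path


def ca_cert_path(storage_path):
    """Return where the CA certificate is expected inside promtail.storagePath."""
    return Path(storage_path) / "promtail.ca.crt"


def is_json_log_line(line):
    """Return True if the log line parses as a JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    cert = ca_cert_path("/var/promtail")  # hypothetical storage path
    print(cert, "exists" if cert.is_file() else "is missing")
    print(is_json_log_line('{"message": "example"}'))
```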

Uninstallation

To remove the Prometheus, Loki and Grafana charts and the configuration from the cluster, run

helm uninstall prometheus --namespace $NAMESPACE
helm uninstall loki --namespace $NAMESPACE
helm uninstall grafana --namespace $NAMESPACE
kubectl delete -f ./dist/configuration

To also release the volumes, run

kubectl delete -f ./dist/storage

NOTE: Doing so will delete all data that was not backed up!

Remove the namespace:

kubectl delete -f ./dist/namespace.yaml
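The manual steps above have to be run in this order, with the volume deletion being optional. As an illustration, the sequence can be assembled as follows; the function name `uninstall_commands` and its parameters are hypothetical, and the commands are exactly the ones listed above.

```python
def uninstall_commands(namespace, dist="./dist", release_volumes=False):
    """Return the manual uninstall commands in the order they should run."""
    commands = [
        ["helm", "uninstall", release, "--namespace", namespace]
        for release in ("prometheus", "loki", "grafana")
    ]
    commands.append(["kubectl", "delete", "-f", dist + "/configuration"])
    if release_volumes:
        # Deletes the volumes; data that was not backed up is lost.
        commands.append(["kubectl", "delete", "-f", dist + "/storage"])
    commands.append(["kubectl", "delete", "-f", dist + "/namespace.yaml"])
    return commands


if __name__ == "__main__":
    # Hypothetical namespace; prints the commands instead of running them.
    for command in uninstall_commands("monitoring"):
        print(" ".join(command))
```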

Alternatively, the uninstall command of ./gerrit_monitoring.py will automatically remove the charts installed in the configured namespace and delete the namespace as well:

pipenv run ./gerrit_monitoring.py \
  --config config.yaml \
  uninstall