Add support for configurable performance metrics

To optimize performance in Gerrit, it's important to know which
operations are expensive, how much latency they cause and how often they
are called. We did have performance metrics for this once, but had to
drop them because it was too expensive to capture this data for all
operations (see https://issues.gerritcodereview.com/issues/40014513).

We still have performance logs, which allows us to find expensive
operations, but knowing their latency distribution (e.g. p90, p99) and
how often they are called across all requests cannot easily be computed
from them.

With this change we re-add the performance metrics, but require that the
operations for which we record them are explicitly configured. This way
a metric explosion can be avoided while we can still get this data for
operations that we know are slow and which we want to optimize.

Having the performance metrics is useful to measure the impact of
performance improvements:

1. Find a slow operation in the performance logs
2. Configure the slow operation to be recorded in the metrics
3. Collect metric data for some days to know the average latency
4. Optimise the performance of the operation
5. Check how the average latency has improved
6. Estimate the total performance gain by multiplying the latency
   improvement with the number of times the operation is called

Alternatively we could add a new specific metric each time we want to
improve something, but this seems more overhead. An example for such a
metric is the existing parent_data_computation metric, which we can drop
once we have confirmed that the configurable performance metrics are
working.

This basically reverts commit d8af41afd1f352f8e25c9f79bff705c05665934c
and 491337d70cad8f3e25ae3477a7723021d395a480 but avoids the metric
explosion by requiring to configure the recorded operations explicitly.

Release-Notes: Added support for configurable performance metrics
Change-Id: I65790d0b17e2ccb1ced9e8ec2a1766c5286dcf91
diff --git a/Documentation/config-gerrit.txt b/Documentation/config-gerrit.txt
index 6b3b79f..0c218c8 100644
--- a/Documentation/config-gerrit.txt
+++ b/Documentation/config-gerrit.txt
@@ -5976,6 +5976,19 @@
 +
 By default, unset (all projects are matched).
 
+[[tracing.metric]]
+==== Subsection tracing.metric
+
+Section to control for which operations latency and counts should be recorded
+in the link:metrics.html#performance[performance metrics].
+
+[[tracing.metric.operation]]tracing.metric.operation::
++
+Name of a Gerrit operation for which latency and counts should be recorded in
+the link:metrics.html#performance[performance metrics].
++
+The operation name must match the operation name that is used with TraceTimer.
+
 [[deadline.id]]
 ==== Subsection deadline.<id>
 
diff --git a/Documentation/metrics.txt b/Documentation/metrics.txt
index 3b56391..1ca750b 100644
--- a/Documentation/metrics.txt
+++ b/Documentation/metrics.txt
@@ -73,6 +73,30 @@
 ** `cancellation_type`:
    The cancellation type (graceful or forceful).
 
+[[performance]]
+=== Performance
+
+* `performance/operations`: Latency of performing operations
+** `operation_name`:
+   The operation that was performed.
+** `request`:
+   The request for which the operation was performed (format = '<request-type>
+   <redacted-request-uri>').
+** `plugin`:
+   The name of the plugin that performed the operation.
+* `performance/operations_count`: Number of performed operations
+** `operation_name`:
+   The operation that was performed.
+** `request`:
+   The request for which the operation was performed (format = '<request-type>
+   <redacted-request-uri>').
+** `plugin`:
+   The name of the plugin that performed the operation.
+
+Performance metrics can be enabled via the
+link:config.gerrit.html#tracing.exportPerformanceMetrics[`tracing.exportPerformanceMetrics`]
+setting.
+
 === Pushes
 
 * `receivecommits/changes`: histogram of number of changes processed in a single
diff --git a/java/com/google/gerrit/metrics/Timer1.java b/java/com/google/gerrit/metrics/Timer1.java
index 9dc32da..cdf457c 100644
--- a/java/com/google/gerrit/metrics/Timer1.java
+++ b/java/com/google/gerrit/metrics/Timer1.java
@@ -54,6 +54,8 @@
     }
   }
 
+  private boolean suppressLogging;
+
   protected final String name;
   protected final Field<F1> field;
 
@@ -87,15 +89,24 @@
     field.metadataMapper().accept(metadataBuilder, fieldValue);
     Metadata metadata = metadataBuilder.build();
 
-    LoggingContext.getInstance()
-        .addPerformanceLogRecord(() -> PerformanceLogRecord.create(name, durationNanos, metadata));
-    logger.atFinest().log(
-        "%s (%s = %s) took %.2f ms", name, field.name(), fieldValue, durationNanos / 1000000.0);
+    if (!suppressLogging) {
+      LoggingContext.getInstance()
+          .addPerformanceLogRecord(
+              () -> PerformanceLogRecord.create(name, durationNanos, metadata));
+      logger.atFinest().log(
+          "%s (%s = %s) took %.2f ms", name, field.name(), fieldValue, durationNanos / 1000000.0);
+    }
 
     doRecord(fieldValue, value, unit);
     RequestStateContext.abortIfCancelled();
   }
 
+  /** Suppress logging (debug log and performance log) when values are recorded. */
+  public final Timer1<F1> suppressLogging() {
+    this.suppressLogging = true;
+    return this;
+  }
+
   /**
    * Record a value in the distribution.
    *
diff --git a/java/com/google/gerrit/metrics/Timer2.java b/java/com/google/gerrit/metrics/Timer2.java
index 46dd617..9f08711 100644
--- a/java/com/google/gerrit/metrics/Timer2.java
+++ b/java/com/google/gerrit/metrics/Timer2.java
@@ -57,6 +57,8 @@
     }
   }
 
+  private boolean suppressLogging;
+
   protected final String name;
   protected final Field<F1> field1;
   protected final Field<F2> field2;
@@ -95,16 +97,25 @@
     field2.metadataMapper().accept(metadataBuilder, fieldValue2);
     Metadata metadata = metadataBuilder.build();
 
-    LoggingContext.getInstance()
-        .addPerformanceLogRecord(() -> PerformanceLogRecord.create(name, durationNanos, metadata));
-    logger.atFinest().log(
-        "%s (%s = %s, %s = %s) took %.2f ms",
-        name, field1.name(), fieldValue1, field2.name(), fieldValue2, durationNanos / 1000000.0);
+    if (!suppressLogging) {
+      LoggingContext.getInstance()
+          .addPerformanceLogRecord(
+              () -> PerformanceLogRecord.create(name, durationNanos, metadata));
+      logger.atFinest().log(
+          "%s (%s = %s, %s = %s) took %.2f ms",
+          name, field1.name(), fieldValue1, field2.name(), fieldValue2, durationNanos / 1000000.0);
+    }
 
     doRecord(fieldValue1, fieldValue2, value, unit);
     RequestStateContext.abortIfCancelled();
   }
 
+  /** Suppress logging (debug log and performance log) when values are recorded. */
+  public final Timer2<F1, F2> suppressLogging() {
+    this.suppressLogging = true;
+    return this;
+  }
+
   /**
    * Record a value in the distribution.
    *
diff --git a/java/com/google/gerrit/metrics/Timer3.java b/java/com/google/gerrit/metrics/Timer3.java
index 922cdd6..2a01d5d 100644
--- a/java/com/google/gerrit/metrics/Timer3.java
+++ b/java/com/google/gerrit/metrics/Timer3.java
@@ -60,6 +60,8 @@
     }
   }
 
+  private boolean suppressLogging;
+
   protected final String name;
   protected final Field<F1> field1;
   protected final Field<F2> field2;
@@ -104,23 +106,32 @@
     field3.metadataMapper().accept(metadataBuilder, fieldValue3);
     Metadata metadata = metadataBuilder.build();
 
-    LoggingContext.getInstance()
-        .addPerformanceLogRecord(() -> PerformanceLogRecord.create(name, durationNanos, metadata));
-    logger.atFinest().log(
-        "%s (%s = %s, %s = %s, %s = %s) took %.2f ms",
-        name,
-        field1.name(),
-        fieldValue1,
-        field2.name(),
-        fieldValue2,
-        field3.name(),
-        fieldValue3,
-        durationNanos / 1000000.0);
+    if (!suppressLogging) {
+      LoggingContext.getInstance()
+          .addPerformanceLogRecord(
+              () -> PerformanceLogRecord.create(name, durationNanos, metadata));
+      logger.atFinest().log(
+          "%s (%s = %s, %s = %s, %s = %s) took %.2f ms",
+          name,
+          field1.name(),
+          fieldValue1,
+          field2.name(),
+          fieldValue2,
+          field3.name(),
+          fieldValue3,
+          durationNanos / 1000000.0);
+    }
 
     doRecord(fieldValue1, fieldValue2, fieldValue3, value, unit);
     RequestStateContext.abortIfCancelled();
   }
 
+  /** Suppress logging (debug log and performance log) when values are recorded. */
+  public final Timer3<F1, F2, F3> suppressLogging() {
+    this.suppressLogging = true;
+    return this;
+  }
+
   /**
    * Record a value in the distribution.
    *
diff --git a/java/com/google/gerrit/server/PerformanceMetrics.java b/java/com/google/gerrit/server/PerformanceMetrics.java
new file mode 100644
index 0000000..3d21fed
--- /dev/null
+++ b/java/com/google/gerrit/server/PerformanceMetrics.java
@@ -0,0 +1,104 @@
+// Copyright (C) 2021 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.gerrit.server;
+
+import com.google.common.collect.ImmutableList;
+import com.google.gerrit.common.Nullable;
+import com.google.gerrit.metrics.Counter3;
+import com.google.gerrit.metrics.Description;
+import com.google.gerrit.metrics.Field;
+import com.google.gerrit.metrics.MetricMaker;
+import com.google.gerrit.metrics.Timer3;
+import com.google.gerrit.server.config.GerritServerConfig;
+import com.google.gerrit.server.logging.Metadata;
+import com.google.gerrit.server.logging.PerformanceLogger;
+import com.google.gerrit.server.logging.TraceContext;
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import java.time.Instant;
+import java.util.concurrent.TimeUnit;
+import org.eclipse.jgit.lib.Config;
+
+/** Performance logger that records the execution times as a metric. */
+@Singleton
+public class PerformanceMetrics implements PerformanceLogger {
+  private static final String OPERATION_LATENCY_METRIC_NAME = "performance/operations";
+  private static final String OPERATION_COUNT_METRIC_NAME = "performance/operations_count";
+
+  private final ImmutableList<String> tracedOperations;
+  public final Timer3<String, String, String> operationsLatency;
+  public final Counter3<String, String, String> operationsCounter;
+
+  @Inject
+  PerformanceMetrics(@GerritServerConfig Config cfg, MetricMaker metricMaker) {
+    this.tracedOperations =
+        ImmutableList.copyOf(cfg.getStringList("tracing", "metric", "operation"));
+
+    Field<String> operationNameField =
+        Field.ofString(
+                "operation_name",
+                (metadataBuilder, fieldValue) -> metadataBuilder.operationName(fieldValue))
+            .description("The operation that was performed.")
+            .build();
+    Field<String> requestField =
+        Field.ofString("request", (metadataBuilder, fieldValue) -> {})
+            .description(
+                "The request for which the operation was performed"
+                    + " (format = '<request-type> <redacted-request-uri>').")
+            .build();
+    Field<String> pluginField =
+        Field.ofString(
+                "plugin", (metadataBuilder, fieldValue) -> metadataBuilder.pluginName(fieldValue))
+            .description("The name of the plugin that performed the operation.")
+            .build();
+
+    this.operationsLatency =
+        metricMaker
+            .newTimer(
+                OPERATION_LATENCY_METRIC_NAME,
+                new Description("Latency of performing operations")
+                    .setCumulative()
+                    .setUnit(Description.Units.MILLISECONDS),
+                operationNameField,
+                requestField,
+                pluginField)
+            .suppressLogging();
+    this.operationsCounter =
+        metricMaker.newCounter(
+            OPERATION_COUNT_METRIC_NAME,
+            new Description("Number of performed operations").setRate(),
+            operationNameField,
+            requestField,
+            pluginField);
+  }
+
+  @Override
+  public void logNanos(String operation, long durationNanos, Instant endTime) {
+    logNanos(operation, durationNanos, endTime, /* metadata= */ null);
+  }
+
+  @Override
+  public void logNanos(
+      String operation, long durationNanos, Instant endTime, @Nullable Metadata metadata) {
+    if (!tracedOperations.contains(operation)) {
+      return;
+    }
+
+    String requestTag = TraceContext.getTag(TraceRequestListener.TAG_REQUEST).orElse("");
+    String pluginTag = TraceContext.getPluginTag().orElse("");
+    operationsLatency.record(operation, requestTag, pluginTag, durationNanos, TimeUnit.NANOSECONDS);
+    operationsCounter.increment(operation, requestTag, pluginTag);
+  }
+}
diff --git a/java/com/google/gerrit/server/config/GerritGlobalModule.java b/java/com/google/gerrit/server/config/GerritGlobalModule.java
index 2b26956..c4ecc80 100644
--- a/java/com/google/gerrit/server/config/GerritGlobalModule.java
+++ b/java/com/google/gerrit/server/config/GerritGlobalModule.java
@@ -89,6 +89,7 @@
 import com.google.gerrit.server.ExceptionHookImpl;
 import com.google.gerrit.server.ExternalUser;
 import com.google.gerrit.server.IdentifiedUser;
+import com.google.gerrit.server.PerformanceMetrics;
 import com.google.gerrit.server.RequestListener;
 import com.google.gerrit.server.TraceRequestListener;
 import com.google.gerrit.server.account.AccountControl;
@@ -455,6 +456,7 @@
     DynamicSet.setOf(binder(), SubmitRequirement.class);
     DynamicSet.setOf(binder(), QuotaEnforcer.class);
     DynamicSet.setOf(binder(), PerformanceLogger.class);
+    DynamicSet.bind(binder(), PerformanceLogger.class).to(PerformanceMetrics.class);
     DynamicSet.setOf(binder(), RequestListener.class);
     DynamicSet.bind(binder(), RequestListener.class).to(TraceRequestListener.class);
     DynamicSet.setOf(binder(), ExceptionHook.class);