Documentation/user-request-tracing.txt - gerrit - Git at Google

 = Request Tracing

 [[on-demand]]
 == On-demand Request Tracing

 Gerrit supports on-demand tracing of single requests that results in
 additional logs with debug information that are written to the
 `error_log`. The logs that correspond to a traced request are
 associated with a unique trace ID. This trace ID is returned with the
 response and can be used by an administrator to find the matching log
 entries.

 How tracing is enabled and how the trace ID is returned depends on the
 request type:

 * REST API: For REST calls tracing can be enabled by setting the
   `trace` request parameter or the `X-Gerrit-Trace` header, the trace
   ID is returned as `X-Gerrit-Trace` header. More information about
   this can be found in the link:rest-api.html#tracing[Request Tracing]
   section of the link:rest-api.html[REST API documentation].
 * SSH API: For SSH calls tracing can be enabled by setting the
   `--trace` option. More information about this can be found in
   the link:cmd-index.html#trace[Trace] section of the
   link:cmd-index.html[SSH command documentation].
 * Git Push (requires usage of git protocol v2): For Git pushes tracing
   can be enabled by setting the `trace` push option, the trace ID is
   returned in the command output. More information about this can be
   found in the link:user-upload.html#trace[Trace] section of the
   link:user-upload.html[upload documentation].
 * Git Clone/Fetch/Ls-Remote (requires usage of git protocol v2): For
   Git clone/fetch/ls-remote tracing can be enabled by setting the
   `trace` server option. Use '-o trace=<TRACE-ID>' for `git fetch` and
   `git ls-remote`, and '--server-option trace=<TRACE-ID>' for
   `git clone`. If the `trace` server option is set without a value
   (without trace ID) a trace ID is generated but the generated trace ID
   is not returned to the client (hence a trace ID should always be
   set).

 When request tracing is enabled it is possible to provide an ID that
 should be used as trace ID. If a trace ID is not provided a trace ID is
 automatically generated. The trace ID must be provided to the support
 team (administrator of the server) so that they can find the trace.

 When doing traces it is recommended to specify the ID of the issue
 that is being investigated as trace ID so that the traces of the issue
 can be found more easily. When the issue ID is used as trace ID there
 is no need to find the generated trace ID and report it in the issue.

 Since tracing consumes additional server resources tracing should only
 be enabled for single requests if there is a concrete need for
 debugging. In particular bots should never enable tracing for all their
 requests by default.

 [[auto-retry]]
 == Automatic Request Tracing

 Gerrit can be link:config-gerrit.html#retry.retryWithTraceOnFailure[
 configured] to automatically retry requests on non-recoverable failures
 with tracing enabled. This allows to automatically captures traces of
 these failures for further analysis by the Gerrit administrators.

 The auto-retry on failure behaves the same way as if the calling user
 would retry the failed operation with tracing enabled.

 It is expected that the auto-retry fails with the same exception that
 triggered the auto-retry, however this is not guaranteed:

 * Not all Gerrit operations are fully atomic and it can happen that
   some parts of the operation have been successfully performed before
   the failure happened. In this case the auto-retry may fail with a
   different exception.
 * Some exceptions may mistakenly be considered as non-recoverable and
   the auto-retry actually succeeds.

 [[auto-retry-succeeded]]
 If an auto-retry succeeds you may consider filing this as
 link:https://bugs.chromium.org/p/gerrit/issues/entry?template=GoogleSource+Issue[
 Gerrit issue] so that the Gerrit developers can fix this and treat this
 exception as recoverable.

 The trace IDs for auto-retries are generated and start with
 `retry-on-failure-`. For REST requests they are returned to the client
 as `X-Gerrit-Trace` header.

 The best way to search for auto-retries in logs is to do a grep by
 `AutoRetry`. For each auto-retry that happened this should match 1 or 2
 log entries:

 * one `ERROR` log entry with the exception that triggered the
   auto-retry
 * one `FINE` log entry with the exception that happened on auto-retry
   (if this log entry is not present the operation succeeded on
   auto-retry)

 To inspect single auto-retry occurrences in detail you can do a
 link:#find-trace[grep by the trace ID]. The trace ID is part of the log
 entries which have been found by the previous grep (watch out for
 something like: `retry-on-failure-1534166888910-3985dfba`).

 [TIP]
 Auto-retrying on failures is only supported by some of the REST
 endpoints (change REST endpoints that perform updates).

 [[auto-retry-metrics]]
 === Metrics

 If auto-retry is link:config-gerrit.html#retry.retryWithTraceOnFailure[
 enabled] the following metrics are reported:

 * `action/auto_retry_count`: Number of automatic retries with tracing
 * `action/failures_on_auto_retry_count`: Number of failures on auto retry

 By comparing the values of these counters one can see how often the
 auto-retry succeeds. As explained link:#auto-retry-succeeded[above] if
 auto-retries succeed that's an issue with Gerrit that you may want to
 report.

 [[find-trace]]
 == Find log entries for a trace ID

 If tracing is enabled all log messages that correspond to the traced
 request have a `TRACE_ID` tag set, e.g.:

 ----
 [2018-08-13 15:28:08,913] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : Received REST request: GET /a/accounts/self (parameters: [trace]) [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
 [2018-08-13 15:28:08,914] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : Calling user: admin [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
 [2018-08-13 15:28:08,942] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : REST call succeeded: 200 [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
 ----

 By doing a grep with the trace ID over the error log the log entries
 that correspond to the request can be found.

 [TIP]
 Usually only server administrators have access to the logs.

 == Which information is captured in a trace?

 * request details
 ** REST API: request URL, request parameter names, calling user,
    response code, response body on errors
 ** SSH API: parameter names
 ** Git API: push options, magic branch parameter names
 * cache misses, cache evictions
 * reads from NoteDb, writes to NoteDb
 * reads of meta data files, writes of meta data files
 * index queries (with parameters and matches)
 * reindex events
 * permission checks (e.g. which rule is responsible for a deny)
 * timer metrics
 * all other logs
	= Request Tracing

	[[on-demand]]
	== On-demand Request Tracing

	Gerrit supports on-demand tracing of single requests that results in
	additional logs with debug information that are written to the
	`error_log`. The logs that correspond to a traced request are
	associated with a unique trace ID. This trace ID is returned with the
	response and can be used by an administrator to find the matching log
	entries.

	How tracing is enabled and how the trace ID is returned depends on the
	request type:

	* REST API: For REST calls tracing can be enabled by setting the
	`trace` request parameter or the `X-Gerrit-Trace` header, the trace
	ID is returned as `X-Gerrit-Trace` header. More information about
	this can be found in the link:rest-api.html#tracing[Request Tracing]
	section of the link:rest-api.html[REST API documentation].
	* SSH API: For SSH calls tracing can be enabled by setting the
	`--trace` option. More information about this can be found in
	the link:cmd-index.html#trace[Trace] section of the
	link:cmd-index.html[SSH command documentation].
	* Git Push (requires usage of git protocol v2): For Git pushes tracing
	can be enabled by setting the `trace` push option, the trace ID is
	returned in the command output. More information about this can be
	found in the link:user-upload.html#trace[Trace] section of the
	link:user-upload.html[upload documentation].
	* Git Clone/Fetch/Ls-Remote (requires usage of git protocol v2): For
	Git clone/fetch/ls-remote tracing can be enabled by setting the
	`trace` server option. Use '-o trace=<TRACE-ID>' for `git fetch` and
	`git ls-remote`, and '--server-option trace=<TRACE-ID>' for
	`git clone`. If the `trace` server option is set without a value
	(without trace ID) a trace ID is generated but the generated trace ID
	is not returned to the client (hence a trace ID should always be
	set).

	When request tracing is enabled it is possible to provide an ID that
	should be used as trace ID. If a trace ID is not provided a trace ID is
	automatically generated. The trace ID must be provided to the support
	team (administrator of the server) so that they can find the trace.

	When doing traces it is recommended to specify the ID of the issue
	that is being investigated as trace ID so that the traces of the issue
	can be found more easily. When the issue ID is used as trace ID there
	is no need to find the generated trace ID and report it in the issue.

	Since tracing consumes additional server resources tracing should only
	be enabled for single requests if there is a concrete need for
	debugging. In particular bots should never enable tracing for all their
	requests by default.

	[[auto-retry]]
	== Automatic Request Tracing

	Gerrit can be link:config-gerrit.html#retry.retryWithTraceOnFailure[
	configured] to automatically retry requests on non-recoverable failures
	with tracing enabled. This allows to automatically captures traces of
	these failures for further analysis by the Gerrit administrators.

	The auto-retry on failure behaves the same way as if the calling user
	would retry the failed operation with tracing enabled.

	It is expected that the auto-retry fails with the same exception that
	triggered the auto-retry, however this is not guaranteed:

	* Not all Gerrit operations are fully atomic and it can happen that
	some parts of the operation have been successfully performed before
	the failure happened. In this case the auto-retry may fail with a
	different exception.
	* Some exceptions may mistakenly be considered as non-recoverable and
	the auto-retry actually succeeds.

	[[auto-retry-succeeded]]
	If an auto-retry succeeds you may consider filing this as
	link:https://bugs.chromium.org/p/gerrit/issues/entry?template=GoogleSource+Issue[
	Gerrit issue] so that the Gerrit developers can fix this and treat this
	exception as recoverable.

	The trace IDs for auto-retries are generated and start with
	`retry-on-failure-`. For REST requests they are returned to the client
	as `X-Gerrit-Trace` header.

	The best way to search for auto-retries in logs is to do a grep by
	`AutoRetry`. For each auto-retry that happened this should match 1 or 2
	log entries:

	* one `ERROR` log entry with the exception that triggered the
	auto-retry
	* one `FINE` log entry with the exception that happened on auto-retry
	(if this log entry is not present the operation succeeded on
	auto-retry)

	To inspect single auto-retry occurrences in detail you can do a
	link:#find-trace[grep by the trace ID]. The trace ID is part of the log
	entries which have been found by the previous grep (watch out for
	something like: `retry-on-failure-1534166888910-3985dfba`).

	[TIP]
	Auto-retrying on failures is only supported by some of the REST
	endpoints (change REST endpoints that perform updates).

	[[auto-retry-metrics]]
	=== Metrics

	If auto-retry is link:config-gerrit.html#retry.retryWithTraceOnFailure[
	enabled] the following metrics are reported:

	* `action/auto_retry_count`: Number of automatic retries with tracing
	* `action/failures_on_auto_retry_count`: Number of failures on auto retry

	By comparing the values of these counters one can see how often the
	auto-retry succeeds. As explained link:#auto-retry-succeeded[above] if
	auto-retries succeed that's an issue with Gerrit that you may want to
	report.

	[[find-trace]]
	== Find log entries for a trace ID

	If tracing is enabled all log messages that correspond to the traced
	request have a `TRACE_ID` tag set, e.g.:

	----
	[2018-08-13 15:28:08,913] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : Received REST request: GET /a/accounts/self (parameters: [trace]) [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
	[2018-08-13 15:28:08,914] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : Calling user: admin [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
	[2018-08-13 15:28:08,942] [HTTP-76] TRACE com.google.gerrit.httpd.restapi.RestApiServlet : REST call succeeded: 200 [CONTEXT forced=true TRACE_ID="1534166888910-3985dfba" ]
	----

	By doing a grep with the trace ID over the error log the log entries
	that correspond to the request can be found.

	[TIP]
	Usually only server administrators have access to the logs.

	== Which information is captured in a trace?

	* request details
	** REST API: request URL, request parameter names, calling user,
	response code, response body on errors
	** SSH API: parameter names
	** Git API: push options, magic branch parameter names
	* cache misses, cache evictions
	* reads from NoteDb, writes to NoteDb
	* reads of meta data files, writes of meta data files
	* index queries (with parameters and matches)
	* reindex events
	* permission checks (e.g. which rule is responsible for a deny)
	* timer metrics
	* all other logs