Stop expensive conversions from scala to java lists

The Java implementation of Gson does not know how to serialize scala
collections out of the box.  For this reason CommitsStatistics
and CommitInfo objects had been modeled with java.util.List and
java.util.Set instead, just to help Gson serialize them.

This change avoids unneeded conversions from java to scala and vice-versa by:
- Remodeling CommitsStatistics and CommitInfo to use scala native
  collections
- Registering a scala iterable serializer with Gson so that it knows how
  to serialize scala lists and sets.

This reduces the latency of the contributors endpoint by around 50%.

For example, when extracting the entire history of master branch for
Gerrit repo, latency decreases from 58s to 27s.

Change-Id: I1ba7e81eb81ebcffaa431a4340fb9da09e6a5976

Analytics extraction plugin

Extract commit and review data from Gerrit projects and expose aggregated metrics over REST and SSH API.

How to build

To build the analytics plugin you need SBT 0.13.x or later installed. On Linux, see the Installing SBT on Linux instructions.

Clone the analytics plugin and execute sbt assembly.


   $ git clone
   $ cd analytics && sbt assembly

The plugin jar file is created under target/scala-2.11/analytics.jar

How to install

Copy the generated analytics.jar into Gerrit's /plugins directory.

How to configure

Nothing to configure, it just works.

How to use

The plugin adds new REST APIs and SSH commands for extracting repository statistics from Gerrit repositories and changes.


All the APIs share the same syntax and behaviour. Unlike the standard Gerrit REST API, the JSON collections are returned as individual lines and streamed over the socket I/O. This choice is driven by the typical consumer of these APIs: a BigData batch process, usually external to Gerrit and hosted on a separate computing cluster.

A large volume of data can potentially be generated: splitting the output into separate lines helps BigData processing in the splitting, shuffling and sorting phases.
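Since each line is a self-contained JSON document, a consumer can decode the response record by record instead of buffering the whole payload. The following is a minimal Python sketch of that idea, assuming the stream is exposed as any iterable of text lines; the function name `iter_records` and the sample payload are illustrative and not part of the plugin.

```python
import io
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of the stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Illustrative payload shaped like the plugin output; not real data.
sample = io.StringIO(
    '{"name":"John Doe","num_commits":1,"added_lines":9}\n'
    '{"name":"Matt Smith","num_commits":1,"added_lines":90}\n'
)

# Records are consumed lazily, so memory use stays flat for large outputs.
total_added = sum(record["added_lines"] for record in iter_records(sample))
```

The same generator works unchanged on a file object or on a line iterator provided by an HTTP client, because it only relies on iterating over lines.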


Extract an unordered list of project contributor statistics, including the commit data relevant for statistics purposes, such as the number of files involved, and optionally also the list of branches the commits belong to, the number of added/deleted lines, the timestamp and the merge flag.

Optionally, extract information on issues using the commentLink Gerrit configuration and enrich the statistics with the issue-ids and links obtained from the commit message.




analytics contributors {project-name} [--since 2006-01-02[15:04:05[.890][-0700]]] [--until 2018-01-02[18:01:03[.333][-0700]]]


  • --since -b Starting timestamp to consider
  • --until -e Ending timestamp (excluded) to consider
  • --aggregate -granularity -g one of email, email_year, email_month, email_day, email_hour defaulting to aggregation by email
  • --extract-branches -r enables splitting of aggregation by branch name and expose branch name in the payload
  • --extract-issues -i enables the extraction of issues from commentLink
  • --botlike-filename-regexps -n comma separated list of regexps that identify a bot-like commit; commits that modify only files whose names match one of them are flagged as bot-like

NOTE: Timestamp format is consistent with Gerrit's query syntax, see /Documentation/user-search.html for details.
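To illustrate how the options above combine, here is a small Python sketch that assembles the argument list for the SSH command. The helper `contributors_command` is hypothetical, and passing each value as a separate space-delimited argument is an assumption based on the syntax line above; only the long option names come from this document.

```python
import shlex
from typing import List, Optional, Sequence

# Granularities listed for --aggregate in the option list above.
GRANULARITIES = ("email", "email_year", "email_month", "email_day", "email_hour")


def contributors_command(
    project: str,
    since: Optional[str] = None,
    until: Optional[str] = None,
    aggregate: Optional[str] = None,
    extract_branches: bool = False,
    extract_issues: bool = False,
    botlike_filename_regexps: Sequence[str] = (),
) -> List[str]:
    """Build the argument list for `analytics contributors` (hypothetical helper)."""
    if aggregate is not None and aggregate not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {aggregate}")
    args = ["analytics", "contributors", project]
    if since:
        args += ["--since", since]
    if until:
        args += ["--until", until]
    if aggregate:
        args += ["--aggregate", aggregate]
    if extract_branches:
        args.append("--extract-branches")
    if extract_issues:
        args.append("--extract-issues")
    if botlike_filename_regexps:
        # The option takes a single comma-separated value.
        args += ["--botlike-filename-regexps", ",".join(botlike_filename_regexps)]
    return args


command = contributors_command(
    "myproject", since="2017-08-01", until="2017-12-31", extract_issues=True
)
# shlex.join quotes each argument so the line survives a remote shell.
command_line = shlex.join(command)
```

Quoting matters mostly for the regexp option, whose values typically contain characters such as `\` and `+` that a shell would otherwise interpret.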


  • REST:
   $ curl
   {"name":"John Doe","email":"","num_commits":1, "num_files":4,"added_lines":9,"deleted_lines":1, "commits":[{"sha1":"6a1f73738071e299f600017d99f7252d41b96b4b","date":"Apr 28, 2011 5:13:14 AM","merge":false,"bot_like": false}],"is_bot_like": false}
   {"name":"Matt Smith","email":"","num_commits":1, "num_files":1,"added_lines":90,"deleted_lines":10,"commits":[{"sha1":"54527e7e3086758a23e3b069f183db6415aca304","date":"Sep 8, 2015 3:11:23 AM","merge":true,"bot_like": false}],"branches":["master"],"is_bot_like": false}
  • SSH:
   $ ssh -p 29418 analytics contributors myproject --since 2017-08-01 --until 2017-12-31 --extract-issues
   {"name":"John Doe","email":"","num_commits":1, "num_files":4,"added_lines":9,"deleted_lines":1, "commits":[{"sha1":"6a1f73738071e299f600017d99f7252d41b96b4b","date":"Apr 28, 2011 5:13:14 AM","merge":false,"bot_like": false}],"is_bot_like": false,"issues_codes":["PRJ-001"],"issues_links":[""]}
   {"name":"Matt Smith","email":"","num_commits":1, "num_files":1,"added_lines":90,"deleted_lines":10,"commits":[{"sha1":"54527e7e3086758a23e3b069f183db6415aca304","date":"Sep 8, 2015 3:11:23 AM","merge":true,"bot_like": false}],"is_bot_like": false,"branches":["branch1"],"issues_codes":["PRJ-002","PRJ-003"],"issues_links":["",""]}
  • BOT-like: Flags the commit as bot-like when all files in that commit match any of the following regular expressions:

    • .+\.xml
    • .+\.bzl
    • BUILD
    • \.gitignore
    • plugins/
    • \.settings
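The bot-like rule can be sketched in a few lines of Python: a commit is flagged only when every file it touches matches at least one of the expressions. This is an illustration, not the plugin's implementation. It assumes each expression must match the whole file path (`re.fullmatch`) and that a commit touching no files is not flagged; the plugin may behave differently on both points. The name `is_bot_like` is hypothetical.

```python
import re
from typing import Iterable, List, Pattern

# The expressions listed above, compiled once.
BOT_LIKE_PATTERNS: List[Pattern[str]] = [
    re.compile(expr)
    for expr in (r".+\.xml", r".+\.bzl", r"BUILD", r"\.gitignore", r"plugins/", r"\.settings")
]


def is_bot_like(files: Iterable[str], patterns: List[Pattern[str]] = BOT_LIKE_PATTERNS) -> bool:
    """Return True when every file matches at least one pattern.

    Assumption: whole-path matching, and an empty file list is not bot-like.
    """
    files = list(files)
    if not files:
        return False
    return all(any(p.fullmatch(name) for p in patterns) for name in files)


# Hypothetical file lists for illustration.
only_build_files = is_bot_like(["pom.xml", "version.bzl", "BUILD"])
mixed_commit = is_bot_like(["pom.xml", "src/Main.java"])
```

Under these assumptions a single hand-edited source file is enough to keep a commit from being flagged, which is the intent of the "all files" condition.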
curl ''

  "year": 2018,
  "month": 3,
  "day": 21,
  "hour": 19,
  "name": "Dave Borowitz",
  "email": "",
  "num_commits": 1,
  "num_files": 6,
  "num_distinct_files": 6,
  "added_lines": 6,
  "deleted_lines": 6,
  "commits": [
      "sha1": "a3ab2e1d07e6745f50b1d9907f6580c6521fd035",
      "date": 1521661246000,
      "merge": false,
      "bot_like": true,
      "files": [
  "branches": [],
  "issues_codes": [],
  "issues_links": [],
  "last_commit_date": 1521661246000,
  "is_merge": false,
  "is_bot_like": true