commit | 1522caecce4b02649ee00f802e23cc5b2401a6e9 | |
---|---|---|
author | Antonio Barone <syntonyze@gmail.com> | Thu Apr 18 16:36:53 2019 +0300 |
committer | Antonio Barone <syntonyze@gmail.com> | Tue May 28 14:52:27 2019 +0100 |
tree | 55f5b0b5e52714930f71957b7d09f21dcd68675d | |
parent | 2024bc96182566bceff1328abc3f9d81052cfcab | |
Improve the performance of branch extraction

Improve the performance of the contributors endpoint by using an in-memory/on-disk cache to avoid computing the file diff for the same objectIds over and over again for different branches.

Also remove the use of parallel collections, which greatly diminished throughput by increasing the memory management overhead.

The performance improvement is higher for larger repositories with a lot of branches.

Examples:

- Extracting analytics, including branches, since 2000-01-01 from the Gerrit Code Review repo, which at the time of writing has a size of 130 MB and only 15 branches, might save up to 10 seconds. Arguably not that much.
- Extracting analytics, including branches, since 2017-01-01 from the platform/prebuilts/tools repo (part of AOSP), which at the time of writing has a size of 22 GB and 215 branches, might bring latency down from 7 minutes to 20 seconds (about 20 times faster). Extracting from this very same repo since 2000-01-01 returns a response in a matter of minutes rather than several hours.

Bug: Issue 10729
Change-Id: I991a5fc82d7c32c6e035da8b90ba4bebeab50188
Extract commit and review data from Gerrit projects and expose aggregated metrics over REST and SSH APIs.
To build the analytics plugin you need to have SBT 0.13.x or later installed. If you have a Linux operating system, see the Installing SBT on Linux instructions.
Clone the analytics plugin and execute sbt assembly.
Example:
$ git clone https://gerrit.googlesource.com/plugins/analytics
$ cd analytics && sbt assembly
The plugin jar file is created under target/scala-2.11/analytics.jar.
Copy the generated analytics.jar into Gerrit's /plugins directory.
See the relevant section in the configuration guide.
Adds new REST API and SSH commands to allow the extraction of repository statistics from Gerrit repositories and changes.
All the APIs share the same syntax and behaviour. Unlike the standard Gerrit REST API, the JSON collections are returned as individual lines and streamed over the socket I/O. The choice is driven by the fact that the typical consumer of these APIs is a BigData batch process, typically external to Gerrit and hosted on a separate computing cluster.
A large volume of data can potentially be generated: splitting the output into separate lines helps BigData processing in the splitting, shuffling and sorting phases.
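Because each line is a complete JSON document, a consumer can process the response one record at a time instead of loading it whole. The following is a minimal Python sketch of that idea, not part of the plugin; the function name `parse_json_lines` and the sample records are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, without buffering the whole response."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical, shortened records standing in for a streamed response.
sample = [
    '{"name":"John Doe","num_commits":1}\n',
    '\n',
    '{"name":"Matt Smith","num_commits":1}\n',
]
records = list(parse_json_lines(sample))
```

Any iterable of text lines works as input, for example an open file or a streamed HTTP response body read line by line.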
Extract an unordered list of project contributor statistics, including the commit data relevant for statistics purposes, such as the number of files involved, and optionally also the list of branches each commit belongs to, the number of added/deleted lines, the timestamp and the merge flag.
Optionally, extract information on issues using the commentLink Gerrit configuration and enrich the statistics with the issue-ids and links obtained from the commit message.
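To illustrate the kind of enrichment described here, the sketch below pulls issue ids out of a commit message with a regular expression and expands them into links. It is only an approximation in Python: the plugin reads its patterns from Gerrit's commentLink configuration, whereas the function name `extract_issues`, the pattern, and the `$1` link template below are assumptions chosen to match the sample output in this document.

```python
import re
from typing import List, Tuple


def extract_issues(message: str, pattern: str, link_template: str) -> Tuple[List[str], List[str]]:
    """Return (issue codes, issue links) found in a commit message.

    `pattern` must contain one capturing group for the issue id;
    `link_template` uses "$1" as the placeholder for that id.
    """
    codes: List[str] = []
    links: List[str] = []
    for match in re.finditer(pattern, message):
        code = match.group(1)
        if code not in codes:  # keep first occurrence only, preserve order
            codes.append(code)
            links.append(link_template.replace("$1", code))
    return codes, links


# Hypothetical pattern and link template, for illustration only.
codes, links = extract_issues(
    "PRJ-002 PRJ-003: rework parser",
    r"\b([A-Z]+-\d+)\b",
    "https://jira.company.org/$1",
)
```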
REST
/projects/{project-name}/analytics~contributors[?since=2006-01-02[15:04:05[.890][-0700]]][&until=2018-01-02[18:01:03[.333][-0700]]][&aggregate=email_year]
SSH
analytics contributors {project-name} [--since 2006-01-02[15:04:05[.890][-0700]]] [--until 2018-01-02[18:01:03[.333][-0700]]]
NOTE: Timestamp format is consistent with Gerrit's query syntax, see /Documentation/user-search.html for details.
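As a rough sketch of how a client might assemble the REST URL above, the Python helper below adds only the optional parameters that are set. The helper name `contributors_url` is an assumption; the project name is URL-encoded because Gerrit REST endpoints expect encoded project names (a `/` becomes `%2F`).

```python
from typing import Optional
from urllib.parse import quote, urlencode


def contributors_url(
    base_url: str,
    project: str,
    since: Optional[str] = None,
    until: Optional[str] = None,
    aggregate: Optional[str] = None,
) -> str:
    """Build the analytics~contributors URL, skipping unset parameters."""
    params = {}
    if since:
        params["since"] = since
    if until:
        params["until"] = until
    if aggregate:
        params["aggregate"] = aggregate
    # safe="" so that slashes inside the project name are encoded too
    path = f"{base_url.rstrip('/')}/projects/{quote(project, safe='')}/analytics~contributors"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


url = contributors_url(
    "http://gerrit.mycompany.com",
    "platform/prebuilts/tools",
    since="2017-01-01",
    aggregate="email_year",
)
```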
$ curl http://gerrit.mycompany.com/projects/myproject/analytics~contributors
{"name":"John Doe","email":"john.doe@mycompany.com","num_commits":1,"num_files":4,"added_lines":9,"deleted_lines":1,"commits":[{"sha1":"6a1f73738071e299f600017d99f7252d41b96b4b","date":"Apr 28, 2011 5:13:14 AM","merge":false,"bot_like": false}],"is_bot_like": false}
{"name":"Matt Smith","email":"matt.smith@mycompany.com","num_commits":1,"num_files":1,"added_lines":90,"deleted_lines":10,"commits":[{"sha1":"54527e7e3086758a23e3b069f183db6415aca304","date":"Sep 8, 2015 3:11:23 AM","merge":true,"bot_like": false}],"branches":["master"],"is_bot_like": false}
$ ssh -p 29418 admin@gerrit.mycompany.com analytics contributors myproject --since 2017-08-01 --until 2017-12-31 --extract-issues
{"name":"John Doe","email":"john.doe@mycompany.com","num_commits":1,"num_files":4,"added_lines":9,"deleted_lines":1,"commits":[{"sha1":"6a1f73738071e299f600017d99f7252d41b96b4b","date":"Apr 28, 2011 5:13:14 AM","merge":false,"bot_like": false}],"is_bot_like": false,"issues_codes":["PRJ-001"],"issues_links":["https://jira.company.org/PRJ-001"]}
{"name":"Matt Smith","email":"matt.smith@mycompany.com","num_commits":1,"num_files":1,"added_lines":90,"deleted_lines":10,"commits":[{"sha1":"54527e7e3086758a23e3b069f183db6415aca304","date":"Sep 8, 2015 3:11:23 AM","merge":true,"bot_like": false}],"is_bot_like": false,"branches":["branch1"],"issues_codes":["PRJ-002","PRJ-003"],"issues_links":["https://jira.company.org/PRJ-002","https://jira.company.org/PRJ-003"]}
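Output like the above can be folded into per-contributor totals on the consumer side. The Python sketch below is one possible way to do that, using only field names that appear in the sample responses; the function name `summarize` and the shortened sample lines are assumptions for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable

COUNTERS = ("num_commits", "num_files", "added_lines", "deleted_lines")


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sum the numeric counters of each record, keyed by contributor email."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(COUNTERS, 0))
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        entry = totals[record["email"]]
        for key in COUNTERS:
            entry[key] += record.get(key, 0)
    return dict(totals)


# Shortened copies of the sample records shown above.
sample = [
    '{"email":"john.doe@mycompany.com","num_commits":1,"num_files":4,"added_lines":9,"deleted_lines":1}',
    '{"email":"matt.smith@mycompany.com","num_commits":1,"num_files":1,"added_lines":90,"deleted_lines":10}',
]
totals = summarize(sample)
```

Keying by email means several records for the same contributor are added together rather than overwritten.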