tree 55f5b0b5e52714930f71957b7d09f21dcd68675d
parent 2024bc96182566bceff1328abc3f9d81052cfcab
author Antonio Barone <syntonyze@gmail.com> 1555594613 +0300
committer Antonio Barone <syntonyze@gmail.com> 1559051547 +0100

Improve the performance of branch extraction

Improve the performance of the contributors endpoint by using an
in-memory/on-disk cache, so that the file diff for the same objectIds
is not computed over and over again for different branches.
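
As a rough illustration of the caching idea only (not the code in this
change), the sketch below memoizes a diff result per pair of objectIds
so that a second branch sharing the same commits reuses it. The names
DiffCache, fake_diff and changed_lines_per_branch are hypothetical, the
diff itself is a stand-in, and only the in-memory part is shown.

```python
from typing import Callable, Dict, List, Tuple

ObjectIdPair = Tuple[str, str]


class DiffCache:
    """In-memory cache of diff results keyed by (old_id, new_id)."""

    def __init__(self, compute: Callable[[str, str], int]) -> None:
        self._compute = compute
        self._cache: Dict[ObjectIdPair, int] = {}
        self.misses = 0  # how many times the expensive diff really ran

    def get(self, old_id: str, new_id: str) -> int:
        key = (old_id, new_id)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute(old_id, new_id)
        return self._cache[key]


def fake_diff(old_id: str, new_id: str) -> int:
    """Hypothetical stand-in for an expensive file diff (assumption)."""
    return len(old_id) + len(new_id)


def changed_lines_per_branch(
    branches: Dict[str, List[ObjectIdPair]], cache: DiffCache
) -> Dict[str, int]:
    """Sum the diff sizes per branch, sharing one cache across branches."""
    return {
        name: sum(cache.get(old, new) for old, new in pairs)
        for name, pairs in branches.items()
    }
```

With two branches that share most of their history, the diff runs once
per distinct pair of objectIds rather than once per branch.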

Also remove the use of parallel collections, because they greatly
diminished throughput by increasing the memory management overhead.

The performance improvement is higher for larger repositories with
a lot of branches.

Examples:

- Extracting analytics, including branches, since 2000-01-01 from the
Gerrit Code Review repo, which at the time of writing has a size of
130 MB and only 15 branches, might save up to 10 seconds. Arguably not
that much.

- Extracting analytics, including branches, since 2017-01-01 from the
platform/prebuilts/tools repo (part of AOSP), which at the time of
writing has a size of 22 GB and 215 branches, might bring latency
down from 7 minutes to 20 seconds (about 20 times faster).

Extracting from this very same repo since 2000-01-01 returns a
response in a matter of minutes rather than several hours.

Bug: Issue 10729
Change-Id: I991a5fc82d7c32c6e035da8b90ba4bebeab50188
