Allow authors to have multiple emails

Authors can have multiple emails, but it is useful to group together
all the commits coming from the same author.

This change allows different emails to be mapped to the same author.

The "name" field has been dropped in favour of "author".
author = "author" from alias file || "name" from user activity

Change-Id: I7ee6900f40b51ee9f6785676bc0fc169a7e56a29
6 files changed
tree: cd2d403a140a48a123e7a1dfd980e8d5eb77f5ec
  1. kibana/
  2. project/
  3. src/
  4. .gitignore
  5. build.sbt
  6. docker-compose.yaml
  7. LICENSE
  8. README.md
README.md

spark-gerrit-analytics-etl

Spark ETL to extract analytics data from Gerrit Projects.

The job can be launched with the following parameters:

bin/spark-submit \
    --conf spark.es.nodes=es.mycompany.com \
    --conf spark.es.net.http.auth.user=elastic \
    --conf spark.es.net.http.auth.pass=changeme \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://gerrit.mycompany.com \
    -e gerrit/analytics

Parameters

  • since, until, aggregate are the same as those defined in the Gerrit Analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
  • -u --url Gerrit server URL with the analytics plugin installed
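  • -e --elasticIndex the Elasticsearch destination to load the data into, written as two parts separated by / (for example gerrit/analytics, as in the command above); if not provided, no ES export will be performed
  • -o --out folder location for storing the output as JSON files; if not provided, data is saved under the system temporary directory, in a folder whose name starts with analytics-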
  • -a --email-aliases (optional) “emails to author alias” input data path. Here is an example of the required file structure:
    {"author": "John", "emails": ["john@email.com", "john@anotheremail.com"]}
    {"author": "David", "emails": ["david.smith@email.com", "david@myemail.com"]}
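The alias file shown above can be read as one JSON record per line, each mapping several emails to a single author. The following Python sketch illustrates the fallback rule from the commit message (author from the alias file, otherwise the name from the user activity). It is not the ETL's own code: the function names, the "email" field of the activity record, and the exact-match lookup are assumptions made for illustration.

```python
import json


def load_email_aliases(lines):
    """Build an email -> author lookup from JSON-lines alias records."""
    email_to_author = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the alias file
        record = json.loads(line)
        for email in record["emails"]:
            email_to_author[email] = record["author"]
    return email_to_author


def resolve_author(activity, email_to_author):
    """Return the aliased author if the email is known, else the activity name.

    Assumption: each activity record carries "email" and "name" fields.
    """
    return email_to_author.get(activity.get("email"), activity.get("name"))


alias_lines = [
    '{"author": "John", "emails": ["john@email.com", "john@anotheremail.com"]}',
    '{"author": "David", "emails": ["david.smith@email.com", "david@myemail.com"]}',
]
aliases = load_email_aliases(alias_lines)
```

With this lookup, activity from john@email.com and john@anotheremail.com both resolve to John, while an email missing from the alias file keeps the name reported with the activity.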
    

Development environment

A docker compose file is provided to spin up an instance of Elasticsearch with Kibana locally. Just run docker-compose up.

Kibana will run on port 5601 and Elasticsearch on port 9200.

Default credentials

The default Elasticsearch user is elastic and the default password is changeme.
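To confirm that the local instance is up and accepts the default credentials, a small check like the Python sketch below may help. It assumes the containers are reachable on localhost and that HTTP basic authentication is enabled; the function names are hypothetical and not part of this project.

```python
import base64
import json
import urllib.request

# Assumption: docker-compose publishes Elasticsearch on localhost:9200.
ES_URL = "http://localhost:9200"


def basic_auth_header(user, password):
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def cluster_info(url=ES_URL, user="elastic", password="changeme", timeout=5):
    """Fetch the Elasticsearch root endpoint and return its JSON body as a dict."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling cluster_info() after docker-compose up should return the cluster details; a urllib.error.URLError usually means the container is still starting or has stopped.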

Caveats

If Elasticsearch dies with exit code 137, you might have to give Docker more memory (check this article for more details).