commit	a7535f95d843fcf17458d85144fdecf37edad944
author	Fabio Ponciroli <ponch78@gmail.com>	Thu Dec 14 12:19:31 2017 +0100
committer	Fabio Ponciroli <ponch78@gmail.com>	Mon Dec 18 10:55:51 2017 +0100
tree	4e50914ebccfeaf29879b6a892aebef40ce437a0
parent	6a3a96f54615ee505090c3ef772a5f9bd511faf3

README.md

spark-gerrit-analytics-etl

Spark ETL to extract analytics data from Gerrit projects.

The job can be launched with the following parameters:

bin/spark-submit \
    --conf spark.es.nodes=es.mycompany.com \
    --conf spark.es.net.http.auth.user=elastic \
    --conf spark.es.net.http.auth.pass=changeme \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://gerrit.mycompany.com \
    -e gerrit/analytics

Parameters

since, until, aggregate are the same as those defined in the Gerrit Analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
-u --url Gerrit server URL with the analytics plugin installed
-e --elasticIndex specify as `<index>/<type>` to be loaded into Elasticsearch; if not provided, no ES export will be performed
-o --out folder location for storing the output as JSON files; if not provided, data is saved to `<tmp>/analytics-<NNNN>`, where `<tmp>` is the system temporary directory
-a --email-aliases (optional) “emails to author alias” input data path.
CSVs with 3 columns are expected as input.
Here is an example of the required file structure:
```
author,email,organization
John Smith,john@email.com,John's Company
John Smith,john@anotheremail.com,John's Company
David Smith,david.smith@email.com,Independent
David Smith,david@myemail.com,Independent
```
You can use the following command to quickly extract the list of authors and emails to create part of an input CSV file:
```
echo -e "author,email\n$(git log --pretty="%an,%ae%n%cn,%ce"|sort |uniq )" > /tmp/my_aliases.csv
```
Once you have it, you just have to add the organization column.
NOTE:
- organization will be extracted from the committer email if not specified
- author will be defaulted to the committer name if not specified
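
As a minimal sketch of adding the organization column, the helper below appends one to the two-column file produced by the `git log` command above. The function name `add_org_column` and the idea of assigning a single organization to every row are assumptions made for illustration; in practice you would likely edit individual rows by hand afterwards. It also assumes author names contain no commas.

```shell
# Hypothetical helper (not part of this project): append an "organization"
# column to a CSV that has an "author,email" header.
#   $1 = input CSV path
#   $2 = organization name assigned to every data row (an assumption)
add_org_column() {
  awk -F',' -v org="$2" 'BEGIN { OFS = "," }
    NR == 1 { print $0, "organization"; next }  # extend the header row
    NF      { print $0, org }                   # tag each non-empty data row
  ' "$1"
}
```

For example, `add_org_column /tmp/my_aliases.csv "John's Company" > /tmp/my_aliases_with_org.csv` writes a three-column file that can then be passed with `-a`.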

Development environment

A docker compose file is provided to spin up an instance of Elasticsearch with Kibana locally. Just run `docker-compose up`.

Kibana will run on port 5601 and Elasticsearch on port 9200.

Default credentials

The Elasticsearch default user is `elastic` and the default password is `changeme`.

Caveats

If Elasticsearch dies with exit code 137 (the process was killed, typically because it ran out of memory), you might have to give Docker more memory.
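
Exit codes above 128 conventionally mean "terminated by signal (code - 128)", so 137 corresponds to signal 9. The sketch below decodes such a code with the shell's `kill -l`; the function name `signal_from_exit_code` is hypothetical and only meant to illustrate the arithmetic.

```shell
# Hypothetical helper: map an exit code above 128 to the name of the signal
# that terminated the process; prints "none" for ordinary exit codes.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    kill -l "$((code - 128))"   # 137 - 128 = 9, which kill -l reports as KILL
  else
    echo "none"
  fi
}

signal_from_exit_code 137
```

A `KILL` result alone does not prove an out-of-memory condition. If the container still exists, `docker inspect --format '{{.State.OOMKilled}}' <container>` reports whether Docker recorded an OOM kill, where `<container>` is a placeholder for the name or ID of your Elasticsearch container.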