commit | 6a3a96f54615ee505090c3ef772a5f9bd511faf3 | |
---|---|---|
author | Fabio Ponciroli <ponch78@gmail.com> | Wed Dec 13 10:36:57 2017 +0000 |
committer | Fabio Ponciroli <ponch78@gmail.com> | Thu Dec 14 19:54:46 2017 +0100 |
tree | 6eea4fb6347991d00e5ebfc7d46872c31db891bf | |
parent | 243eb5c48719b3409b784d42e17f0c05416c833f | |
Input aliases from CSV instead of JSON

CSVs can be easily generated with simple git commands. Furthermore, the code is simplified compared to the JSON version.

Change-Id: I4332497407df40ce1e68f0de304fad45ad5f6b8e

Spark ETL to extract analytics data from Gerrit Projects.
The job can be launched with the following parameters:
bin/spark-submit \
    --conf spark.es.nodes=es.mycompany.com \
    --conf spark.es.net.http.auth.user=elastic \
    --conf spark.es.net.http.auth.pass=changeme \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://gerrit.mycompany.com \
    -e gerrit/analytics
The since, until, and aggregate parameters are the same as those defined in the Gerrit Analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
-u --url Gerrit server URL with the analytics plugin installed
-e --elasticIndex target to load the data into Elasticsearch, specified as index/type (as in gerrit/analytics above). If not provided, no Elasticsearch export is performed.
-o --out folder location for storing the output as JSON files. If not provided, data is saved to a folder whose name starts with analytics- under the system temporary directory.
-a --email-aliases (optional) “emails to author alias” input data path.
A CSV file with two columns (author and email) is expected as input.
Here is an example of the required file structure:
author,email
John Smith,john@email.com
John Smith,john@anotheremail.com
David Smith,david.smith@email.com
David Smith,david@myemail.com
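To illustrate what the aliases file expresses, here is a minimal Python sketch that reads the two-column CSV into an email-to-author lookup. It is not how the Spark job itself loads the file; the function name load_aliases and the SAMPLE constant are hypothetical and exist only for this illustration.

```python
import csv
import io


def load_aliases(csv_text):
    """Map each email to its author name from a two-column author,email CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    aliases = {}
    for row in reader:
        # Several emails may point at the same author; each email is a key.
        aliases[row["email"].strip()] = row["author"].strip()
    return aliases


# Same rows as the example file structure shown above.
SAMPLE = """author,email
John Smith,john@email.com
John Smith,john@anotheremail.com
David Smith,david.smith@email.com
David Smith,david@myemail.com
"""

aliases = load_aliases(SAMPLE)
# Both of John's addresses resolve to the same author name.
print(aliases["john@email.com"], "|", aliases["john@anotheremail.com"])
```

With this shape, four distinct emails collapse onto two authors, which is the grouping the alias file is meant to describe.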
You can use the following command to quickly extract the list of authors and emails to create an input CSV file:
echo -e "author,email\n$(git log --pretty="%an,%ae%n%cn,%ce"|sort |uniq )" > /tmp/my_aliases.csv
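One caveat with the shell one-liner is that an author name containing a comma would produce a row with more than two columns. The following Python sketch shows one possible alternative that quotes such names; the function names aliases_csv and git_author_pairs are hypothetical, and whether the ETL job accepts quoted fields is an assumption that should be verified before relying on it.

```python
import csv
import io
import subprocess


def aliases_csv(pairs):
    """Render unique (author, email) pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["author", "email"])
    # set() drops duplicates, sorted() gives a stable order (like sort | uniq).
    for author, email in sorted(set(pairs)):
        # csv.writer quotes a field automatically when it contains a comma.
        writer.writerow([author, email])
    return out.getvalue()


def git_author_pairs(repo_dir="."):
    """Collect author and committer (name, email) pairs from git log."""
    # %x09 emits a tab, so names containing commas stay unambiguous.
    log = subprocess.run(
        ["git", "log", "--pretty=tformat:%an%x09%ae%n%cn%x09%ce"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return [tuple(line.split("\t", 1)) for line in log.splitlines() if "\t" in line]
```

Writing the result of aliases_csv(git_author_pairs()) to a file such as /tmp/my_aliases.csv would give the same kind of input file as the command above.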
A docker compose file is provided to spin up an instance of Elasticsearch with Kibana locally. Just run docker-compose up.

Kibana will run on port 5601 and Elasticsearch on port 9200.

The Elasticsearch default user is elastic and the default password is changeme.

If Elasticsearch dies with exit code 137, you might have to give Docker more memory (check this article for more details).