commit | 91c551dcaf0f93aacd1b3da96e64939e076322fa | |
---|---|---|
author | Claudio Pacchiega <claudio.pacchiega@gmail.com> | Fri Nov 03 10:24:12 2017 +0100 |
committer | Claudio Pacchiega <claudio.pacchiega@gmail.com> | Sat Dec 02 13:03:56 2017 +0100 |
tree | 10dec1491cc53ed0fff95b18dca38c126df988e1 | |
parent | 15e1cd4ff29dee746c3a99082cf3309d3a38c4c1 | |
Refactoring using DataFrames

General refactoring that uses DataFrames instead of complex json4s computations. With DataFrames, data transformations can be expressed with clearer, more readable semantics.

Change-Id: Idee33f08bba110f76e68124e3264f6c5203d8534
Spark ETL to extract analytics data from Gerrit projects.

The job can be launched with the following parameters:
```sh
bin/spark-submit \
    --conf spark.es.nodes=es.mycompany.com \
    --conf spark.es.net.http.auth.user=elastic \
    --conf spark.es.net.http.auth.pass=changeme \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://gerrit.mycompany.com \
    -e gerrit/analytics
```
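The refactoring moves the aggregations from hand-written json4s processing to DataFrames. As an illustration only, a per-email, per-hour count in the spirit of `--aggregate email_hour` could be written as the sketch below. It assumes a DataFrame of commits with hypothetical `email` and `date` (timestamp) columns. The object name, column names and the exact bucketing (year/month/day/hour) are assumptions, not the job's actual schema or logic. It needs `spark-sql` on the classpath.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, dayofmonth, hour, lit, month, year}

// Hypothetical sketch: "email" and "date" are assumed column names,
// not the real schema produced by the Gerrit analytics ETL.
object EmailHourSketch {

  // Count commits per author email and per hourly time bucket.
  // Grouping on year/month/day/hour keeps the same hour on different
  // days in separate buckets.
  def aggregateByEmailHour(commits: DataFrame): DataFrame =
    commits
      .groupBy(
        col("email"),
        year(col("date")).as("year"),
        month(col("date")).as("month"),
        dayofmonth(col("date")).as("day"),
        hour(col("date")).as("hour"))
      .agg(count(lit(1)).as("num_commits"))

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; the real job runs via spark-submit.
    val spark = SparkSession.builder()
      .appName("email-hour-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, purely for illustration.
    val commits = Seq(
      ("alice@example.com", Timestamp.valueOf("2017-11-03 10:05:00")),
      ("alice@example.com", Timestamp.valueOf("2017-11-03 10:45:00")),
      ("alice@example.com", Timestamp.valueOf("2017-11-03 11:10:00")),
      ("bob@example.com", Timestamp.valueOf("2017-11-03 10:20:00"))
    ).toDF("email", "date")

    aggregateByEmailHour(commits)
      .orderBy("email", "year", "month", "day", "hour")
      .show()

    spark.stop()
  }
}
```

With the sample rows, alice's two commits between 10:00 and 11:00 fall into one group and her 11:10 commit into another, while bob gets a single group. Declaring the grouping as column expressions, instead of traversing JSON by hand, is presumably the kind of readability gain the commit message describes.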