Refactoring using DataFrames.

General refactoring to use DataFrames instead of complex json4s
computations.
DataFrames make it possible to express clearer and more readable
semantics when transforming data.
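As a rough illustration of the idea (not the actual code from this change), a hand-rolled json4s fold over commit records can be replaced by a declarative DataFrame aggregation. The column names (email, num_commits, added_lines) are illustrative assumptions:

```scala
// Hypothetical sketch: aggregate per-author commit statistics with
// Spark SQL DataFrames instead of walking a json4s AST by hand.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AnalyticsSketch extends App {
  val spark = SparkSession.builder()
    .appName("analytics-sketch")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Sample records; in the real ETL these come from the Gerrit
  // analytics plugin's JSON output.
  val commits = Seq(
    ("alice@mycompany.com", 3, 120),
    ("bob@mycompany.com",   1,  40),
    ("alice@mycompany.com", 2,  80)
  ).toDF("email", "num_commits", "added_lines")

  // The aggregation is expressed declaratively, column by column,
  // instead of as nested pattern matches over JValue trees.
  val byAuthor = commits
    .groupBy($"email")
    .agg(
      sum($"num_commits").as("total_commits"),
      sum($"added_lines").as("total_added_lines")
    )

  byAuthor.show()
  spark.stop()
}
```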

Change-Id: Idee33f08bba110f76e68124e3264f6c5203d8534
README.md

spark-gerrit-analytics-etl

Spark ETL to extract analytics data from Gerrit projects.

The job can be launched with the following parameters:

bin/spark-submit \
    --conf spark.es.nodes=es.mycompany.com \
    --conf spark.es.net.http.auth.user=elastic \
    --conf spark.es.net.http.auth.pass=changeme \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://gerrit.mycompany.com \
    -e gerrit/analytics

Parameters

  • since, until, aggregate are the same as defined in the Gerrit Analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
  • -u --url Gerrit server URL with the analytics plugin installed
  • -e --elasticIndex Elasticsearch target to load into, specified as index/type (e.g. gerrit/analytics); if not provided, no ES export will be performed
  • -o --out folder location for storing the output as JSON files; if not provided, data is saved to an analytics- folder under the system temporary directory