Provide auditlog ETL as a Docker image

sbt analyticsETLAuditLog/docker now produces a runnable Docker image,
and sbt analyticsETLAuditLog/dockerBuildAndPush also publishes it to
the Docker registry.
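
The image's default command runs gerrit-analytics-etl-auditlog.sh. That
script exits unless ES_HOST, GERRIT_URL and ANALYTICS_ARGS are set, and
defaults ES_PORT to 9200. The sketch below shows that environment
contract, which can be useful for checking an environment before
launching the job. It is only a minimal sketch: it uses POSIX
${VAR:?message} expansion instead of the script's own checks, and the
function name check_auditlog_env is hypothetical.

```bash
# Hypothetical helper mirroring the entrypoint's environment contract.
# The body is a subshell, so a failing ${VAR:?} exits only this function.
check_auditlog_env() (
  # Required variables: abort with a message on stderr if unset or empty.
  : "${ES_HOST:?ES_HOST is not set}"
  : "${GERRIT_URL:?GERRIT_URL is not set}"
  : "${ANALYTICS_ARGS:?ANALYTICS_ARGS is not set}"
  # Optional variable, same default as gerrit-analytics-etl-auditlog.sh.
  echo "elasticsearch=${ES_HOST}:${ES_PORT:-9200}"
)
```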

Feature: Issue 10185
Change-Id: Iea4bf41d096ba435d96afadb23d62ef187aa1ad0
diff --git a/README.md b/README.md
index a76cf23..0d723b3 100644
--- a/README.md
+++ b/README.md
@@ -112,26 +112,6 @@
 The build and distribution override the `latest` image tag too
 Remember to create an annotated tag for a release. The tag is used to define the docker image tag too
 
-### Caveats
-* If you want to run the git commits ETL job from within docker you need to make elasticsearch and gerrit available to it.
-  You can do this by:
-
-    * spinning the container within the same network used by your elasticsearch container (`analytics-etl_ek` if you used the docker-compose provided by this repo)
-    * provide routing to the docker host machine (via `--add-host="gerrit:<your_host_ip_address>"`)
-
-  For example:
-
-  ```bash
-  HOST_IP=`ifconfig en0 | grep "inet " | awk '{print $2}'` \
-      docker run -ti --rm \
-           --add-host="gerrit:$HOST_IP" \
-          --network analytics-etl_ek \
-          -e ES_HOST="elasticsearch" \
-          -e GERRIT_URL="http://$HOST_IP:8080" \
-          -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour --writeNotProcessedEventsTo file:///tmp/failed-events -e gerrit" \
-          gerritforge/gerrit-analytics-etl-gitcommits:latest
-  ```
-
 ## Audit Logs
 
 Extract, aggregate and persist auditLog entries produced by Gerrit via the [audit-sl4j](https://gerrit.googlesource.com/plugins/audit-sl4j/) plugin.
@@ -161,6 +141,17 @@
         --until 2020-12-01
 ```
 
+You can also run this job in Docker:
+
+```bash
+docker run -ti --rm \
+    --volume <source>/audit_log:/app/events/audit_log \
+    -e ES_HOST="<elasticsearch_host>" \
+    -e GERRIT_URL="http://<gerrit_host>:<gerrit_port>" \
+    -e ANALYTICS_ARGS="--elasticSearchIndex gerrit --eventsPath /app/events/audit_log --ignoreSSLCert false --since 2000-06-01 --until 2020-12-01 -a hour" \
+    gerritforge/gerrit-analytics-etl-auditlog:latest
+```
+
 ## Parameters
 
 * -u, --gerritUrl             - gerrit server URL (Required)
@@ -182,7 +173,16 @@
 
 #### Docker
 
-Not yet available
+To build the *gerritforge/gerrit-analytics-etl-auditlog* Docker image, run:
+
+`sbt analyticsETLAuditLog/docker`.
+
+To build the image and also push it to the Docker registry, use:
+
+`sbt analyticsETLAuditLog/dockerBuildAndPush`.
+
+The build and distribution override the `latest` image tag too.
+
 
 # Development environment
 
@@ -191,6 +191,38 @@
 
 ## Caveats
 
+* If you want to run an ETL job from within Docker against containerized Elasticsearch and/or Gerrit instances, you need
+  to make them reachable by the ETL container. You can do this by running the ETL container on the same network as your Elasticsearch/Gerrit containers (use the `--network` argument).
+
+* If Elasticsearch or Gerrit runs on your host machine, then you need to make the host reachable by the ETL container.
+  You can do this by adding a route to the Docker host machine (e.g. `--add-host="gerrit:<your_host_ip_address>"` and `--add-host="elasticsearch:<your_host_ip_address>"`).
+
+  For example:
+
+  * Run the gitcommits ETL:
+    ```bash
+    HOST_IP=$(ifconfig en0 | grep "inet " | awk '{print $2}')
+    docker run -ti --rm \
+        --add-host="gerrit:$HOST_IP" \
+        --network analytics-etl_ek \
+        -e ES_HOST="elasticsearch" \
+        -e GERRIT_URL="http://$HOST_IP:8080" \
+        -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour --writeNotProcessedEventsTo file:///tmp/failed-events -e gerrit" \
+        gerritforge/gerrit-analytics-etl-gitcommits:latest
+    ```
+
+  * Run the auditlog ETL:
+    ```bash
+    HOST_IP=$(ifconfig en0 | grep "inet " | awk '{print $2}')
+    docker run -ti --rm --volume <source>/audit_log:/app/events/audit_log \
+        --add-host="gerrit:$HOST_IP" \
+        --network analytics-wizard_ek \
+        -e ES_HOST="elasticsearch" \
+        -e GERRIT_URL="http://$HOST_IP:8181" \
+        -e ANALYTICS_ARGS="--elasticSearchIndex gerrit --eventsPath /app/events/audit_log --ignoreSSLCert true --since 2000-06-01 --until 2020-12-01 -a hour" \
+        gerritforge/gerrit-analytics-etl-auditlog:latest
+    ```
+
 * If Elastisearch dies with `exit code 137` you might have to give Docker more memory ([check this article for more details](https://github.com/moby/moby/issues/22211))
 
 * Should ElasticSearch need authentication (i.e.: if X-Pack is enabled), credentials can be passed through the *spark.es.net.http.auth.pass* and *spark.es.net.http.auth.user* parameters.
diff --git a/auditlog/scripts/gerrit-analytics-etl-auditlog.sh b/auditlog/scripts/gerrit-analytics-etl-auditlog.sh
new file mode 100755
index 0000000..6172937
--- /dev/null
+++ b/auditlog/scripts/gerrit-analytics-etl-auditlog.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+set -o errexit
+
+# Required
+if [ -z "$ES_HOST" ]; then echo "ES_HOST is not set; exiting" >&2; exit 1; fi
+if [ -z "$ANALYTICS_ARGS" ]; then echo "ANALYTICS_ARGS is not set; exiting" >&2; exit 1; fi
+if [ -z "$GERRIT_URL" ]; then echo "GERRIT_URL is not set; exiting" >&2; exit 1; fi
+
+# Optional
+ES_PORT="${ES_PORT:-9200}"
+SPARK_JAR_PATH="${SPARK_JAR_PATH:-/app/analytics-etl-auditlog-assembly.jar}"
+SPARK_JAR_CLASS="${SPARK_JAR_CLASS:-com.gerritforge.analytics.auditlog.job.Main}"
+
+echo "* Elastic Search Host: $ES_HOST:$ES_PORT"
+echo "* Gerrit URL: $GERRIT_URL"
+echo "* Analytics arguments: $ANALYTICS_ARGS"
+echo "* Spark jar class: $SPARK_JAR_CLASS"
+echo "* Spark jar path: $SPARK_JAR_PATH"
+
+"$(dirname "$0")/wait-for-elasticsearch.sh" "$ES_HOST" "$ES_PORT"
+
+echo "Elasticsearch is up, now running spark job..."
+
+spark-submit \
+    --conf spark.es.nodes="$ES_HOST" --conf spark.es.port="$ES_PORT" \
+    --class ${SPARK_JAR_CLASS} ${SPARK_JAR_PATH} \
+    --gerritUrl ${GERRIT_URL} \
+    ${ANALYTICS_ARGS}
\ No newline at end of file
diff --git a/auditlog/scripts/wait-for-elasticsearch.sh b/auditlog/scripts/wait-for-elasticsearch.sh
new file mode 100755
index 0000000..538498d
--- /dev/null
+++ b/auditlog/scripts/wait-for-elasticsearch.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+wait_for() {
+
+  ELASTIC_SEARCH_HOST=$1
+  ELASTIC_SEARCH_PORT=$2
+
+  ELASTIC_SEARCH_URL="http://$ELASTIC_SEARCH_HOST:$ELASTIC_SEARCH_PORT"
+
+  for i in $(seq 30) ; do
+    curl -f "${ELASTIC_SEARCH_URL}/_cluster/health" > /dev/null 2>&1
+
+    result=$?
+    if [ $result -eq 0 ] ; then
+      exit 0
+    fi
+    echo "* Waiting for Elasticsearch at $ELASTIC_SEARCH_URL ($i/30)"
+    sleep 2
+  done
+  echo "Operation timed out" >&2
+  exit 1
+}
+
+wait_for "$@"
diff --git a/build.sbt b/build.sbt
index 1eace17..838b788 100644
--- a/build.sbt
+++ b/build.sbt
@@ -33,8 +33,11 @@
     dockerfile in docker := {
       val artifact: File = assembly.value
       val entryPointBase = s"/app"
-
       baseDockerfile(projectName="auditlog", artifact, artifactTargetPath=s"$entryPointBase/${name.value}-assembly.jar")
+        .copy(baseDirectory.value / "scripts" / "gerrit-analytics-etl-auditlog.sh", file(s"$entryPointBase/gerrit-analytics-etl-auditlog.sh"))
+        .copy(baseDirectory.value / "scripts" / "wait-for-elasticsearch.sh", file(s"$entryPointBase/wait-for-elasticsearch.sh"))
+        .volume(s"$entryPointBase/events/")
+        .cmd("/bin/sh", s"$entryPointBase/gerrit-analytics-etl-auditlog.sh")
     }
   )
   .dependsOn(common % "compile->compile;test->test")