Provide auditlog ETL as docker image
sbt analyticsETLAuditLog/docker now produces a runnable docker image.
sbt analyticsETLAuditLog/dockerBuildAndPush also publishes it to the
docker repository.
Feature: Issue 10185
Change-Id: Iea4bf41d096ba435d96afadb23d62ef187aa1ad0
diff --git a/README.md b/README.md
index a76cf23..0d723b3 100644
--- a/README.md
+++ b/README.md
@@ -112,26 +112,6 @@
The build and distribution override the `latest` image tag too
Remember to create an annotated tag for a release. The tag is used to define the docker image tag too
-### Caveats
-* If you want to run the git commits ETL job from within docker you need to make elasticsearch and gerrit available to it.
- You can do this by:
-
- * spinning the container within the same network used by your elasticsearch container (`analytics-etl_ek` if you used the docker-compose provided by this repo)
- * provide routing to the docker host machine (via `--add-host="gerrit:<your_host_ip_address>"`)
-
- For example:
-
- ```bash
- HOST_IP=`ifconfig en0 | grep "inet " | awk '{print $2}'` \
- docker run -ti --rm \
- --add-host="gerrit:$HOST_IP" \
- --network analytics-etl_ek \
- -e ES_HOST="elasticsearch" \
- -e GERRIT_URL="http://$HOST_IP:8080" \
- -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour --writeNotProcessedEventsTo file:///tmp/failed-events -e gerrit" \
- gerritforge/gerrit-analytics-etl-gitcommits:latest
- ```
-
## Audit Logs
Extract, aggregate and persist auditLog entries produced by Gerrit via the [audit-sl4j](https://gerrit.googlesource.com/plugins/audit-sl4j/) plugin.
@@ -161,6 +141,17 @@
--until 2020-12-01
```
+You can also run this job in docker:
+
+```bash
+docker run \
+ --volume <source>/audit_log:/app/events/audit_log -ti --rm \
+ -e ES_HOST="<elasticsearch_url>" \
+ -e GERRIT_URL="http://<gerrit_url>:<gerrit_port>" \
+ -e ANALYTICS_ARGS="--elasticSearchIndex gerrit --eventsPath /app/events/audit_log --ignoreSSLCert false --since 2000-06-01 --until 2020-12-01 -a hour" \
+ gerritforge/gerrit-analytics-etl-auditlog:latest
+```
+
## Parameters
* -u, --gerritUrl - gerrit server URL (Required)
@@ -182,7 +173,16 @@
#### Docker
-Not yet available
+To build the *gerritforge/gerrit-analytics-etl-auditlog* docker image just run:
+
+`sbt analyticsETLAuditLog/docker`.
+
+If you want to distribute it use:
+
+`sbt analyticsETLAuditLog/dockerBuildAndPush`.
+
+The build and distribution override the `latest` image tag too.
+
# Development environment
@@ -191,6 +191,38 @@
## Caveats
+* If you want to run an ETL job from within docker against containerized elasticsearch and/or gerrit instances, you need
+ to make them reachable by the ETL container. You can do this by running the ETL container in the same network as your elasticsearch/gerrit container (use the `--network` argument).
+
+* If elasticsearch or gerrit runs on your host machine, then you need to make _that_ reachable by the ETL container.
+ You can do this by providing routing to the docker host machine (e.g. `--add-host="gerrit:<your_host_ip_address>"` `--add-host="elasticsearch:<your_host_ip_address>"`).
+
+ For example:
+
+ * Run gitcommits ETL:
+ ```bash
+ HOST_IP=$(ifconfig en0 | grep "inet " | awk '{print $2}')
+ docker run -ti --rm \
+ --add-host="gerrit:$HOST_IP" \
+ --network analytics-etl_ek \
+ -e ES_HOST="elasticsearch" \
+ -e GERRIT_URL="http://$HOST_IP:8080" \
+ -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour --writeNotProcessedEventsTo file:///tmp/failed-events -e gerrit" \
+ gerritforge/gerrit-analytics-etl-gitcommits:latest
+ ```
+
+ * Run auditlog ETL:
+ ```bash
+ HOST_IP=$(ifconfig en0 | grep "inet " | awk '{print $2}')
+ docker run -ti --rm --volume <source>/audit_log:/app/events/audit_log \
+ --add-host="gerrit:$HOST_IP" \
+ --network analytics-wizard_ek \
+ -e ES_HOST="elasticsearch" \
+ -e GERRIT_URL="http://$HOST_IP:8181" \
+ -e ANALYTICS_ARGS="--elasticSearchIndex gerrit --eventsPath /app/events/audit_log --ignoreSSLCert true --since 2000-06-01 --until 2020-12-01 -a hour" \
+ gerritforge/gerrit-analytics-etl-auditlog:latest
+ ```
+
* If Elasticsearch dies with `exit code 137` you might have to give Docker more memory ([check this issue for more details](https://github.com/moby/moby/issues/22211))
* Should Elasticsearch need authentication (e.g. if X-Pack is enabled), credentials can be passed through the *spark.es.net.http.auth.pass* and *spark.es.net.http.auth.user* parameters.
diff --git a/auditlog/scripts/gerrit-analytics-etl-auditlog.sh b/auditlog/scripts/gerrit-analytics-etl-auditlog.sh
new file mode 100755
index 0000000..6172937
--- /dev/null
+++ b/auditlog/scripts/gerrit-analytics-etl-auditlog.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+set -o errexit
+
+# Required
+test -z "$ES_HOST" && { echo "ES_HOST is not set; exiting" >&2 ; exit 1 ; }
+test -z "$ANALYTICS_ARGS" && { echo "ANALYTICS_ARGS is not set; exiting" >&2 ; exit 1 ; }
+test -z "$GERRIT_URL" && { echo "GERRIT_URL is not set; exiting" >&2 ; exit 1 ; }
+
+# Optional
+ES_PORT="${ES_PORT:-9200}"
+SPARK_JAR_PATH="${SPARK_JAR_PATH:-/app/analytics-etl-auditlog-assembly.jar}"
+SPARK_JAR_CLASS="${SPARK_JAR_CLASS:-com.gerritforge.analytics.auditlog.job.Main}"
+
+echo "* Elastic Search Host: $ES_HOST:$ES_PORT"
+echo "* Gerrit URL: $GERRIT_URL"
+echo "* Analytics arguments: $ANALYTICS_ARGS"
+echo "* Spark jar class: $SPARK_JAR_CLASS"
+echo "* Spark jar path: $SPARK_JAR_PATH"
+
+"$(dirname "$0")/wait-for-elasticsearch.sh" "$ES_HOST" "$ES_PORT"
+
+echo "Elasticsearch is up, now running spark job..."
+
+spark-submit \
+ --conf spark.es.nodes="$ES_HOST" \
+ --class "$SPARK_JAR_CLASS" "$SPARK_JAR_PATH" \
+ --gerritUrl "$GERRIT_URL" \
+ ${ANALYTICS_ARGS}
\ No newline at end of file
diff --git a/auditlog/scripts/wait-for-elasticsearch.sh b/auditlog/scripts/wait-for-elasticsearch.sh
new file mode 100755
index 0000000..538498d
--- /dev/null
+++ b/auditlog/scripts/wait-for-elasticsearch.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+wait_for() {
+
+ ELASTIC_SEARCH_HOST=$1
+ ELASTIC_SEARCH_PORT=$2
+
+ ELASTIC_SEARCH_URL="http://$ELASTIC_SEARCH_HOST:$ELASTIC_SEARCH_PORT"
+
+ for i in $(seq 30) ; do
+ curl -f "${ELASTIC_SEARCH_URL}/_cluster/health" > /dev/null 2>&1
+
+ result=$?
+ if [ $result -eq 0 ] ; then
+ exit 0
+ fi
+ echo "* Waiting for Elasticsearch at $ELASTIC_SEARCH_URL ($i/30)"
+ sleep 2
+ done
+ echo "Operation timed out" >&2
+ exit 1
+}
+
+wait_for "$@"
diff --git a/build.sbt b/build.sbt
index 1eace17..838b788 100644
--- a/build.sbt
+++ b/build.sbt
@@ -33,8 +33,11 @@
dockerfile in docker := {
val artifact: File = assembly.value
val entryPointBase = s"/app"
-
baseDockerfile(projectName="auditlog", artifact, artifactTargetPath=s"$entryPointBase/${name.value}-assembly.jar")
+ .copy(baseDirectory(_ / "scripts" / "gerrit-analytics-etl-auditlog.sh").value, file(s"$entryPointBase/gerrit-analytics-etl-auditlog.sh"))
+ .copy(baseDirectory(_ / "scripts" / "wait-for-elasticsearch.sh").value, file(s"$entryPointBase/wait-for-elasticsearch.sh"))
+ .volume(s"$entryPointBase/events/")
+ .cmd("/bin/sh", s"$entryPointBase/gerrit-analytics-etl-auditlog.sh")
}
)
.dependsOn(common % "compile->compile;test->test")