Merge branch 'stable-2.16'
* stable-2.16:
Format help output
Add a TODO for further design discussions
Add section on split brain and multi RW masters
HTTPS using haproxy with SSL termination
Setup gerrit local multi-site environment
bazlets: Stop using native.git_repository
Change-Id: Idb6db6902f9da88bd274e6162af563926d892d66
diff --git a/DESIGN.md b/DESIGN.md
index 7d595ee..b466ba6 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -115,6 +115,29 @@
of their underlying storage, NFS and JGit implementation that allows concurrent
locking at filesystem level.
+## TODO: Synchronous replication
+Consider also synchronous replication for cases like 5, 6, 7... where a write
+operation is only accepted once it has been synchronously replicated to the
+other master node. This would provide fully loss-less disaster recovery. Without
+synchronous replication, when the RW master crashes and loses data, there may be
+no way to recover the missed replications without asking the users who pushed the
+commits in the first place to push them again. Furthermore, with synchronous
+replication the RW site has to "degrade" to RO mode when the other node is not
+reachable and synchronous replication is not possible.
+
+We have to re-evaluate the usability of the replication plugin for supporting
+synchronous replication. For example, the replicationDelay doesn't make much
+sense in the synchronous case. Likewise, rescheduling a replication because of
+an in-flight push to the same remote URI doesn't make much sense either, as we
+want the replication to happen immediately. Moreover, if the ref-update of the
+incoming push request has to be blocked until the synchronous replication
+finishes, the replication plugin cannot even start a replication because there
+is no ref-updated event yet. We may consider implementing synchronous
+replication at a lower level: for example, introduce a "pack-received" event and
+simply forward that pack file to the other site. Similarly, for the ref-updated
+events, instead of performing a real git push we could just forward the
+ref-updates to the other site.
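+
+A minimal sketch of this idea, assuming a hypothetical `PackReceivedListener`
+extension point and a hypothetical `RemoteSite` transport helper (neither exists
+today in Gerrit or in this plugin):
+
+```java
+import java.io.IOException;
+
+// Hypothetical sketch only: PackReceivedListener and RemoteSite are assumed
+// extension points, not existing Gerrit APIs.
+public class SynchronousPackForwarder implements PackReceivedListener {
+  private final RemoteSite otherMaster;
+
+  public SynchronousPackForwarder(RemoteSite otherMaster) {
+    this.otherMaster = otherMaster;
+  }
+
+  @Override
+  public void onPackReceived(String projectName, byte[] pack) throws IOException {
+    // Block the incoming push until the other master has acknowledged the pack.
+    // If the other master is unreachable, the caller degrades this node to RO.
+    otherMaster.sendPack(projectName, pack);
+  }
+}
+```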
+
## History and maturity level of the multi-site plugin
This plugin is coming from the excellent work on the high-availability plugin,
@@ -330,6 +353,109 @@
are explicitly not handled at the moment, and they are just logged as errors.
There is no retry mechanism to handle temporary failures.
+### Avoiding Split Brain
+
+The current multi-site solution at stage #7 with asynchronous replication is
+exposed to the risk of the system reaching a split-brain situation (see
+[issue #10554](https://bugs.chromium.org/p/gerrit/issues/detail?id=10554)).
+
+The diagram below shows the happy path of a crash-recovery situation that brings
+the system back to a healthy state.
+
+![happy path](src/main/resources/Documentation/git-replication-healthy.png)
+
+In this case we are considering two different clients, each doing a `push` on top of
+the same reference. This could be a new commit on a branch or an update of an existing commit.
+
+At `t0`: both clients see the status of `HEAD` as `W0`. `Instance1` is the
+RW node and will receive any `push` request. `Instance1` and `Instance2` are in sync
+at `W0`.
+
+At `t1`: `Client1` pushes `W1`. The request is served by `Instance1`, which acknowledges it
+and starts the replication process (with some delay).
+
+At `t2`: the replication operation is completed. Both instances are in the consistent state
+`W0 -> W1`. `Client1` shares that state but `Client2` is still behind.
+
+At `t3`: `Instance1` crashes.
+
+At `t4`: `Client2` pushes `W2`, which is still based on `W0` (`W0 -> W2`).
+The request is served by `Instance2`, which detects that the client's push was based
+on an out-of-date starting state for the ref. The operation is refused. `Client2` synchronises its
+local state (e.g. rebases its commit) and pushes `W0 -> W1 -> W2`.
+That operation is now considered valid, acknowledged and put in the replication queue until
+`Instance1` becomes available again.
+
+At `t5`: `Instance1` restarts and is replicated up to `W0 -> W1 -> W2`.
+
+The split-brain situation is shown in the following diagram.
+
+![split brain](src/main/resources/Documentation/git-replication-split-brain.png)
+
+In this case the steps are very similar, but `Instance1` fails after acknowledging the
+push of `W0 -> W1` and before having replicated that status to `Instance2`.
+
+When, at `t4`, `Client2` pushes `W0 -> W2` to `Instance2`, this is considered a valid operation.
+It gets acknowledged and inserted in the replication queue.
+
+At `t5`: `Instance1` restarts. At this point both instances have pending replication
+operations. They are executed in parallel and they bring the system to divergence.
+
+The problem is caused by the fact that:
+- the RW node acknowledges a `push` operation before __all__ replicas are fully in sync
+- the other instances are not able to detect that they are out of sync
+
+The two problems above could be solved using different approaches:
+
+- _Synchronous replication_. In this case the system would behave essentially as in the
+_happy path_ diagram shown above, addressing the first of the two causes
+at the expense of performance, availability and scalability. It is a viable and simple solution
+for a two-node setup with an infrastructure allowing fast replication.
+
+- _Centralise the information about the latest status of mutable refs_. This addresses
+the second cause, i.e. it allows instances to realise that _they are not in sync on a particular ref_
+and to refuse any write operation on that ref. The system could operate normally on any other ref and
+would have no limitations in other functions such as serving the GUI, supporting reads, and accepting new
+changes or patch-sets on existing changes. This option is discussed in further detail below.
+
+It is important to note that the two options are not mutually exclusive.
+
+#### Introducing a `DfsRefDatabase`
+
+A possible implementation of the out-of-sync detection logic is based on a central
+coordinator holding the _last known status_ of a _mutable ref_ (immutable refs won't
+have to be stored here). This would essentially be a DFS-based `RefDatabase`, or `DfsRefDatabase`.
+
+This component:
+
+- Will contain a subset of the local `RefDatabase` data:
+  - it would store only _mutable_ `refs`
+  - it would keep only the most recent `sha` for each specific `ref`
+- Needs to be able to perform atomic _Compare and Set_ operations on a
+key -> value storage; for example, it could be implemented using `Zookeeper` (one implementation
+was done by Dave Borowitz some years ago)
+
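+A minimal sketch of what such a component could look like is shown below (the
+interface and method names are illustrative assumptions, not existing plugin
+APIs; a `Zookeeper`-backed implementation would sit behind it):
+
+```java
+import com.google.gerrit.reviewdb.client.Project;
+import org.eclipse.jgit.lib.ObjectId;
+
+// Illustrative sketch only; names and signatures are assumptions.
+public interface SharedRefDatabase {
+  /** Latest sha known cluster-wide for a mutable ref, or null when unknown. */
+  ObjectId get(Project.NameKey project, String refName);
+
+  /**
+   * Atomically sets the new sha if and only if the cluster-wide value still
+   * equals {@code expected}; returns false when this node lost the race or is
+   * out of sync for this ref.
+   */
+  boolean compareAndPut(
+      Project.NameKey project, String refName, ObjectId expected, ObjectId newValue);
+}
+```
+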
+The interaction diagram in this case is shown below:
+
+![split brain detected](src/main/resources/Documentation/git-replication-split-brain-detected.png)
+
+What changes with respect to the split-brain use case is that now, whenever a change of a
+_mutable ref_ is requested, the Gerrit server verifies with the central RefDB that its
+status __for this ref__ is consistent with the latest cluster status. Only if that is true
+does the operation succeed. The ref status is atomically compared and set to the new status
+to prevent race conditions.
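+
+As a sketch of that check (a hypothetical helper built on the interface sketched
+above), the receiving node only lets a ref-update through when its local tip
+matches the cluster-wide tip and the compare-and-set succeeds:
+
+```java
+import com.google.gerrit.reviewdb.client.Project;
+import org.eclipse.jgit.lib.ObjectId;
+
+// Sketch of the per-ref validation; error handling and retries are omitted.
+class RefUpdateValidator {
+  boolean validateAndUpdate(
+      SharedRefDatabase sharedRefDb,
+      Project.NameKey project,
+      String refName,
+      ObjectId localTip,
+      ObjectId newTip) {
+    ObjectId clusterTip = sharedRefDb.get(project, refName);
+    if (clusterTip != null && !clusterTip.equals(localTip)) {
+      // This node is behind the cluster: refuse the write; the ref stays RO here
+      // until replication catches up.
+      return false;
+    }
+    // The atomic compare-and-set prevents two nodes accepting conflicting updates.
+    return sharedRefDb.compareAndPut(project, refName, localTip, newTip);
+  }
+}
+```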
+
+We can see that in this case `Instance2` enters a Read-Only mode for the specific branch
+until the replication from `Instance1` has completed successfully. At that point write
+operations on the reference can be resumed.
+If `Client2` performs the `push` again against `Instance2`, the server recognises that
+the client status needs updating; the client then rebases and pushes the correct status.
+
+__NOTE__:
+This implementation prevents the cluster from entering split brain, but it might leave a
+set of refs in Read-Only state across the whole cluster if the RW node fails after having
+sent the request to the Ref-DB but before persisting that request into its `git` layer.
+
# Next steps in the road-map
## Step-1: fill the gaps of multi-site stage #7:
diff --git a/bazlets.bzl b/bazlets.bzl
index f97b72c..f089af4 100644
--- a/bazlets.bzl
+++ b/bazlets.bzl
@@ -1,10 +1,12 @@
+load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
+
NAME = "com_googlesource_gerrit_bazlets"
def load_bazlets(
commit,
local_path = None):
if not local_path:
- native.git_repository(
+ git_repository(
name = NAME,
remote = "https://gerrit.googlesource.com/bazlets",
commit = commit,
diff --git a/setup_local_env/README.md b/setup_local_env/README.md
new file mode 100644
index 0000000..2df0c2b
--- /dev/null
+++ b/setup_local_env/README.md
@@ -0,0 +1,57 @@
+# Local environment setup
+
+This script configures a full environment to simulate a Gerrit Multi-Site setup.
+The environment is composed of:
+ - 2 Gerrit instances, deployed by default under /tmp
+ - 1 Kafka node and 1 Zookeeper node
+ - 1 HAProxy
+
+## Requirements
+ - java
+ - docker and docker-compose
+ - wget
+ - envsubst
+ - haproxy
+
+## Examples
+Simplest setup, with all default values, cleaning up any previous deployment:
+```bash
+sh setup_local_env/setup.sh --release-war-file /path/to/release.war --multisite-plugin-file /path/to/multi-site.jar
+```
+Clean up the previous deployment:
+```bash
+sh setup_local_env/setup.sh --just-cleanup-env true
+```
+Help
+```bash
+Usage: sh setup.sh [--option <value>]
+
+[--release-war-file] Location to release.war file
+[--multisite-plugin-file] Location to plugin multi-site.jar file
+
+[--new-deployment] Cleans up the previous Gerrit deployment and re-installs it; default true
+[--get-websession-plugin] Download websession-flatfile plugin from CI lastSuccessfulBuild; default true
+[--deployment-location] Base location for the test deployment; default /tmp
+
+[--gerrit-canonical-host] The default host for Gerrit to be accessed through; default localhost
+[--gerrit-canonical-port] The default port for Gerrit to be accessed through; default 8080
+
+[--gerrit-ssh-advertised-port] SSH port advertised to clients through HAProxy; default 29418
+
+[--gerrit1-httpd-port] Gerrit Instance 1 http port; default 18080
+[--gerrit1-sshd-port] Gerrit Instance 1 sshd port; default 39418
+
+[--gerrit2-httpd-port] Gerrit Instance 2 http port; default 18081
+[--gerrit2-sshd-port] Gerrit Instance 2 sshd port; default 49418
+
+[--replication-type] Options [file,ssh]; default ssh
+[--replication-ssh-user] SSH user for the replication plugin; default $(whoami)
+[--just-cleanup-env] Cleans up previous deployment; default false
+
+[--enabled-https] Enable HTTPS; default true
+```
+
+## Limitations
+ - Assumes the SSH replication is always done on port 22 on both instances
+ - When cloning projects via SSH, public key entries are added to `known_hosts`
+ - Clean up the old entries when doing a new deployment, otherwise just use HTTP
\ No newline at end of file
diff --git a/setup_local_env/configs/gerrit.config b/setup_local_env/configs/gerrit.config
new file mode 100644
index 0000000..8666abe
--- /dev/null
+++ b/setup_local_env/configs/gerrit.config
@@ -0,0 +1,36 @@
+[gerrit]
+ basePath = git
+ serverId = 69ec38f0-350e-4d9c-96d4-bc956f2faaac
+ canonicalWebUrl = $GERRIT_CANONICAL_WEB_URL
+[database]
+ type = h2
+ database = $LOCATION_TEST_SITE/db/ReviewDB
+[noteDb "changes"]
+ autoMigrate = true
+ disableReviewDb = true
+ primaryStorage = note db
+ read = true
+ sequence = true
+ write = true
+[container]
+ javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
+ javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
+[index]
+ type = LUCENE
+[auth]
+ type = DEVELOPMENT_BECOME_ANY_ACCOUNT
+[receive]
+ enableSignedPush = false
+[sendemail]
+ smtpServer = localhost
+[sshd]
+ listenAddress = *:$GERRIT_SSHD_PORT
+ advertisedAddress = *:$SSH_ADVERTISED_PORT
+[httpd]
+ listenUrl = proxy-$HTTP_PROTOCOL://*:$GERRIT_HTTPD_PORT/
+[cache]
+ directory = cache
+[plugins]
+ allowRemoteAdmin = true
+[plugin "websession-flatfile"]
+ directory = $FAKE_NFS
\ No newline at end of file
diff --git a/setup_local_env/configs/healthcheck.config b/setup_local_env/configs/healthcheck.config
new file mode 100644
index 0000000..69f87f7
--- /dev/null
+++ b/setup_local_env/configs/healthcheck.config
@@ -0,0 +1,9 @@
+[healthcheck]
+ timeout = 10s
+
+[healthcheck "querychanges"]
+ query = status:open OR status:merged OR status:abandoned
+ limit = 0 # there are no changes
+
+[healthcheck "auth"]
+ username = "admin"
\ No newline at end of file
diff --git a/setup_local_env/configs/multi-site.config b/setup_local_env/configs/multi-site.config
new file mode 100644
index 0000000..f88c788
--- /dev/null
+++ b/setup_local_env/configs/multi-site.config
@@ -0,0 +1,18 @@
+[index]
+ maxTries = 50
+ retryInterval = 30000
+[kafka]
+ bootstrapServers = localhost:$KAFKA_PORT
+ securityProtocol = PLAINTEXT
+ indexEventTopic = gerrit_index
+ streamEventTopic = gerrit_stream
+ projectListEventTopic = gerrit_list_project
+ cacheEventTopic = gerrit_cache_eviction
+[kafka "subscriber"]
+ enabled = true
+ pollingIntervalMs = 1000
+ KafkaProp-enableAutoCommit = true
+ KafkaProp-autoCommitIntervalMs = 1000
+ KafkaProp-autoOffsetReset = latest
+[kafka "publisher"]
+ enabled = true
\ No newline at end of file
diff --git a/setup_local_env/configs/replication.config b/setup_local_env/configs/replication.config
new file mode 100644
index 0000000..d866228
--- /dev/null
+++ b/setup_local_env/configs/replication.config
@@ -0,0 +1,12 @@
+[remote "Replication"]
+ $REPLICATION_URL
+ push = +refs/*:refs/*
+ timeout = 600
+ rescheduleDelay = 15
+ replicationDelay = 5
+[gerrit]
+ autoReload = true
+ replicateOnStartup = true
+[replication]
+ lockErrorMaxRetries = 5
+ maxRetries = 5
\ No newline at end of file
diff --git a/setup_local_env/docker-compose.kafka-broker.yaml b/setup_local_env/docker-compose.kafka-broker.yaml
new file mode 100644
index 0000000..b7e91f0
--- /dev/null
+++ b/setup_local_env/docker-compose.kafka-broker.yaml
@@ -0,0 +1,15 @@
+version: '3'
+services:
+ zookeeper:
+ image: wurstmeister/zookeeper:latest
+ ports:
+ - "2181:2181"
+ container_name: zk_test_node
+ kafka:
+ image: wurstmeister/kafka:2.12-2.1.0
+ ports:
+ - "9092:9092"
+ container_name: kafka_test_node
+ environment:
+ KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
+ KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
diff --git a/setup_local_env/haproxy-config/haproxy.cfg b/setup_local_env/haproxy-config/haproxy.cfg
new file mode 100644
index 0000000..8afce07
--- /dev/null
+++ b/setup_local_env/haproxy-config/haproxy.cfg
@@ -0,0 +1,68 @@
+global
+ log 127.0.0.1 local0
+ log 127.0.0.1 local1 debug
+ tune.ssl.default-dh-param 2048
+ maxconn 4096
+
+defaults
+ log global
+ mode http
+ option httplog
+ option dontlognull
+ retries 3
+ option redispatch
+ maxconn 2000
+ timeout connect 5000
+ timeout client 50000
+ timeout server 50000
+
+frontend haproxynode
+ bind *:$HA_GERRIT_CANONICAL_PORT
+ $HA_HTTPS_BIND
+ mode http
+ acl redirect_reads url_reg -i git-upload-pack
+ acl redirect_reads url_reg -i clone.bundle
+ acl redirect_writes url_reg -i git-receive-pack
+ use_backend read-backendnodes if redirect_reads
+ use_backend write-backendnodes if redirect_writes
+ default_backend read-backendnodes
+
+frontend git_ssh
+ bind *:$SSH_ADVERTISED_PORT
+ option tcplog
+ mode tcp
+ timeout client 5m
+ default_backend ssh
+
+
+backend read-backendnodes
+ mode http
+ balance source
+ option forwardfor
+ http-request set-header X-Forwarded-Port %[dst_port]
+ default-server inter 10s fall 3 rise 2
+ option httpchk GET /config/server/healthcheck~status HTTP/1.0
+ http-check expect status 200
+ server node1 $HA_GERRIT_SITE1_HOSTNAME:$HA_GERRIT_SITE1_HTTPD_PORT check inter 10s
+ server node2 $HA_GERRIT_SITE2_HOSTNAME:$HA_GERRIT_SITE2_HTTPD_PORT check inter 10s
+
+backend write-backendnodes
+ mode http
+ balance roundrobin
+ option forwardfor
+ http-request set-header X-Forwarded-Port %[dst_port]
+ default-server inter 10s fall 3 rise 2
+ option httpchk GET /config/server/healthcheck~status HTTP/1.0
+ http-check expect status 200
+ server node1 $HA_GERRIT_SITE1_HOSTNAME:$HA_GERRIT_SITE1_HTTPD_PORT check inter 10s
+ server node2 $HA_GERRIT_SITE2_HOSTNAME:$HA_GERRIT_SITE2_HTTPD_PORT check inter 10s backup
+
+backend ssh
+ mode tcp
+ option redispatch
+ option httpchk GET /config/server/healthcheck~status HTTP/1.0
+ balance roundrobin
+ timeout connect 10s
+ timeout server 5m
+ server ssh_node1 $HA_GERRIT_SITE1_HOSTNAME:$HA_GERRIT_SITE1_SSHD_PORT check inter 10s check port $HA_GERRIT_SITE1_HTTPD_PORT inter 10s
+ server ssh_node2 $HA_GERRIT_SITE2_HOSTNAME:$HA_GERRIT_SITE2_SSHD_PORT check inter 10s check port $HA_GERRIT_SITE2_HTTPD_PORT inter 10s backup
\ No newline at end of file
diff --git a/setup_local_env/setup.sh b/setup_local_env/setup.sh
new file mode 100755
index 0000000..ea18c66
--- /dev/null
+++ b/setup_local_env/setup.sh
@@ -0,0 +1,400 @@
+#!/bin/bash
+
+# Copyright (C) 2019 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+function check_application_requirements {
+ type haproxy >/dev/null 2>&1 || { echo >&2 "Require haproxy but it's not installed. Aborting."; exit 1; }
+ type java >/dev/null 2>&1 || { echo >&2 "Require java but it's not installed. Aborting."; exit 1; }
+ type docker >/dev/null 2>&1 || { echo >&2 "Require docker but it's not installed. Aborting."; exit 1; }
+ type docker-compose >/dev/null 2>&1 || { echo >&2 "Require docker-compose but it's not installed. Aborting."; exit 1; }
+ type wget >/dev/null 2>&1 || { echo >&2 "Require wget but it's not installed. Aborting."; exit 1; }
+ type envsubst >/dev/null 2>&1 || { echo >&2 "Require envsubst but it's not installed. Aborting."; exit 1; }
+ type openssl >/dev/null 2>&1 || { echo >&2 "Require openssl but it's not installed. Aborting."; exit 1; }
+}
+
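+# Arguments: $1 location of the replicated test site, $2 replication hostname.
+# Prints the "url =" line used in replication.config (file:// or ssh:// transport).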
+function get_replication_url {
+ REPLICATION_LOCATION_TEST_SITE=$1
+ REPLICATION_HOSTNAME=$2
+ USER=$REPLICATION_SSH_USER
+
+ if [ "$REPLICATION_TYPE" = "file" ];then
+ echo "url = file://$REPLICATION_LOCATION_TEST_SITE/git/#{name}#.git"
+ elif [ "$REPLICATION_TYPE" = "ssh" ];then
+ echo "url = ssh://$USER@$REPLICATION_HOSTNAME:$REPLICATION_LOCATION_TEST_SITE/git/#{name}#.git"
+ fi
+}
+
+function deploy_tls_certificates {
+ echo "Deplying certificates in $HA_PROXY_CERTIFICATES_DIR..."
+ openssl req -new -newkey rsa:2048 -x509 -sha256 -days 365 -nodes \
+ -out $HA_PROXY_CERTIFICATES_DIR/MyCertificate.crt \
+ -keyout $HA_PROXY_CERTIFICATES_DIR/GerritLocalKey.key \
+ -subj "/C=GB/ST=London/L=London/O=Gerrit Org/OU=IT Department/CN=localhost"
+ cat $HA_PROXY_CERTIFICATES_DIR/MyCertificate.crt $HA_PROXY_CERTIFICATES_DIR/GerritLocalKey.key | tee $HA_PROXY_CERTIFICATES_DIR/GerritLocalKey.pem
+}
+
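+# Arguments: $1 target config (etc) directory; $2..$8 are exported for envsubst:
+#   $2 GERRIT_HTTPD_PORT, $3 LOCATION_TEST_SITE, $4 GERRIT_SSHD_PORT, $5 REPLICATION_HTTPD_PORT,
+#   $6 REPLICATION_LOCATION_TEST_SITE, $7 GERRIT_HOSTNAME, $8 REPLICATION_HOSTNAME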
+function copy_config_files {
+ for file in `ls $SCRIPT_DIR/configs/*.config`
+ do
+ file_name=`basename $file`
+
+ CONFIG_TEST_SITE=$1
+ export GERRIT_HTTPD_PORT=$2
+ export LOCATION_TEST_SITE=$3
+ export GERRIT_SSHD_PORT=$4
+ export REPLICATION_HTTPD_PORT=$5
+ export REPLICATION_LOCATION_TEST_SITE=$6
+ export GERRIT_HOSTNAME=$7
+ export REPLICATION_HOSTNAME=$8
+ export REPLICATION_URL=$(get_replication_url $REPLICATION_LOCATION_TEST_SITE $REPLICATION_HOSTNAME)
+
+ echo "Replacing variables for file $file and copying to $CONFIG_TEST_SITE/$file_name"
+
+ cat $file | envsubst | sed 's/#{name}#/${name}/g' > $CONFIG_TEST_SITE/$file_name
+ done
+}
+function start_ha_proxy {
+
+ export HA_GERRIT_CANONICAL_HOSTNAME=$GERRIT_CANONICAL_HOSTNAME
+ export HA_GERRIT_CANONICAL_PORT=$GERRIT_CANONICAL_PORT
+
+ export HA_HTTPS_BIND=$HTTPS_BIND
+
+ export HA_GERRIT_SITE1_HOSTNAME=$GERRIT_1_HOSTNAME
+ export HA_GERRIT_SITE2_HOSTNAME=$GERRIT_2_HOSTNAME
+ export HA_GERRIT_SITE1_HTTPD_PORT=$GERRIT_1_HTTPD_PORT
+ export HA_GERRIT_SITE2_HTTPD_PORT=$GERRIT_2_HTTPD_PORT
+
+ export HA_GERRIT_SITE1_SSHD_PORT=$GERRIT_1_SSHD_PORT
+ export HA_GERRIT_SITE2_SSHD_PORT=$GERRIT_2_SSHD_PORT
+
+ cat $SCRIPT_DIR/haproxy-config/haproxy.cfg | envsubst > $HA_PROXY_CONFIG_DIR/haproxy.cfg
+
+ echo "Starting HA-PROXY..."
+ echo "THE SCRIPT LOCATION $SCRIPT_DIR"
+ echo "THE HA SCRIPT_LOCATION $HA_SCRIPT_DIR"
+ haproxy -f $HA_PROXY_CONFIG_DIR/haproxy.cfg &
+}
+
+function deploy_config_files {
+ # KAFKA configuration
+ export KAFKA_PORT=9092
+
+ # SITE 1
+ GERRIT_SITE1_HOSTNAME=$1
+ GERRIT_SITE1_HTTPD_PORT=$2
+ GERRIT_SITE1_SSHD_PORT=$3
+ CONFIG_TEST_SITE_1=$LOCATION_TEST_SITE_1/etc
+ # SITE 2
+ GERRIT_SITE2_HOSTNAME=$4
+ GERRIT_SITE2_HTTPD_PORT=$5
+ GERRIT_SITE2_SSHD_PORT=$6
+ CONFIG_TEST_SITE_2=$LOCATION_TEST_SITE_2/etc
+
+ # Set config SITE1
+ copy_config_files $CONFIG_TEST_SITE_1 $GERRIT_SITE1_HTTPD_PORT $LOCATION_TEST_SITE_1 $GERRIT_SITE1_SSHD_PORT $GERRIT_SITE2_HTTPD_PORT $LOCATION_TEST_SITE_2 $GERRIT_SITE1_HOSTNAME $GERRIT_SITE2_HOSTNAME
+
+
+ # Set config SITE2
+ copy_config_files $CONFIG_TEST_SITE_2 $GERRIT_SITE2_HTTPD_PORT $LOCATION_TEST_SITE_2 $GERRIT_SITE2_SSHD_PORT $GERRIT_SITE1_HTTPD_PORT $LOCATION_TEST_SITE_1 $GERRIT_SITE1_HOSTNAME $GERRIT_SITE2_HOSTNAME
+}
+
+
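+# Arguments: $1 and $2 are the Gerrit site directories to stop, $3 is the
+# common setup directory to remove.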
+function cleanup_environment {
+ echo "Killing existing HA-PROXY setup"
+ kill $(ps -ax | grep haproxy | grep "gerrit_setup/ha-proxy-config" | awk '{print $1}') 2> /dev/null
+ echo "Stoping kafka and zk"
+ docker-compose -f $SCRIPT_DIR/docker-compose.kafka-broker.yaml down 2> /dev/null
+
+ echo "Stoping GERRIT instances"
+ $1/bin/gerrit.sh stop 2> /dev/null
+ $2/bin/gerrit.sh stop 2> /dev/null
+
+ echo "REMOVING setup directory $3"
+ rm -rf $3 2> /dev/null
+}
+
+function check_if_kafka_is_running {
+ echo $(docker inspect kafka_test_node 2> /dev/null | grep '"Running": true' | wc -l)
+}
+
+while [ $# -ne 0 ]
+do
+case "$1" in
+ "--help" )
+ echo "Usage: sh $0 [--option $value]"
+ echo
+ echo "[--release-war-file] Location to release.war file"
+ echo "[--multisite-plugin-file] Location to plugin multi-site.jar file"
+ echo
+ echo "[--new-deployment] Cleans up previous gerrit deployment and re-installs it. default true"
+ echo "[--get-websession-plugin] Download websession-flatfile plugin from CI lastSuccessfulBuild; default true"
+ echo "[--deployment-location] Base location for the test deployment; default /tmp"
+ echo
+ echo "[--gerrit-canonical-host] The default host for Gerrit to be accessed through; default localhost"
+ echo "[--gerrit-canonical-port] The default port for Gerrit to be accessed throug; default 8080"
+ echo
+ echo "[--gerrit-ssh-advertised-port] Gerrit Instance 1 sshd port; default 29418"
+ echo
+ echo "[--gerrit1-httpd-port] Gerrit Instance 1 http port; default 18080"
+ echo "[--gerrit1-sshd-port] Gerrit Instance 1 sshd port; default 39418"
+ echo
+ echo "[--gerrit2-httpd-port] Gerrit Instance 2 http port; default 18081"
+ echo "[--gerrit2-sshd-port] Gerrit Instance 2 sshd port; default 49418"
+ echo
+ echo "[--replication-type] Options [file,ssh]; default ssh"
+ echo "[--replication-ssh-user] SSH user for the replication plugin; default $(whoami)"
+ echo "[--just-cleanup-env] Cleans up previous deployment; default false"
+ echo
+ echo "[--enabled-https] Enabled https; default true"
+ echo
+ exit 0
+ ;;
+ "--new-deployment")
+ NEW_INSTALLATION=$2
+ shift
+ shift
+ ;;
+ "--get-websession-plugin")
+ DOWNLOAD_WEBSESSION_FLATFILE=$2
+ shift
+ shift
+ ;;
+ "--deployment-location" )
+ DEPLOYMENT_LOCATION=$2
+ shift
+ shift
+ ;;
+ "--release-war-file" )
+ RELEASE_WAR_FILE_LOCATION=$2
+ shift
+ shift
+ ;;
+ "--multisite-plugin-file" )
+ MULTISITE_PLUGIN_LOCATION=$2
+ shift
+ shift
+ ;;
+ "--gerrit-canonical-host" )
+ export GERRIT_CANONICAL_HOSTNAME=$2
+ shift
+ shift
+ ;;
+ "--gerrit-canonical-port" )
+ export GERRIT_CANONICAL_PORT=$2
+ shift
+ shift
+ ;;
+ "--gerrit-ssh-advertised-port" )
+ export SSH_ADVERTISED_PORT=$2
+ shift
+ shift
+ ;;
+ "--gerrit1-httpd-port" )
+ GERRIT_1_HTTPD_PORT=$2
+ shift
+ shift
+ ;;
+ "--gerrit2-httpd-port" )
+ GERRIT_2_HTTPD_PORT=$2
+ shift
+ shift
+ ;;
+ "--gerrit1-sshd-port" )
+ GERRIT_1_SSHD_PORT=$2
+ shift
+ shift
+ ;;
+ "--gerrit2-sshd-port" )
+ GERRIT_2_SSHD_PORT=$2
+ shift
+ shift
+ ;;
+ "--replication-ssh-user" )
+ export REPLICATION_SSH_USER=$2
+ shift
+ shift
+ ;;
+ "--replication-type")
+ export REPLICATION_TYPE=$2
+ shift
+ shift
+ ;;
+ "--just-cleanup-env" )
+ JUST_CLEANUP_ENV=$2
+ shift
+ shift
+ ;;
+ "--enabled-https" )
+ HTTPS_ENABLED=$2
+ shift
+ shift
+ ;;
+ * )
+ echo "Unknown option argument: $1"
+ shift
+ shift
+ ;;
+esac
+done
+
+# Check application requirements
+check_application_requirements
+
+# Defaults
+NEW_INSTALLATION=${NEW_INSTALLATION:-"true"}
+DOWNLOAD_WEBSESSION_FLATFILE=${DOWNLOAD_WEBSESSION_FLATFILE:-"true"}
+DEPLOYMENT_LOCATION=${DEPLOYMENT_LOCATION:-"/tmp"}
+export GERRIT_CANONICAL_HOSTNAME=${GERRIT_CANONICAL_HOSTNAME:-"localhost"}
+export GERRIT_CANONICAL_PORT=${GERRIT_CANONICAL_PORT:-"8080"}
+GERRIT_1_HOSTNAME=${GERRIT_1_HOSTNAME:-"localhost"}
+GERRIT_2_HOSTNAME=${GERRIT_2_HOSTNAME:-"localhost"}
+GERRIT_1_HTTPD_PORT=${GERRIT_1_HTTPD_PORT:-"18080"}
+GERRIT_2_HTTPD_PORT=${GERRIT_2_HTTPD_PORT:-"18081"}
+GERRIT_1_SSHD_PORT=${GERRIT_1_SSHD_PORT:-"39418"}
+GERRIT_2_SSHD_PORT=${GERRIT_2_SSHD_PORT:-"49418"}
+REPLICATION_TYPE=${REPLICATION_TYPE:-"ssh"}
+REPLICATION_SSH_USER=${REPLICATION_SSH_USER:-$(whoami)}
+export SSH_ADVERTISED_PORT=${SSH_ADVERTISED_PORT:-"29418"}
+HTTPS_ENABLED=${HTTPS_ENABLED:-"true"}
+
+COMMON_LOCATION=$DEPLOYMENT_LOCATION/gerrit_setup
+LOCATION_TEST_SITE_1=$COMMON_LOCATION/instance-1
+LOCATION_TEST_SITE_2=$COMMON_LOCATION/instance-2
+HA_PROXY_CONFIG_DIR=$COMMON_LOCATION/ha-proxy-config
+HA_PROXY_CERTIFICATES_DIR="$HA_PROXY_CONFIG_DIR/certificates"
+
+RELEASE_WAR_FILE_LOCATION=${RELEASE_WAR_FILE_LOCATION:-bazel-bin/release.war}
+MULTISITE_PLUGIN_LOCATION=${MULTISITE_PLUGIN_LOCATION:-bazel-genfiles/plugins/multi-site/multi-site.jar}
+
+
+export FAKE_NFS=$COMMON_LOCATION/fake_nfs
+
+if [ "$JUST_CLEANUP_ENV" = "true" ];then
+ cleanup_environment $LOCATION_TEST_SITE_1 $LOCATION_TEST_SITE_2 $COMMON_LOCATION
+ exit 0
+fi
+
+if [ -z $RELEASE_WAR_FILE_LOCATION ];then
+ echo "A release.war file is required. Usage: sh $0 --release-war-file /path/to/release.war"
+ exit 1
+else
+ cp -f $RELEASE_WAR_FILE_LOCATION $DEPLOYMENT_LOCATION/gerrit.war >/dev/null 2>&1 || { echo >&2 "$RELEASE_WAR_FILE_LOCATION: Not able to copy the file. Aborting"; exit 1; }
+fi
+if [ -z $MULTISITE_PLUGIN_LOCATION ];then
+ echo "The multi-site plugin is required. Usage: sh $0 --multisite-plugin-file /path/to/multi-site.jar"
+ exit 1
+else
+ cp -f $MULTISITE_PLUGIN_LOCATION $DEPLOYMENT_LOCATION/multi-site.jar >/dev/null 2>&1 || { echo >&2 "$MULTISITE_PLUGIN_LOCATION: Not able to copy the file. Aborting"; exit 1; }
+fi
+if [ $DOWNLOAD_WEBSESSION_FLATFILE = "true" ];then
+ echo "Downloading websession-flatfile plugin stable 2.16"
+ wget https://gerrit-ci.gerritforge.com/view/Plugins-stable-2.16/job/plugin-websession-flatfile-bazel-master-stable-2.16/lastSuccessfulBuild/artifact/bazel-genfiles/plugins/websession-flatfile/websession-flatfile.jar \
+ -O $DEPLOYMENT_LOCATION/websession-flatfile.jar
+ wget https://gerrit-ci.gerritforge.com/view/Plugins-stable-2.16/job/plugin-healthcheck-bazel-stable-2.16/lastSuccessfulBuild/artifact/bazel-genfiles/plugins/healthcheck/healthcheck.jar \
+ -O $DEPLOYMENT_LOCATION/healthcheck.jar
+else
+ echo "Without the websession-flatfile; user login via haproxy will fail."
+fi
+
+if [ "$REPLICATION_TYPE" = "ssh" ];then
+ echo "Using 'SSH' replication type"
+ echo "Make sure ~/.ssh/authorized_keys and ~/.ssh/known_hosts are configured correctly"
+fi
+
+if [ "$HTTPS_ENABLED" = "true" ];then
+ export HTTP_PROTOCOL="https"
+ export GERRIT_CANONICAL_WEB_URL="$HTTP_PROTOCOL://$GERRIT_CANONICAL_HOSTNAME/"
+ export HTTPS_BIND="bind *:443 ssl crt $HA_PROXY_CONFIG_DIR/certificates/GerritLocalKey.pem"
+ HTTPS_CLONE_MSG="Using self-signed certificates, to clone via https - 'git config --global http.sslVerify false'"
+else
+ export HTTP_PROTOCOL="http"
+ export GERRIT_CANONICAL_WEB_URL="$HTTP_PROTOCOL://$GERRIT_CANONICAL_HOSTNAME:$GERRIT_CANONICAL_PORT/"
+fi
+
+# New installation
+if [ $NEW_INSTALLATION = "true" ]; then
+
+ cleanup_environment $LOCATION_TEST_SITE_1 $LOCATION_TEST_SITE_2 $COMMON_LOCATION
+
+ echo "Setting up directories"
+ mkdir -p $LOCATION_TEST_SITE_1 $LOCATION_TEST_SITE_2 $HA_PROXY_CERTIFICATES_DIR $FAKE_NFS
+ java -jar $DEPLOYMENT_LOCATION/gerrit.war init --batch --no-auto-start --install-all-plugins --dev -d $LOCATION_TEST_SITE_1
+
+ # Deploying TLS certificates
+ if [ "$HTTPS_ENABLED" = "true" ];then deploy_tls_certificates;fi
+
+ echo "Copy multi-site plugin"
+ cp -f $DEPLOYMENT_LOCATION/multi-site.jar $LOCATION_TEST_SITE_1/plugins/multi-site.jar
+
+ echo "Copy websession-flatfile plugin"
+ cp -f $DEPLOYMENT_LOCATION/websession-flatfile.jar $LOCATION_TEST_SITE_1/plugins/websession-flatfile.jar
+
+ echo "Copy healthcheck plugin"
+ cp -f $DEPLOYMENT_LOCATION/healthcheck.jar $LOCATION_TEST_SITE_1/plugins/healthcheck.jar
+
+ echo "Re-indexing"
+ java -jar $DEPLOYMENT_LOCATION/gerrit.war reindex -d $LOCATION_TEST_SITE_1
+ # Replicating environment
+ echo "Replicating environment"
+ cp -fR $LOCATION_TEST_SITE_1/* $LOCATION_TEST_SITE_2
+fi
+
+
+IS_KAFKA_RUNNING=$(check_if_kafka_is_running)
+if [ $IS_KAFKA_RUNNING -lt 1 ];then
+
+ echo "Starting zk and kafka"
+ docker-compose -f $SCRIPT_DIR/docker-compose.kafka-broker.yaml up -d
+ echo "Waiting for kafka to start..."
+ while [[ $(check_if_kafka_is_running) -lt 1 ]];do sleep 10s; done
+fi
+
+echo "Re-deploying configuration files"
+deploy_config_files $GERRIT_1_HOSTNAME $GERRIT_1_HTTPD_PORT $GERRIT_1_SSHD_PORT $GERRIT_2_HOSTNAME $GERRIT_2_HTTPD_PORT $GERRIT_2_SSHD_PORT
+echo "Starting gerrit site 1"
+$LOCATION_TEST_SITE_1/bin/gerrit.sh restart
+echo "Starting gerrit site 2"
+$LOCATION_TEST_SITE_2/bin/gerrit.sh restart
+
+
+if [[ $(ps -ax | grep haproxy | grep "gerrit_setup/ha-proxy-config" | awk '{print $1}' | wc -l) -lt 1 ]];then
+ echo "Starting haproxy"
+ start_ha_proxy
+fi
+
+echo "==============================="
+echo "Current gerrit multi-site setup"
+echo "==============================="
+echo "The admin password is 'secret'"
+echo "deployment-location=$DEPLOYMENT_LOCATION"
+echo "replication-type=$REPLICATION_TYPE"
+echo "replication-ssh-user=$REPLICATION_SSH_USER"
+echo "enable-https=$HTTPS_ENABLED"
+echo
+echo "GERRIT HA-PROXY: $GERRIT_CANONICAL_WEB_URL"
+echo "GERRIT-1: http://$GERRIT_1_HOSTNAME:$GERRIT_1_HTTPD_PORT"
+echo "GERRIT-2: http://$GERRIT_2_HOSTNAME:$GERRIT_2_HTTPD_PORT"
+echo
+echo "Site-1: $LOCATION_TEST_SITE_1"
+echo "Site-2: $LOCATION_TEST_SITE_2"
+echo
+echo "$HTTPS_CLONE_MSG"
+echo
+
+exit $?
\ No newline at end of file
diff --git a/src/main/resources/Documentation/git-replication-healthy.png b/src/main/resources/Documentation/git-replication-healthy.png
new file mode 100644
index 0000000..0d72872
--- /dev/null
+++ b/src/main/resources/Documentation/git-replication-healthy.png
Binary files differ
diff --git a/src/main/resources/Documentation/git-replication-split-brain-detected.png b/src/main/resources/Documentation/git-replication-split-brain-detected.png
new file mode 100644
index 0000000..dba5a81
--- /dev/null
+++ b/src/main/resources/Documentation/git-replication-split-brain-detected.png
Binary files differ
diff --git a/src/main/resources/Documentation/git-replication-split-brain.png b/src/main/resources/Documentation/git-replication-split-brain.png
new file mode 100644
index 0000000..30f303e
--- /dev/null
+++ b/src/main/resources/Documentation/git-replication-split-brain.png
Binary files differ