This set of templates provides all the components to deploy a Gerrit dual-primary in HA in ECS. The 2 primaries will share the Git repositories via NFS, using EFS.
The following templates are provided in this example:
cf-cluster
: define the ECS cluster and the networking stackcf-service-primary
: define the service stack running the gerrit primarycf-dns-route
: define the DNS routing for the servicecf-service-replica
: define the service stack running the gerrit replicacf-service-lb
: define the LBs in front of gerrit primaries (this includes haproxy as well as NLB)cf-dashboard
: define the CloudWatch dashboard for the servicesWhen the recipe enables the replication_service (see docs) then these additional templates will be executed:
cf-service-replication
: Define a replication stack that will allow git replication over the EFS volume, which is mounted by the primary instances.NOTE: This stack uses EFS in provisioned mode, which is a better setting for large repos (> 1GB uncompressed) since it provides a lower latency compared to the burst mode. However, it has some costs associated. If you are dealing with small repos, you can switch to burst mode.
Gerrit stores information in two volumes: git data (and possibly websessions, when not using multi-site) are shared across Gerrit nodes and therefore persisted in the EFS volume, whilst cache, logs, plugins data and indexes are local to each specific Gerrit node and thus stored in the EBS volume.
In order to deploy a Gerrit instance that runs on pre-existing data, the EFS volume and an EBS snapshot need to be specified in the setup.env
file(see configuration for more information on how to do this).
Referring to persistent volumes allows to perform blue-green deployments.
When a dual-primary stack is created, unless otherwise specified, a new EFS is created and a two new empty EBSs are attached to primary1 and primary2, respectively.
In a blue/green deployment scenario, this initial stack is called the blue stack.
make AWS_REGION=us-east-1 AWS_PREFIX=gerrit-blue create-all
Later on (days, weeks, months), the need of a change arises, for which a new version of the cluster needs to be deployed: this will be the green stack and it will need to be deployed as such:
Take primary1 EBS snapshot of volume attached to /dev/xvdg (note, this needs to be done in a read only window). Ideally this step is already performed regularly by a backup script.
Update the setup.env
to point to existing volumes, for example:
PRIMARY_FILESYSTEM_ID=fs-93514727 REPLICA_FILESYSTEM_ID=fs-9c514728 GERRIT_VOLUME_SNAPSHOT_ID=snap-048a5c2dfc14a81eb
If the network stack was created as part of this deployment (i.e. a new VPC was created as part of this deployment), then you need to set network resources so that the green stack can be deployed in the same VPC, for example:
VPC_ID= vpc-03292278512e783c7 INTERNET_GATEWAY_ID=igw-0cb5b144c294f9411 SUBNET1_ID=subnet-066065ea55fda52cf SUBNET1_AZ=us-east-1a SUBNET1_CIDR=10.0.0.0/24 SUBNET2_ID=subnet-0fefe45d89ce02b31 SUBNET2_AZ=us-east-1b SUBNET2_CIDR=10.0.32.0/24
Note that if the refs-db dynamodb tables were created as part of the initial stack (CREATE_REFS_DB_TABLES
was set to true
), you will need to explicitly set it to false
to avoid attempting to create the same tables again.
make AWS_REGION=us-east-1 AWS_PREFIX=gerrit-green create-all
Once you are happy the green stack is aligned and healthy you can switch the Route53 DNS to the new green stack.
make AWS_REGION=us-east-1 AWS_PREFIX=gerrit-blue delete-all
Note that, even if the EFS resources were created as part of the blue stack, they will be retained during the stack deletion, so that they can still be used by the green stack.
This includes EFS as well as VPC resources (if they were created as part of the blue stack).
Follow the steps described in the Prerequisites section
Please refer to the configuration docs to understand how to set up the configuration and what common configuration values are needed. On top of that, you might set the additional parameters, specific for this recipe.
Configuration values affecting deployment environment and cluster properties
SERVICE_PRIMARY1_STACK_NAME
: Optional. Name of the primary 1 service stack. gerrit-service-primary-1
by default.
SERVICE_PRIMARY2_STACK_NAME
: Optional. Name of the primary 2 service stack. gerrit-service-primary-2
by default.
DASHBOARD_STACK_NAME
: Optional. Name of the dashboard stack. gerrit-dashboard
by default.
PRIMARY1_SUBDOMAIN
: Optional. Name of the primary 1 sub domain. gerrit-primary-1-demo
by default.
PRIMARY2_SUBDOMAIN
: Optional. Name of the primary 2 sub domain. gerrit-primary-2-demo
by default.
REPLICA_SUBDOMAIN
: Mandatory. The subdomain of the Gerrit replica. For example: <AWS_PREFIX>-replica
LB_SUBDOMAIN
: Mandatory. The subdomain of the Gerrit load balancer. For example: <AWS_PREFIX>-dual-primary
PRIMARIES_GERRIT_SUBDOMAIN
: Mandatory. The subdomain of the lb serving traffic to both primary gerrit instances. For example: <AWS_PREFIX>-primaries
PRIMARY_FILESYSTEM_THROUGHPUT_MODE
: Optional. The throughput mode for the primary file system to be created. default: bursting
. More info here
PRIMARY_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS
: Optional. Only used when PRIMARY_FILESYSTEM_THROUGHPUT_MODE
is set to provisioned
. default: 256
.
GERRIT_REPLICA_INSTANCE_ID
: Optional. Identifier for the Gerrit replica instance. “gerrit-dual-primary-REPLICA” by default.
GERRIT_PRIMARY1_INSTANCE_ID
: Optional. Identifier for the Gerrit primary1 instance. “gerrit-dual-primary-PRIMARY1” by default.
GERRIT_PRIMARY2_INSTANCE_ID
: Optional. Identifier for the Gerrit primary2 instance. “gerrit-dual-primary-PRIMARY2” by default.
HA_PROXY_DESIRED_COUNT
: Optional. Desired number of haproxy services. “2” by default. Minimum: “2”.
Note ha-proxies are running on ec2 instances with a ratio of 1 to 1: each ec2 node hosts one and only one ha-proxy. By increasing the number of desired ha-proxies then, the size of the autoscaling group hosting them also increases accordingly.
HA_PROXY_MAX_COUNT
: Optional. Maximum number of EC2 instances in the haproxy autoscaling group. “2” by default. Minimum: “2”.
PRIMARY_MAX_COUNT
: Optional. Maximum number of EC2 instances in the primary autoscaling group. “2” by default. Minimum: “2”.
GERRIT_VOLUME_SNAPSHOT_ID
: Optional. Id of the EBS volume snapshot used to create new EBS volume for Gerrit data. A new volume will be created for each primary, based on this snapshot.
Note that, differently from other recipes, dual-primary does not support the GERRIT_VOLUME_ID
parameter, since it wouldn't be possible to mount the same EBS on multiple EC2 instances.
GERRIT_VOLUME_SIZE_IN_GIB
: Optional. The size of the Gerrit data volume, in GiBs. 10
by default.
FILESYSTEM_ID
: Optional. An existing EFS filesystem id.
If empty, a new EFS will be created to store git data. Setting this value is required when deploying a dual-primary cluster using existing data as well as performing blue/green deployments. The nested stack will be retained when the cluster is deleted, so that existing data can be used to perform blue/green deployments.
AUTOREINDEX_POLL_INTERVAL
. Optional. Interval between reindexing of all changes, accounts and groups. Default: 10m
high-availability docs here
DYNAMODB_LOCKS_TABLE_NAME
. Optional. The name of the dynamoDB table used to store distribute locking. Default: locksTable
See DynamoDB lock client here
DYNAMODB_REFS_TABLE_NAME
. Optional. The name of the dynamoDB table used to store git refs and their associated sha1. Default: refsDb
CREATE_REFS_DB_TABLES
. Optional. Whether to create the DynamoDB refs and lock tables. Default: false
Similarly to primary nodes, replicas share a data via an EFS filesystem which is mounted under the /var/gerrit/git
directory. This allows git data to persist beyond the lifespan of a single instance and to be shared so that replicas can scale down and up according to needs.
REPLICA_FILESYSTEM_ID
: Optional. An existing EFS filesystem id to mount on replicas.
If empty, a new EFS will be created to store git data. Setting this value is required when deploying a dual-primary cluster using existing data as well as performing blue/green deployments. The nested stack will be retained when the cluster is deleted, so that existing data can be used to perform blue/green deployments.
REPLICA_FILESYSTEM_THROUGHPUT_MODE
: Optional. The throughput mode for the file system to be created. default: bursting
. More info here
REPLICA_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS
: Optional. Only used when REPLICA_FILESYSTEM_THROUGHPUT_MODE
is set to provisioned
. default: 256
.
Gerrit replicas have the ability to scale in or out automatically to accommodate to the increase or decrease of traffic. The traffic might be typically coming from build or test jobs executed by some sort of automated build pipeline.
Since they all share the same git data over EFS, replicas are immediately ready to serve traffic as soon as they come up and register behind the loadbalancer.
There is a 1 to 1 relationship between replica and EC2 instances: on each EC2 instance in the ‘replica’ ASG, runs one and only one replica task. Because of this, when specifying the capacity for replicas (minimum, desired and maximum), they will both configure for the capacity of tasks as well as the capacity of the ASG, since they always need to be in sync.
The scaling policy adds or removes capacity as required to keep the average CPU Usage (of the replica service) close to the specified target value.
Now, tasks in the provisioning state that cannot find sufficient resources on the existing instances will automatically trigger the capacity provider to scale out the replica ASG. As more EC2 instances become available, tasks in the provisioning state will get placed onto those instances, reducing the number of tasks in provisioning.
Conversely, as the average CPU usage (of the replica service) drops under the specified target value, and replica tasks get removed, the capacity provider will reduce the number of EC2 instances too.
Note that only EC2 instances that are not running any replica task will scale in.
These are the available settings:
REPLICA_AUTOSCALING_MIN_CAPACITY
Optional. The minimum number of tasks that replicas should scale in to. This is also the minimum number of EC2 instances in the replica ASG default: 1
REPLICA_AUTOSCALING_DESIRED_CAPACITY
Optional. The desired number of replica tasks to run. This is also the desired number of EC2 instances in the replica ASG. default: 1
REPLICA_AUTOSCALING_MAX_CAPACITY
Optional. The maximum number of tasks that replicas should scale out to. This is also the maximum number of EC2 instances in the replica ASG default: 2
REPLICA_AUTOSCALING_SCALE_IN_COOLDOWN
Optional. The amount of time, in seconds, after a scale-in activity completes before another scale-in activity can start default: 300 seconds
REPLICA_AUTOSCALING_SCALE_OUT_COOLDOWN
Optional. The amount of time, in seconds, to wait for a previous scale-out activity to take effect default: 300 seconds
REPLICA_AUTOSCALING_TARGET_CPU_PERCENTAGE
Optional. Aggregate CPU utilization target for auto-scaling. Auto-scaling will add or remove tasks in the replica service to be as close as possible to this value
REPLICA_CAPACITY_PROVIDER_TARGET
Optional. The target capacity value for the capacity provider of replicas (must be > 0 and <= 100). default: 100
Setting this value to 100 means that there will be no spare capacity allocated on the replica ASG:
If 3 replica tasks are needed, then the ASG will adjust to have exactly 3 EC2
Setting this value to less than 100 enables spare capacity in the ASG. For example, if you set this value to 50 the scaling policy will adjust the EC2 until it is exactly twice the number of instances needed to run all of the tasks:
If 3 replica tasks are needed, then there ASG will adjust to 6 EC2
REPLICA_CAPACITY_PROVIDER_MIN_STEP_SIZE
Optional. The minimum number of EC2 instances for replicas that will scale in or scale out at one time (must be >= 1 and <= 10) default: 1
REPLICA_CAPACITY_PROVIDER_MAX_STEP_SIZE
Optional. The maximum number of EC2 instances for replicas that will scale in or scale out at one time (must be >= 1 and <= 10) default: 1
REPLICATION_SERVICE_ENABLED
: Optional. Whether to expose a replication endpoint. “false” by default.SERVICE_REPLICATION_STACK_NAME
: Optional. The name of the replication service stack. “git-replication-service” by default.SERVICE_REPLICATION_DESIRED_COUNT
: Optional. Number of wanted replication tasks. “1” by default.GIT_REPLICATION_SUBDOMAIN
: Optional. The subdomain to use for the replication endpoint. “git-replication” by default.It is also posssible to replicate to an extra target by providing a FQDN. The target is expected to expose port 9148 and port 1022 for git and git admin operations respectively.
REMOTE_REPLICATION_TARGET_HOST
: Optional. The fully qualified domain name of a remote replication target. Empty by default.The replication service and the remote replication target represent the reading and writing sides of Git replication: by enabling both of them, it is possible to establish replication to a remote Git site.
This recipe supports multi-site. Multi-site is a specific configuration of Gerrit that allows it to be part of distributed multi-primary of multiple Gerrit clusters. No storage is shared among the Gerrit sites: syncing happens thanks to two channels:
replication
plugin allow alignment of git data (see replication service) for how to enable this.multi-site
group of plugins and resources allow the coordination and the exchange of gerrit specific events that are produced and consumed by the members of the multi-site deployment. (See the multi-site design for more information on this.These are the parameters that can be specified to enable/disable multi-site:
MULTISITE_ENABLED
: Optional. Whether this Gerrit is part of a multi-site cluster deployment. “false” by default.MULTISITE_KAFKA_BROKERS
: Required when “MULTISITE_ENABLED=true”. Comma separated list of Kafka broker hosts (host:port) to use for publishing events to the message broker.MULTISITE_GLOBAL_PROJECTS
: Optional. Comma separated list of patterns (see projects.pattern) to specify which projects are available across all sites. This parametes applies to both multi-site and replication service remote destinations. Empty by default which means that all projects are available across all sites.make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] create-all
The optional AWS_REGION
and AWS_REFIX
allow you to define where it will be deployed and what it will be named.
It might take several minutes to build the stack. You can monitor the creations of the stacks in CloudFormation
pem
file on the current directory. To use when ssh-ing into your instances as follow: ssh -i cluster-keys.pem ec2-user@<ec2_instance_ip>
Optionally this recipe can be deployed so that replication onto the share EFS volume is available.
By setting the environment variable REPLICATION_SERVICE_ENABLED=true
, this recipe will set up and configure additional resources that will allow other other sites to replicate to a specific endpoint, exposed as:
For GIT replication $(GIT_REPLICATION_SUBDOMAIN).$(HOSTED_ZONE_NAME):9148
For Git Admin replication $(GIT_REPLICATION_SUBDOMAIN).$(HOSTED_ZONE_NAME):1022
The service will persist git data on the same EFS volume mounted by the gerrit primary1 and gerrit primary2.
Note that the replication endpoint is not internet-facing, thus replication requests must be coming from a peered VPC.
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all
The optional AWS_REGION
and AWS_REFIX
allow you to specify exactly which stack you target for deletion.
Note that this will not delete:
Note that you can completely delete the stack, including explicitly retained resources such as the EFS Git filesystem, VPC and subnets and DynamoDB stack by issuing the more aggressive command:
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
Note that this will execute a prompt to confirm your choice:
* * * * WARNING * * * * this is going to completely destroy the stack, including git data. Are you sure you want to continue? [y/N]
If you want to automate this programmatically you can just pipe the yes
command to the make:
yes | make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
Get the URL of your Gerrit primary instances this way:
aws cloudformation describe-stacks \ --stack-name <SERVICE_PRIMARY1_STACK_NAME> \ | grep -A1 '"OutputKey": "CanonicalWebUrl"' \ | grep OutputValue \ | cut -d'"' -f 4 aws cloudformation describe-stacks \ --stack-name <SERVICE_PRIMARY2_STACK_NAME> \ | grep -A1 '"OutputKey": "CanonicalWebUrl"' \ | grep OutputValue \ | cut -d'"' -f 4
Gerrit primary instance ports:
8080
29418
If you need to setup some external services (maybe for testing purposes, such as SMTP or LDAP), you can follow the instructions here
Refer to the Docker section for information on how to setup docker or how to publish images