# Gerrit Primary-Replica
This set of templates provides all the components to deploy a single Gerrit primary
and a single Gerrit replica in ECS.
## Architecture
Five templates are provided in this example:
* `cf-cluster`: defines the ECS cluster and the networking stack
* `cf-service-primary`: defines the service stack running the Gerrit primary
* `cf-service-replica`: defines the service stack running the Gerrit replica
* `cf-dns-route`: defines the DNS routing for the service
* `cf-dashboard`: defines the CloudWatch dashboard for the services
### Data persistence
* EBS volumes for:
* Indexes
* Caches
* Data
* Git repositories
### Deployment type
* Latest Gerrit version deployed using the official [Docker image](https://hub.docker.com/r/gerritcodereview/gerrit)
* Application deployed in ECS on a single EC2 instance
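The official image referenced above can also be pulled locally for inspection; a minimal example (the tag is illustrative, the recipes pin their own Gerrit version):
```
docker pull gerritcodereview/gerrit:latest
```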
### Logging
* All the logs are forwarded to AWS CloudWatch, in the LogGroup named after the cluster
stack name. Please refer to the general [logging documentation](../README.md#logging)
for further information on logging.
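For a quick look at the logs from the command line, a tail such as the following should work, assuming AWS CLI v2 and a LogGroup named after your cluster stack (`<CLUSTER_STACK_NAME>` is a placeholder):
```
aws logs tail <CLUSTER_STACK_NAME> --follow --since 15m
```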
### Monitoring
* Standard CloudWatch monitoring metrics for each component
* Application level CloudWatch monitoring can be enabled as described [here](../Configuration.md#cloudwatch-monitoring)
* Optionally, a Prometheus and Grafana stack (see [here](../monitoring/README.md))
## How to run it
### 0 - Prerequisites
Follow the steps described in the [Prerequisites](../Prerequisites.md) section.
### 1 - Configuration
Please refer to the [configuration docs](../Configuration.md) to understand how to set up the
configuration and which common configuration values are needed.
On top of that, you might need to set the additional parameters specific to this recipe.
#### Environment
Configuration values affecting the deployment environment and cluster properties (an illustrative snippet follows the list):
* `SERVICE_PRIMARY_STACK_NAME`: Optional. Name of the primary service stack. `gerrit-service-primary` by default.
* `SERVICE_REPLICA_STACK_NAME`: Optional. Name of the replica service stack. `gerrit-service-replica` by default.
* `DASHBOARD_STACK_NAME` : Optional. Name of the dashboard stack. `gerrit-dashboard` by default.
* `HTTP_PRIMARY_SUBDOMAIN`: Optional. Name of the primary sub domain for HTTP traffic. `gerrit-http-primary-demo` by default.
* `SSH_PRIMARY_SUBDOMAIN`: Optional. Name of the primary sub domain for SSH traffic. `gerrit-ssh-primary-demo` by default.
* `HTTP_REPLICA_SUBDOMAIN`: Optional. Name of the replica sub domain for HTTP traffic. `gerrit-http-replica-demo` by default.
* `SSH_REPLICA_SUBDOMAIN`: Optional. Name of the replica sub domain for SSH traffic. `gerrit-ssh-replica-demo` by default.
* `GERRIT_PRIMARY_INSTANCE_ID`: Optional. Identifier for the Gerrit primary instance.
`gerrit-primary-replica-PRIMARY` by default.
* `GERRIT_REPLICA_INSTANCE_ID`: Optional. Identifier for the Gerrit replica instance.
`gerrit-primary-replica-REPLICA` by default.
* `GERRIT_VOLUME_ID` : Optional. Id of an existing EBS volume. If empty, a new volume
for Gerrit data will be created.
* `GERRIT_VOLUME_SNAPSHOT_ID` : Optional. Ignored if `GERRIT_VOLUME_ID` is not empty. Id of
the EBS volume snapshot used to create the new EBS volume for Gerrit data.
* `GERRIT_VOLUME_SIZE_IN_GIB`: Optional. The size of the Gerrit data volume, in GiBs. `10` by default.
*NOTE*: if you are planning to run the monitoring stack, set the
`PRIMARY_MAX_COUNT` value to at least 2. The resources provided by
a single EC2 instance won't be enough for all the services that will be run.
* `PROMETHEUS_SUBDOMAIN`: Optional. Prometheus subdomain. For example: `<AWS_PREFIX>-prometheus`
* `GRAFANA_SUBDOMAIN`: Optional. Grafana subdomain. For example: `<AWS_PREFIX>-grafana`
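As a sketch only, a handful of these values could look like this in your recipe configuration (the exact configuration file and the mandatory common values are described in the [configuration docs](../Configuration.md); every value below is illustrative):
```
SERVICE_PRIMARY_STACK_NAME=gerrit-service-primary
SERVICE_REPLICA_STACK_NAME=gerrit-service-replica
HTTP_PRIMARY_SUBDOMAIN=gerrit-http-primary-demo
SSH_PRIMARY_SUBDOMAIN=gerrit-ssh-primary-demo
HTTP_REPLICA_SUBDOMAIN=gerrit-http-replica-demo
SSH_REPLICA_SUBDOMAIN=gerrit-ssh-replica-demo
GERRIT_VOLUME_SIZE_IN_GIB=20
```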
##### Shared filesystem for replicas
Replicas share git data via an EFS filesystem which is
mounted under the `/var/gerrit/git` directory. This allows git data to persist
beyond the lifespan of a single instance and to be shared, so that replicas can
scale down and up according to needs.
* `REPLICA_FILESYSTEM_ID`: Optional. An existing EFS filesystem id to mount on replicas.
If empty, a new EFS will be created to store git data.
Setting this value is required when deploying a primary-replica cluster using
existing data, as well as when performing blue/green deployments.
The nested stack will be *retained* when the cluster is deleted, so that
existing data can be used to perform blue/green deployments.
* `REPLICA_FILESYSTEM_THROUGHPUT_MODE`: Optional. The throughput mode for the file system to be created.
default: `bursting`. More info [here](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-efs-filesystem.html)
* `REPLICA_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS`: Optional. Only used when `REPLICA_FILESYSTEM_THROUGHPUT_MODE` is set to `provisioned`.
default: `256`.
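For example, if bursting throughput is not enough for your replica traffic, the filesystem could be switched to provisioned throughput with something along these lines (values are illustrative):
```
REPLICA_FILESYSTEM_THROUGHPUT_MODE=provisioned
REPLICA_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS=256
```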
##### Auto Scaling of replica instances
Gerrit replicas can scale in or out automatically to accommodate
increases or decreases in traffic, which typically comes
from build or test jobs executed by some sort of automated build pipeline.
Since they all [share the same git data over EFS](#shared-filesystem-for-replicas),
replicas are immediately ready to serve traffic as soon as they come up and
register behind the load balancer.
There is a one-to-one relationship between replica tasks and EC2 instances: each EC2
instance in the replica ASG runs one and only one replica task.
Because of this, the capacity values specified for replicas (minimum, desired and
maximum) configure both the number of tasks and the size of the ASG,
since the two always need to be in sync.
The scaling policy adds or removes capacity as required to keep the average CPU
usage of the replica service close to the specified target value.
Tasks in the provisioning state that cannot find sufficient resources on
the existing instances will automatically trigger the capacity provider to scale
out the replica ASG. As more EC2 instances become available, tasks in the
provisioning state will get placed onto those instances, reducing the number of
tasks in provisioning.
Conversely, as the average CPU usage of the replica service drops below the
specified target value and replica tasks get removed, the capacity provider
will reduce the number of EC2 instances too.
Note that only EC2 instances that are not running any replica task will be scaled in.
These are the available settings (an illustrative combination follows the list):
* `REPLICA_AUTOSCALING_MIN_CAPACITY` Optional. The minimum number of tasks that
replicas should scale in to. This is also the minimum number of EC2 instances in
the replica ASG
default: *1*
* `REPLICA_AUTOSCALING_DESIRED_CAPACITY` Optional. The desired number of
replica tasks to run. This is also the desired number of EC2 instances in the
replica ASG.
default: *1*
* `REPLICA_AUTOSCALING_MAX_CAPACITY` Optional. The maximum number of tasks that
replicas should scale out to. This is also the maximum number of EC2 instances
in the replica ASG
default: *2*
* `REPLICA_AUTOSCALING_SCALE_IN_COOLDOWN` Optional. The amount of time, in
seconds, after a scale-in activity completes before another scale-in activity
can start
default: *300* seconds
* `REPLICA_AUTOSCALING_SCALE_OUT_COOLDOWN` Optional. The amount of time, in
seconds, to wait for a previous scale-out activity to take effect
default: *300* seconds
* `REPLICA_AUTOSCALING_TARGET_CPU_PERCENTAGE` Optional. Aggregate CPU
utilization target for auto-scaling. Auto-scaling will add or remove tasks in
the replica service to be as close as possible to this value
* `REPLICA_CAPACITY_PROVIDER_TARGET` Optional. The target capacity value for the
capacity provider of replicas (must be > 0 and <= 100).
default: *100*
Setting this value to 100 means that there will be no _spare capacity_
allocated on the replica ASG: if 3 replica tasks are needed, then the ASG will
adjust to have exactly 3 EC2 instances.
Setting this value to less than 100 enables spare capacity in the ASG. For
example, if you set this value to 50, the scaling policy will adjust the number
of EC2 instances until it is exactly twice the number needed to run all of the
tasks: if 3 replica tasks are needed, then the ASG will adjust to 6 EC2
instances.
* `REPLICA_CAPACITY_PROVIDER_MIN_STEP_SIZE` Optional. The minimum number of EC2
instances for replicas that will scale in or scale out at one time (must be >= 1
and <= 10)
default: *1*
* `REPLICA_CAPACITY_PROVIDER_MAX_STEP_SIZE` Optional. The maximum number of EC2
instances for replicas that will scale in or scale out at one time (must be >= 1
and <= 10)
default: *1*
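As an illustration only, a configuration allowing up to four replicas with no spare EC2 capacity might look like this (tune the values to your own traffic patterns):
```
REPLICA_AUTOSCALING_MIN_CAPACITY=1
REPLICA_AUTOSCALING_DESIRED_CAPACITY=2
REPLICA_AUTOSCALING_MAX_CAPACITY=4
REPLICA_AUTOSCALING_TARGET_CPU_PERCENTAGE=70
REPLICA_CAPACITY_PROVIDER_TARGET=100
```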
### 2 - Deploy
* Create the cluster, services and DNS routing stacks:
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] create-all
```
The optional `AWS_REGION` and `AWS_PREFIX` allow you to define where the stacks will be deployed and how they will be named.
It might take several minutes to build the stack.
You can monitor the creations of the stacks in [CloudFormation](https://console.aws.amazon.com/cloudformation/home)
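If you prefer the command line, the stack status can also be polled with the AWS CLI, for example (`<CLUSTER_STACK_NAME>` is a placeholder for the name of the cluster stack you configured):
```
aws cloudformation describe-stacks \
  --stack-name <CLUSTER_STACK_NAME> \
  --query 'Stacks[0].StackStatus' \
  --output text
```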
* *NOTE*: the creation of the cluster needs an EC2 key pair, which is useful when you need to connect
to the EC2 instances for troubleshooting purposes. The key pair is automatically generated
and stored in a `pem` file in the current directory.
Use it when ssh-ing into your instances as follows: `ssh -i cluster-keys.pem ec2-user@<ec2_instance_ip>`
### Cleaning up
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all
```
The optional `AWS_REGION` and `AWS_PREFIX` allow you to specify exactly which stack you target for deletion.
Note that this will *not* delete:
* Secrets stored in Secret Manager
* SSL certificates
* ECR repositories
* Replica EFS stack
* VPC and subnets (if created as part of this deployment, rather than externally
provided)
### Persistent stacks
Blue/green deployment of the primary-replica recipe requires that the blue and
the green stacks are deployed within the same VPC.
In order to preserve the VPC, the IGW and the subnets upon deletion of
the blue stack, the nested network CloudFormation template needs to be
protected from deletion.
Note that you can completely delete the stack, including explicitly retained
resources such as the EFS Git filesystem, VPC and subnets, by issuing the more
aggressive command:
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
```
Note that this will prompt you to confirm your choice:
```
* * * * WARNING * * * * this is going to completely destroy the stack, including git data.
Are you sure you want to continue? [y/N]
```
If you want to automate this programmatically, you can just pipe the `yes`
command into make:
```
yes | make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
```
### Access your Gerrit instances
Get the URL of your Gerrit primary instance this way:
```
aws cloudformation describe-stacks \
--stack-name <SERVICE_PRIMARY_STACK_NAME> \
| grep -A1 '"OutputKey": "CanonicalWebUrl"' \
| grep OutputValue \
| cut -d'"' -f 4
```
Similarly for the replica:
```
aws cloudformation describe-stacks \
--stack-name <SERVICE_REPLICA_STACK_NAME> \
| grep -A1 '"OutputKey": "CanonicalWebUrl"' \
| grep OutputValue \
| cut -d'"' -f 4
```
Gerrit primary instance ports:
* HTTP `8080`
* SSH `29418`
Gerrit replica instance ports:
* HTTP `9080`
* SSH `39418`
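As a hedged example, the SSH ports above would be used as follows (hostnames are illustrative and depend on the subdomains and hosted zone you configured; your SSH public key must be registered in Gerrit):
```
# SSH to the Gerrit primary
ssh -p 29418 <gerrit-username>@<SSH_PRIMARY_SUBDOMAIN>.<your-hosted-zone>

# SSH to the Gerrit replica
ssh -p 39418 <gerrit-username>@<SSH_REPLICA_SUBDOMAIN>.<your-hosted-zone>
```
The HTTP URL to use is the one reported by the `CanonicalWebUrl` output shown above.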
### Docker
Refer to the [Docker](../Docker.md) section for information on how to set up Docker or how to publish images.