# Gerrit Primary-Replica
This set of templates provides all the components to deploy a single Gerrit primary
and a single Gerrit replica in ECS.
## Architecture
Five templates are provided in this example:
* `cf-cluster`: defines the ECS cluster and the networking stack
* `cf-service-primary`: defines the service stack running the Gerrit primary
* `cf-service-replica`: defines the service stack running the Gerrit replica
* `cf-dns-route`: defines the DNS routing for the service
* `cf-dashboard`: defines the CloudWatch dashboard for the services
### Data persistence
* EBS volumes for:
* Indexes
* Caches
* Data
* Git repositories
### Deployment type
* Latest Gerrit version deployed using the official [Docker image](https://hub.docker.com/r/gerritcodereview/gerrit)
* Application deployed in ECS on a single EC2 instance
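The official image referenced above can also be pulled locally for inspection; a minimal example (the tag is illustrative, the recipes pin their own Gerrit version):
```
docker pull gerritcodereview/gerrit:latest
```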
### Logging
* All the logs are forwarded to AWS CloudWatch, in the LogGroup named after the cluster
stack name. Please refer to the general [logging documentation](../README.md#logging)
for further information on logging.
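For a quick look at the logs from the command line, a tail such as the following should work, assuming AWS CLI v2 and a LogGroup named after your cluster stack (`<CLUSTER_STACK_NAME>` is a placeholder):
```
aws logs tail <CLUSTER_STACK_NAME> --follow --since 15m
```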
### Monitoring
* Standard CloudWatch monitoring metrics for each component
* Application level CloudWatch monitoring can be enabled as described [here](../Configuration.md#cloudwatch-monitoring)
* Optionally, a Prometheus and Grafana stack (see [here](../monitoring/README.md))
## How to run it
### 0 - Prerequisites
Follow the steps described in the [Prerequisites](../Prerequisites.md) section.
### 1 - Configuration
Please refer to the [configuration docs](../Configuration.md) to understand how to set up the
configuration and which common configuration values are needed.
On top of that, you might need to set the additional parameters specific to this recipe.
#### Environment
Configuration values affecting the deployment environment and cluster properties (an illustrative snippet follows the list):
* `SERVICE_PRIMARY_STACK_NAME`: Optional. Name of the primary service stack. `gerrit-service-primary` by default.
* `SERVICE_REPLICA_STACK_NAME`: Optional. Name of the replica service stack. `gerrit-service-replica` by default.
* `DASHBOARD_STACK_NAME` : Optional. Name of the dashboard stack. `gerrit-dashboard` by default.
* `HTTP_PRIMARY_SUBDOMAIN`: Optional. Name of the primary sub domain for HTTP traffic. `gerrit-http-primary-demo` by default.
* `SSH_PRIMARY_SUBDOMAIN`: Optional. Name of the primary sub domain for SSH traffic. `gerrit-ssh-primary-demo` by default.
* `HTTP_REPLICA_SUBDOMAIN`: Optional. Name of the replica sub domain for HTTP traffic. `gerrit-http-replica-demo` by default.
* `SSH_REPLICA_SUBDOMAIN`: Optional. Name of the replica sub domain for SSH traffic. `gerrit-ssh-replica-demo` by default.
* `GERRIT_PRIMARY_INSTANCE_ID`: Optional. Identifier for the Gerrit primary instance.
`gerrit-primary-replica-PRIMARY` by default.
* `GERRIT_REPLICA_INSTANCE_ID`: Optional. Identifier for the Gerrit replica instance.
`gerrit-primary-replica-REPLICA` by default.
* `GERRIT_VOLUME_ID` : Optional. Id of an existing EBS volume. If empty, a new volume
for Gerrit data will be created.
* `GERRIT_VOLUME_SNAPSHOT_ID` : Optional. Ignored if `GERRIT_VOLUME_ID` is not empty. Id of
the EBS volume snapshot used to create the new EBS volume for Gerrit data.
* `GERRIT_VOLUME_SIZE_IN_GIB`: Optional. The size of the Gerrit data volume, in GiBs. `10` by default.
*NOTE*: if you are planning to run the monitoring stack, set the
`PRIMARY_MAX_COUNT` value to at least 2. The resources provided by
a single EC2 instance won't be enough for all the services that will be run.
* `PROMETHEUS_SUBDOMAIN`: Optional. Prometheus subdomain. For example: `<AWS_PREFIX>-prometheus`
* `GRAFANA_SUBDOMAIN`: Optional. Grafana subdomain. For example: `<AWS_PREFIX>-grafana`
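As a sketch only, a handful of these values could look like this in your recipe configuration (the exact configuration file and the mandatory common values are described in the [configuration docs](../Configuration.md); every value below is illustrative):
```
SERVICE_PRIMARY_STACK_NAME=gerrit-service-primary
SERVICE_REPLICA_STACK_NAME=gerrit-service-replica
HTTP_PRIMARY_SUBDOMAIN=gerrit-http-primary-demo
SSH_PRIMARY_SUBDOMAIN=gerrit-ssh-primary-demo
HTTP_REPLICA_SUBDOMAIN=gerrit-http-replica-demo
SSH_REPLICA_SUBDOMAIN=gerrit-ssh-replica-demo
GERRIT_VOLUME_SIZE_IN_GIB=20
```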
##### Shared filesystem for replicas
Replicas share git data via an EFS filesystem which is
mounted under the `/var/gerrit/git` directory. This allows git data to persist
beyond the lifespan of a single instance and to be shared, so that replicas can
scale down and up according to needs.
* `REPLICA_FILESYSTEM_ID`: Optional. An existing EFS filesystem id to mount on replicas.
If empty, a new EFS will be created to store git data.
Setting this value is required when deploying a primary-replica cluster using
existing data, as well as when performing blue/green deployments.
The nested stack will be *retained* when the cluster is deleted, so that
existing data can be used to perform blue/green deployments.
* `REPLICA_FILESYSTEM_THROUGHPUT_MODE`: Optional. The throughput mode for the file system to be created.
default: `bursting`. More info [here](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-efs-filesystem.html)
* `REPLICA_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS`: Optional. Only used when `REPLICA_FILESYSTEM_THROUGHPUT_MODE` is set to `provisioned`.
default: `256`.
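For example, if bursting throughput is not enough for your replica traffic, the filesystem could be switched to provisioned throughput with something along these lines (values are illustrative):
```
REPLICA_FILESYSTEM_THROUGHPUT_MODE=provisioned
REPLICA_FILESYSTEM_PROVISIONED_THROUGHPUT_IN_MIBPS=256
```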
##### Auto Scaling of replica instances
Gerrit replicas can scale in or out automatically to accommodate
increases or decreases in traffic, which typically comes
from build or test jobs executed by some sort of automated build pipeline.
Since they all [share the same git data over EFS](#shared-filesystem-for-replicas),
replicas are immediately ready to serve traffic as soon as they come up and
register behind the load balancer.
There is a one-to-one relationship between replica tasks and EC2 instances: each EC2
instance in the replica ASG runs one and only one replica task.
Because of this, the capacity values specified for replicas (minimum, desired and
maximum) configure both the number of tasks and the size of the ASG,
since the two always need to be in sync.
The scaling policy adds or removes capacity as required to keep the average CPU
usage of the replica service close to the specified target value.
Tasks in the provisioning state that cannot find sufficient resources on
the existing instances will automatically trigger the capacity provider to scale
out the replica ASG. As more EC2 instances become available, tasks in the
provisioning state will get placed onto those instances, reducing the number of
tasks in provisioning.
Conversely, as the average CPU usage of the replica service drops below the
specified target value and replica tasks get removed, the capacity provider
will reduce the number of EC2 instances too.
Note that only EC2 instances that are not running any replica task will be scaled in.
These are the available settings (an illustrative combination follows the list):
* `REPLICA_AUTOSCALING_MIN_CAPACITY` Optional. The minimum number of tasks that
replicas should scale in to. This is also the minimum number of EC2 instances in
the replica ASG
default: *1*
* `REPLICA_AUTOSCALING_DESIRED_CAPACITY` Optional. The desired number of
replica tasks to run. This is also the desired number of EC2 instances in the
replica ASG.
default: *1*
* `REPLICA_AUTOSCALING_MAX_CAPACITY` Optional. The maximum number of tasks that
replicas should scale out to. This is also the maximum number of EC2 instances
in the replica ASG
default: *2*
* `REPLICA_AUTOSCALING_SCALE_IN_COOLDOWN` Optional. The amount of time, in
seconds, after a scale-in activity completes before another scale-in activity
can start
default: *300* seconds
* `REPLICA_AUTOSCALING_SCALE_OUT_COOLDOWN` Optional. The amount of time, in
seconds, to wait for a previous scale-out activity to take effect
default: *300* seconds
* `REPLICA_AUTOSCALING_TARGET_CPU_PERCENTAGE` Optional. Aggregate CPU
utilization target for auto-scaling. Auto-scaling will add or remove tasks in
the replica service to be as close as possible to this value
* `REPLICA_CAPACITY_PROVIDER_TARGET` Optional. The target capacity value for the
capacity provider of replicas (must be > 0 and <= 100).
default: *100*
Setting this value to 100 means that there will be no _spare capacity_
allocated on the replica ASG: if 3 replica tasks are needed, then the ASG will
adjust to have exactly 3 EC2 instances.
Setting this value to less than 100 enables spare capacity in the ASG. For
example, if you set this value to 50, the scaling policy will adjust the number
of EC2 instances until it is exactly twice the number needed to run all of the
tasks: if 3 replica tasks are needed, then the ASG will adjust to 6 EC2
instances.
* `REPLICA_CAPACITY_PROVIDER_MIN_STEP_SIZE` Optional. The minimum number of EC2
instances for replicas that will scale in or scale out at one time (must be >= 1
and <= 10)
default: *1*
* `REPLICA_CAPACITY_PROVIDER_MAX_STEP_SIZE` Optional. The maximum number of EC2
instances for replicas that will scale in or scale out at one time (must be >= 1
and <= 10)
default: *1*
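As an illustration only, a configuration allowing up to four replicas with no spare EC2 capacity might look like this (tune the values to your own traffic patterns):
```
REPLICA_AUTOSCALING_MIN_CAPACITY=1
REPLICA_AUTOSCALING_DESIRED_CAPACITY=2
REPLICA_AUTOSCALING_MAX_CAPACITY=4
REPLICA_AUTOSCALING_TARGET_CPU_PERCENTAGE=70
REPLICA_CAPACITY_PROVIDER_TARGET=100
```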
### 2 - Deploy
* Create the cluster, services and DNS routing stacks:
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] create-all
```
The optional `AWS_REGION` and `AWS_PREFIX` allow you to define where the stacks will be deployed and how they will be named.
It might take several minutes to build the stack.
You can monitor the creations of the stacks in [CloudFormation](https://console.aws.amazon.com/cloudformation/home)
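If you prefer the command line, the stack status can also be polled with the AWS CLI, for example (`<CLUSTER_STACK_NAME>` is a placeholder for the name of the cluster stack you configured):
```
aws cloudformation describe-stacks \
  --stack-name <CLUSTER_STACK_NAME> \
  --query 'Stacks[0].StackStatus' \
  --output text
```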
* *NOTE*: the creation of the cluster needs an EC2 key pair, which is useful when you need to connect
to the EC2 instances for troubleshooting purposes. The key pair is automatically generated
and stored in a `pem` file in the current directory.
Use it when ssh-ing into your instances as follows: `ssh -i cluster-keys.pem ec2-user@<ec2_instance_ip>`
### Cleaning up
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all
```
The optional `AWS_REGION` and `AWS_PREFIX` allow you to specify exactly which stack you target for deletion.
Note that this will *not* delete:
* Secrets stored in Secret Manager
* SSL certificates
* ECR repositories
* Replica EFS stack
* VPC and subnets (if created as part of this deployment, rather than externally
provided)
### Persistent stacks
Blue/green deployment of the primary-replica recipe requires that the blue and
the green stacks are deployed within the same VPC.
In order to preserve the VPC, the IGW and the subnets upon deletion of
the blue stack, the nested network CloudFormation template needs to be
protected from deletion.
Note that you can completely delete the stack, including explicitly retained
resources such as the EFS Git filesystem, VPC and subnets, by issuing the more
aggressive command:
```
make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
```
Note that this will prompt you to confirm your choice:
```
* * * * WARNING * * * * this is going to completely destroy the stack, including git data.
Are you sure you want to continue? [y/N]
```
If you want to automate this programmatically, you can just pipe the `yes`
command into make:
```
yes | make [AWS_REGION=a-valid-aws-region] [AWS_PREFIX=some-cluster-prefix] delete-all-including-retained-stack
```
### Access your Gerrit instances
Get the URL of your Gerrit primary instance this way:
```
aws cloudformation describe-stacks \
--stack-name <SERVICE_PRIMARY_STACK_NAME> \
| grep -A1 '"OutputKey": "CanonicalWebUrl"' \
| grep OutputValue \
| cut -d'"' -f 4
```
Similarly for the replica:
```
aws cloudformation describe-stacks \
--stack-name <SERVICE_REPLICA_STACK_NAME> \
| grep -A1 '"OutputKey": "CanonicalWebUrl"' \
| grep OutputValue \
| cut -d'"' -f 4
```
Gerrit primary instance ports:
* HTTP `8080`
* SSH `29418`
Gerrit replica instance ports:
* HTTP `9080`
* SSH `39418`
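As a hedged example, the SSH ports above would be used as follows (hostnames are illustrative and depend on the subdomains and hosted zone you configured; your SSH public key must be registered in Gerrit):
```
# SSH to the Gerrit primary
ssh -p 29418 <gerrit-username>@<SSH_PRIMARY_SUBDOMAIN>.<your-hosted-zone>

# SSH to the Gerrit replica
ssh -p 39418 <gerrit-username>@<SSH_REPLICA_SUBDOMAIN>.<your-hosted-zone>
```
The HTTP URL to use is the one reported by the `CanonicalWebUrl` output shown above.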
### Docker
Refer to the [Docker](../Docker.md) section for information on how to set up Docker or how to publish images.