| # Gerrit Multi-Site Plugin Design |
| |
| This document collects and organizes thoughts about |
| the design of the Gerrit multi-site plugin, supporting the definition of the |
[implementation roadmap](#next-steps-in-the-roadmap).
| |
| It first presents background for the problems the plugin will address and |
| the tools currently available in the Gerrit ecosystem that support the |
| solution. It then lays out an overall roadmap for implementing support for Gerrit |
| multi-site, and a snapshot of the current status of the design including associated |
| limitations and constraints. |
| |
| ## Approaches to highly scalable and available Gerrit |
| |
| Companies that adopt Gerrit as the center of their development and review |
| pipeline often have the requirement to be available on a 24/7 basis. This requirement |
| may extend across large and geographically distributed teams in different continents. |
| |
Because of the constraints defined by the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem),
designing a performant and scalable solution is a real challenge.
| |
| ### Vertical scaling and high availability |
| |
| Vertical scaling is one of the options to support a high load and a large number |
| of users. A powerful server with multiple cores and sufficient RAM to |
| potentially fit the most frequently used repositories simplifies the design and |
| implementation of the system. The relatively reasonable cost of hardware and availability |
| of multi-core CPUs make this solution highly attractive to some large |
| Gerrit setups. Further, the central Gerrit server can be duplicated with an |
| active/passive or active/active high availability configuration with the storage of the |
| Git repositories shared across nodes through dedicated fibre-channel lines or |
| SANs. |
| |
| This approach can be suitable for mid to large-sized Gerrit installations where |
| teams are co-located or connected via high-speed dedicated networks. However, |
when teams are located on opposite sides of the planet, even at the speed of light
the fastest theoretical fibre-channel direct connection can be limiting. For example,
from San Francisco to Bangalore the theoretical absolute minimum latency is 50
msec. In practice, however, it is often around 150-200 msec in the best-case
scenarios.
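
As a rough sanity check of that figure (assuming a great-circle distance of about
14,000 km and light travelling in vacuum), the one-way latency floor works out to:

$$ t_{\mathrm{min}} \approx \frac{d}{c} \approx \frac{14\,000\ \mathrm{km}}{300\,000\ \mathrm{km/s}} \approx 47\ \mathrm{ms} $$

Light in fibre travels at roughly two thirds of that speed, and real routes are longer
than the great circle, which is why the figures observed in practice are several times higher.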
| |
| ### Horizontal scaling and multi-site |
| |
| In the alternate option, horizontal scaling, the workload is spread |
| across several nodes, which are distributed to different locations. |
| For our teams in San Francisco and Bangalore, each accesses a |
| set of Gerrit masters located closer to their geographical location, with higher |
| bandwidth and lower latency. (To control operational cost from the proliferation of |
| servers, the number of Gerrit masters can be scaled up and down on demand.) |
| |
| This solution offers a higher level of scalability and lower latency across locations, |
| but it requires a more complex design. |
| |
| ### Multi-master and multi-site, the best of both worlds |
| |
| The vertical and horizontal approaches can be combined to achieve |
| both high performance on the same location and low latency across |
| geographically distributed sites. |
| |
| Geographical locations with larger teams and projects can have a bigger |
| Gerrit server in a high availability configuration, while locations with |
| less critical service levels can use a lower-spec setup. |
| |
| ## Focus of the multi-site plugin |
| |
| The multi-site plugin enables the OpenSource version of Gerrit |
| Code Review to support horizontal scalability across sites. |
| |
| Gerrit has already been deployed in a multi-site configuration at |
| [Google](https://www.youtube.com/watch?v=wlHpqfnIBWE) and in a multi-master fashion |
| at [Qualcomm](https://www.youtube.com/watch?v=X_rmI8TbKmY). Both implementations |
| include fixes and extensions that are tailored to the specific |
| infrastructure requirements of each company's global networks. Those |
| solutions may or may not be shared with the rest of the OpenSource Community. |
| Specifically, Google's deployment is proprietary and not suitable for any |
| environment outside Google's data-centers. Further, in |
| Qualcomm's case, their version of Gerrit is a fork of v2.7. |
| |
In contrast, the multi-site plugin is based on standard OpenSource components and
is deployed on a standard cloud environment. It is currently used in a
multi-master and multi-site deployment on GerritHub.io, serving two continents (Europe
and North America) in a high availability setup on each site.
| |
| # The road to multi-site |
| |
| The development of the multi-site support for Gerrit is complex and thus has |
| been deliberately broken down into incremental steps. The starting point is a |
| single Gerrit master deployment, and the end goal is a fully distributed set of |
| cooperating Gerrit masters across the globe. |
| |
| 1. 1x master / single location. |
| 2. 2x masters (active/standby) / single location - shared disks |
| 3. 2x masters (active/passive) / single location - shared disks |
| 4. 2x masters (active RW/active RO) / single location - shared disks |
| 5. 2x masters (active RW/active RO) / single location - separate disks |
| 6. 2x masters (active RW/active RO) / active + disaster recovery location |
| 7. 2x masters (active RW/active RO) / two locations |
| 8. 2x masters (active RW/active RW) sharded / two locations |
| 9. 3x masters (active RW/active RW) sharded with auto-election / two locations |
| 10. Multiple masters (active RW/active RW) with quorum / multiple locations |
| |
| The transition between steps requires not only an evolution of the Gerrit |
| setup and the set of plugins but also the implementation of more mature methods to |
| provision, maintain and version servers across the network. Qualcomm has |
| pointed out that the evolution of the company culture and the ability to consistently |
| version and provision the different server environments are winning features of |
| their multi-master setup. |
| |
Google is currently running at Stage #10. Qualcomm is at Stage #4, with the
difference that both masters serve RW traffic, which is possible because the specifics
of their underlying storage, NFS and JGit implementation allow concurrent
locking at the filesystem level.
| |
| ## TODO: Synchronous replication |
Consider also synchronous replication for cases like 5, 6 and 7, in which
a write operation is only accepted if it is synchronously replicated to the
other master node(s). This would provide 100% loss-less disaster recovery support. Without
synchronous replication, when the RW master crashes and loses data, there may
be no way to recover the missed replications without asking the users who pushed the commits
in the first place to push them again. On the other hand, with synchronous replication
the RW site has to "degrade" to RO mode when the other node is not reachable and
synchronous replication is not possible.
| |
We must re-evaluate the usability of the replication plugin for supporting
synchronous replication. For example, the replicationDelay doesn't make much
sense in the synchronous case. Likewise, the rescheduling of a replication due
to an in-flight push to the same remote URI doesn't make much sense either, as we
want the replication to happen immediately. Moreover, if the ref-update of the
incoming push request has to be blocked until the synchronous replication
finishes, the replication plugin cannot even start a replication as there is no
ref-updated event yet. We may consider implementing the synchronous
replication at a lower level. For example, have a "pack-received" event and
then simply forward that pack file to the other site. Similarly for the
ref-updated events: instead of a real git push, we could just forward the
ref-updates to the other site.
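
To make the idea concrete, the following is a minimal sketch, not part of any plugin,
of what blocking an incoming push on synchronous replication could look like using plain
JGit APIs: a `PreReceiveHook` pushes the incoming ref update to a hypothetical `peer-site`
remote and rejects the command if that push does not succeed. Deletions, retries and
proper error reporting are deliberately left out.

```
import java.util.Collection;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;
import org.eclipse.jgit.transport.PreReceiveHook;
import org.eclipse.jgit.transport.PushResult;
import org.eclipse.jgit.transport.ReceiveCommand;
import org.eclipse.jgit.transport.ReceivePack;
import org.eclipse.jgit.transport.RefSpec;
import org.eclipse.jgit.transport.RemoteRefUpdate;

/** Sketch only: reject a ref update unless it has been replicated to the peer site. */
class SynchronousReplicationHook implements PreReceiveHook {
  private static final String PEER_REMOTE = "peer-site"; // hypothetical remote name

  @Override
  public void onPreReceive(ReceivePack rp, Collection<ReceiveCommand> commands) {
    Git git = Git.wrap(rp.getRepository());
    for (ReceiveCommand cmd : commands) {
      if (cmd.getType() == ReceiveCommand.Type.DELETE) {
        continue; // deletions are out of scope for this sketch
      }
      if (!replicated(git, cmd)) {
        cmd.setResult(
            ReceiveCommand.Result.REJECTED_OTHER_REASON,
            "synchronous replication to " + PEER_REMOTE + " failed");
      }
    }
  }

  private boolean replicated(Git git, ReceiveCommand cmd) {
    try {
      // Push the new object id to the same ref name on the peer site.
      Iterable<PushResult> results =
          git.push()
              .setRemote(PEER_REMOTE)
              .setRefSpecs(new RefSpec(cmd.getNewId().name() + ":" + cmd.getRefName()))
              .call();
      for (PushResult result : results) {
        for (RemoteRefUpdate update : result.getRemoteUpdates()) {
          if (update.getStatus() != RemoteRefUpdate.Status.OK
              && update.getStatus() != RemoteRefUpdate.Status.UP_TO_DATE) {
            return false;
          }
        }
      }
      return true;
    } catch (GitAPIException e) {
      return false; // peer unreachable: the RW site degrades to RO for this push
    }
  }
}
```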
| |
| ## History and maturity level of the multi-site plugin |
| |
This plugin expands upon the excellent work on the high-availability plugin,
introduced by Ericsson for implementing multi-master at Stage #4. The git log history
of this project still shows the 'branching point' where it started.
| |
| The current version of the multi-site plugin is at Stage #7, which is a pretty |
| advanced stage in the Gerrit multi-master/multi-site configuration. |
| |
| Thanks to the multi-site plugin, it is now possible for Gerrit data to be |
| available in two separate geo-locations (e.g. San Francisco and Bangalore), |
| each serving local traffic through the local instances with minimum latency. |
| |
| ### Why another plugin from a high availability fork? |
| |
| You may be questioning the reasoning behind |
| creating yet another plugin for multi-master, instead of maintaining |
a single code-base with the high-availability plugin. The choice stems from
the differing design considerations to address scalability, as discussed above
for the vertical (single-site) and horizontal (multi-site) approaches.
| |
In theory, one could keep a single code-base to manage both approaches; however, the
result would be very complicated and difficult to configure and install.
| Having two more focussed plugins, one for high availability and another for |
| multi-site, allows us to have a simpler, more usable experience, both for developers |
| of the plugin and for the Gerrit administrators using it. |
| |
| ### Benefits |
| |
| There are some advantages in implementing multi-site at Stage #7: |
| |
- Optimal latency of the read-only operations on both sites, which constitute around 90%
of the overall Gerrit traffic.
| |
| - High SLA (99.99% or higher, source: GerritHub.io) can be achieved by |
| implementing both high availability inside each local site, and automatic |
| catastrophic failover between the two sites. |
| |
| - Access transparency through a single Gerrit URL entry-point. |
| |
| - Automatic failover, disaster recovery, and leader re-election. |
| |
| - The two sites have local consistency, with eventual consistency globally. |
| |
| ### Limitations |
| |
| The current limitations of Stage #7 are: |
| |
| - **Single RW site**: Only the RW site can accept modifications on the |
| Git repositories or the review data. |
| |
- **Supports only two sites**:
One could, potentially, support more sites, but the configuration
and maintenance efforts grow more than linearly with the number of nodes.
| |
- **Single point of failure**: The switch between the RO and RW sites is managed by a unique decision point.
| |
| - **Lack of transactionality**: |
| Data written to one site is acknowledged before its replication to the other location. |
| |
- **Requires Gerrit v2.16 or later**: Data consistency requires a server completely based on NoteDb.
| If you are not familiar with NoteDb, please read the relevant |
| [section in the Gerrit documentation](https://gerrit-documentation.storage.googleapis.com/Documentation/2.16.5/note-db.html). |
| |
| ### Example of multi-site operations |
| |
| Let's suppose the RW site is San Francisco and the RO site Bangalore. The |
| modifications of data will always come to San Francisco and flow to Bangalore |
| with a latency that can be between seconds and minutes, depending on |
| the network infrastructure between the two sites. A developer located in |
| Bangalore will always see a "snapshot in the past" of the data, both from the |
| Gerrit UI and on the Git repository served locally. In contrast, a developer located in |
| San Francisco will always see the "latest and greatest" of everything. |
| |
| Should the central site in San Francisco become unavailable for a |
| significant period of time, the Bangalore site will take over as the RW Gerrit |
| site. The roles will then be inverted. |
| People in San Francisco will be served remotely by the |
| Bangalore server while the local system is down. When the San Francisco site |
| returns to service, and passes the "necessary checks", it will be re-elected as the |
| main RW site. |
| |
| # Plugin design |
| |
| This section goes into the high-level design of the current solution, lists |
| the components involved, and describes how the components interact with each other. |
| |
| ## What is replicated across Gerrit sites |
| |
| There are several distinct classes of information that have to be kept |
| consistent across different sites in order to guarantee seamless operation of the |
| distributed system. |
| |
| - **Git repositories**: They are stored on disk and are the most important |
| information to maintain. The repositories store the following data: |
| |
| * Git BLOBs, objects, refs and trees. |
| |
| * NoteDb, including Groups, Accounts and review data |
| |
| * Project configurations and ACLs |
| |
| * Project submit rules |
| |
| - **Indexes**: A series of secondary indexes to allow search and quick access |
| to the Git repository data. Indexes are persistent across restarts. |
| |
| - **Caches**: A set of in-memory and persisted data designed to reduce CPU and disk |
| utilization and to improve performance. |
| |
| - **Web Sessions**: Define an active user session with Gerrit, used to reduce |
| load to the underlying authentication system. |
| Sessions are stored by default on the local filesystem in an H2 table but can |
| be externalized via plugins, like the WebSession Flatfile. |
| |
| To achieve a Stage #7 multi-site configuration, all the above information must |
| be replicated transparently across sites. |
| |
| ## High-level architecture |
| |
| The multi-site solution described here depends upon the combined use of different |
| components: |
| |
- **multi-site libModule**: exports interfaces as DynamicItems to plug in specific
implementations of the `Broker` and `Global Ref-DB` plugins.
| |
- **broker plugin**: an implementation of the broker interface, which enables the
replication of Gerrit _indexes_, _caches_, and _stream events_ across sites.
When no specific implementation is provided, the [Broker Noop implementation](#broker-noop-implementation)
is used and the libModule interfaces are mapped to internal no-op implementations.
| |
- **Global Ref-DB plugin**: an implementation of the Global Ref-DB interface,
which enables the detection of out-of-sync refs across Gerrit sites.
When no specific implementation is provided, the [Global Ref-DB Noop implementation](#global-ref-db-noop-implementation)
is used and the libModule interfaces are mapped to internal no-op implementations.
| |
| - **replication plugin**: enables the replication of the _Git repositories_ across |
| sites. |
| |
| - **web-session flat file plugin**: supports the storage of _active sessions_ |
| to an external file that can be shared and synchronized across sites. |
| |
| - **health check plugin**: supports the automatic election of the RW site based |
| on a number of underlying conditions of the data and the systems. |
| |
| - **HA Proxy**: provides the single entry-point to all Gerrit functionality across sites. |
| |
| The interactions between these components are illustrated in the following diagram: |
| |
| ![Initial Multi-Site Plugin Architecture](images/architecture-first-iteration.png) |
| |
| ## Implementation Details |
| |
| ### Multi-site libModule |
As mentioned earlier, there are different components behind the overarching architecture
of this distributed multi-site Gerrit solution, each one fulfilling
a specific goal. However, whilst the goal of each component is well-defined, the
mechanics of how each single component achieves that goal is not: the choice of which
specific message broker or which Ref-DB to use can depend on different factors,
such as scalability, maintainability, business standards and costs, to name a few.

For this reason the multi-site component is designed to be explicitly agnostic to
the specific choices of broker and Global Ref-DB implementations, and it does
not care how they, specifically, fulfill their task.
| |
Instead, this component takes on only the following responsibilities:
| |
| * Wrapping the GitRepositoryManager so that every interaction with git can be |
| verified by the Global Ref-DB plugin. |
| |
* Exposing DynamicItem bindings onto which concrete _Broker_ and _Global Ref-DB_
plugins can register their specific implementations (see the sketch after this list).
When no such plugins are installed, the initial binding points to no-op implementations.
| |
* Detecting out-of-sync refs across multiple Gerrit sites:
| Each change attempting to mutate a ref will be checked against the Ref-DB to |
| guarantee that each node has an up-to-date view of the repository state. |
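
A minimal sketch of this wiring, assuming a hypothetical `BrokerApi` interface and Guice
module names invented for illustration (the plugin's real interfaces may differ): the
libModule declares the DynamicItem and binds the internal no-op as the default, while a
broker plugin would later bind its own implementation to the same item.

```
import com.google.gerrit.extensions.registration.DynamicItem;
import com.google.inject.AbstractModule;

/** Hypothetical extension point exposed by the libModule. */
interface BrokerApi {
  void send(String topic, String payload);
}

/** Internal no-op used when no broker plugin is installed. */
class NoopBrokerApi implements BrokerApi {
  @Override
  public void send(String topic, String payload) {
    // Intentionally empty: no broker is configured.
  }
}

/** Sketch of the libModule configuration. */
public class MultiSiteSketchModule extends AbstractModule {
  @Override
  protected void configure() {
    // Declare the DynamicItem and point it to the no-op implementation by default.
    DynamicItem.itemOf(binder(), BrokerApi.class);
    DynamicItem.bind(binder(), BrokerApi.class).to(NoopBrokerApi.class);
    // A broker plugin module would register its own implementation with:
    //   DynamicItem.bind(binder(), BrokerApi.class).to(ItsOwnBrokerApi.class);
  }
}
```

Consumers inject a `DynamicItem<BrokerApi>` and call `get()`, so they always see whichever
implementation is currently registered.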
| |
| ### Message brokers plugin |
Each Gerrit node in the cluster needs to inform, and be informed by, all other nodes
about fundamental events, such as indexing of new changes, cache evictions and
stream events. This component provides a specific pub/sub broker implementation
that is able to do so.
| |
When provided, the message broker plugin will override the DynamicItem binding exposed
by the multi-site module with a specific implementation, such as Kafka, RabbitMQ, NATS, etc.
| |
| #### Broker Noop implementation |
The default `Noop` implementation provided by the `Multi-site` libModule does nothing
upon publishing and consuming events. This is useful for setting up a test environment
and allows the multi-site library to be installed independently of any additional
plugins or the existence of a specific broker installation.
| The Noop implementation can also be useful when there is no need for coordination |
| with remote nodes, since it avoids maintaining an external broker altogether: |
| for example, using the multi-site plugin purely for the purpose of replicating the Git |
| repository to a disaster-recovery site and nothing else. |
| |
| ### Global Ref-DB plugin |
| Whilst the replication plugin allows the propagation of the Git repositories across |
| sites and the broker plugin provides a mechanism to propagate events, the Global |
| Ref-DB ensures correct alignment of refs of the multi-site nodes. |
| |
It is the responsibility of this plugin to atomically store key/value pairs of refs in
order to allow the libModule to detect out-of-sync refs across multiple sites
(aka split brain). This is achieved by storing the most recent `sha` for each
specific mutable `ref`, using some sort of atomic _Compare and Set_ operation.
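
The expected contract can be summarised by the sketch below; the interface name and
signatures are illustrative and do not necessarily match the plugin's actual API.

```
import java.util.Optional;

/** Illustrative contract of a Global Ref-DB backend. */
interface GlobalRefDbSketch {

  /** Latest SHA-1 known to the whole cluster for the given ref, if any. */
  Optional<String> get(String project, String refName);

  /**
   * Atomically sets refName to newSha only if the stored value is still
   * expectedSha; returns false when another site already advanced the ref.
   */
  boolean compareAndPut(String project, String refName, String expectedSha, String newSha);
}

/** Illustrative use inside the wrapped Git layer before accepting a local ref update. */
class RefUpdateGuard {
  private final GlobalRefDbSketch globalRefDb;

  RefUpdateGuard(GlobalRefDbSketch globalRefDb) {
    this.globalRefDb = globalRefDb;
  }

  boolean tryUpdate(String project, String refName, String localSha, String newSha) {
    String globalSha = globalRefDb.get(project, refName).orElse(localSha);
    if (!globalSha.equals(localSha)) {
      return false; // this node is out of sync for the ref: refuse the write
    }
    return globalRefDb.compareAndPut(project, refName, localSha, newSha);
  }
}
```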
| |
We mentioned earlier the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem),
which in a nutshell states that a distributed system can only provide two of these
three properties: _Consistency_, _Availability_ and _Partition tolerance_. The Global
Ref-DB helps achieve _Consistency_ and _Partition tolerance_ (thus sacrificing
_Availability_).
| |
See [Prevent split brain thanks to Global Ref-DB](#prevent-split-brain-thanks-to-global-ref-db)
for a thorough example of this.
| |
When provided, the Global Ref-DB plugin will override the DynamicItem binding
exposed by the multi-site module with a specific implementation, such as ZooKeeper,
etcd, MySQL, Mongo, etc.
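
As an illustration of one possible backend, the following sketch performs the
compare-and-set against ZooKeeper, using the znode version as the optimistic-locking
token; paths, serialization and error handling are simplified assumptions, and the
znode is assumed to exist already.

```
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Sketch of a ZooKeeper-backed compare-and-set on a ref. */
class ZkRefCasSketch {
  private final ZooKeeper zk;

  ZkRefCasSketch(ZooKeeper zk) {
    this.zk = zk;
  }

  boolean compareAndPut(String project, String refName, String expectedSha, String newSha)
      throws KeeperException, InterruptedException {
    String path = "/gerrit/multi-site/" + project + "/" + refName;
    Stat stat = new Stat();
    byte[] current = zk.getData(path, false, stat);
    if (!expectedSha.equals(new String(current, StandardCharsets.UTF_8))) {
      return false; // another site already advanced this ref
    }
    try {
      // setData() with the read version fails if the znode changed in between.
      zk.setData(path, newSha.getBytes(StandardCharsets.UTF_8), stat.getVersion());
      return true;
    } catch (KeeperException.BadVersionException e) {
      return false; // lost the race: concurrent update from another site
    }
  }
}
```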
| |
| #### Global Ref-DB Noop implementation |
The default `Noop` implementation provided by the `Multi-site` libModule accepts
any ref without checking for consistency. This is useful for setting up a test environment
and allows the multi-site library to be installed independently of any additional
plugins or the existence of a specific Ref-DB installation.
| |
| ### Eventual consistency on Git, indexes, caches, and stream events |
| |
The replication of the Git repositories, indexes, caches and stream events happens
on different channels and at different speeds. Git data is typically larger than
metadata and has higher latency than reindexing, cache eviction or stream
events. This means that when someone pushes a new change to Gerrit on one site,
| the Git data (commits, BLOBs, trees, and refs) may arrive later than the |
| associated reindexing or cache eviction events. |
| |
| It is, therefore, necessary to handle the lack of synchronization of those |
| channels in the multi-site plugin and reconcile the events at the destinations. |
| |
The solution adopted by the multi-site plugin supports eventual consistency at
rest at the data level, thanks to the following two components:
| |
* **Identify not-yet-processable events**:
A mechanism to recognize _not-yet-processable events_ related to data that is not yet
available (based on the timestamp information available on both the metadata
update and the data event).
| |
* **Queue not-yet-processable events**:
A queue of *not-yet-processable events* and an *asynchronous processor*
that checks whether they have become processable. The system is also configured to discard
events that have been in the queue for too long (see the sketch below).
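
A minimal sketch of such a queue, with invented names and a fixed one-second retry
period chosen purely for illustration:

```
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

/**
 * Sketch of a queue for not-yet-processable events: events whose Git data has
 * not arrived yet are parked and retried asynchronously, and discarded once
 * they exceed a maximum age.
 */
class DeferredEventQueue<E> {
  private record Deferred<T>(T event, Instant enqueuedAt) {}

  private final ConcurrentLinkedQueue<Deferred<E>> queue = new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final Duration maxAge;

  DeferredEventQueue(Duration maxAge) {
    this.maxAge = maxAge;
  }

  /** Park an event whose referenced Git data is not available yet. */
  void defer(E event) {
    queue.add(new Deferred<>(event, Instant.now()));
  }

  /** Periodically retry parked events; process them when possible, drop them when too old. */
  void start(Predicate<E> isProcessable, Consumer<E> processor) {
    scheduler.scheduleWithFixedDelay(
        () ->
            queue.removeIf(
                deferred -> {
                  if (Duration.between(deferred.enqueuedAt(), Instant.now()).compareTo(maxAge) > 0) {
                    return true; // waited too long: discard the event
                  }
                  if (isProcessable.test(deferred.event())) {
                    processor.accept(deferred.event());
                    return true; // processed: remove from the queue
                  }
                  return false; // keep waiting for the Git data to arrive
                }),
        1, 1, TimeUnit.SECONDS);
  }
}
```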
| |
| ### Avoid event replication loops |
| |
Stream events are wrapped into an event header containing a source identifier.
Events originated by the same node and received back through the broker-based channel
are silently dropped, so that they do not replicate in a loop (as sketched below).
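
A minimal sketch of the idea, with invented class and field names:

```
import java.util.UUID;
import java.util.function.Consumer;

/**
 * Sketch of the event envelope used to break replication loops: every published
 * event carries the instance-id of its origin, and a consumer silently drops
 * events that were produced by its own node.
 */
class EventEnvelope {
  final UUID sourceInstanceId;
  final String payload;

  EventEnvelope(UUID sourceInstanceId, String payload) {
    this.sourceInstanceId = sourceInstanceId;
    this.payload = payload;
  }
}

class LoopAwareConsumer {
  private final UUID localInstanceId;
  private final Consumer<String> delegate;

  LoopAwareConsumer(UUID localInstanceId, Consumer<String> delegate) {
    this.localInstanceId = localInstanceId;
    this.delegate = delegate;
  }

  void onEvent(EventEnvelope envelope) {
    if (localInstanceId.equals(envelope.sourceInstanceId)) {
      return; // our own event came back through the broker: drop it
    }
    delegate.accept(envelope.payload);
  }
}
```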
| |
| Gerrit has the concept of **server-id** which, unfortunately, does not help |
| solve this problem because all the nodes in a Gerrit cluster must have the same |
| server-id to allow interoperability of the data stored in NoteDb. |
| |
| The multi-site plugin introduces a new concept of **instance-id**, which is a UUID |
| generated during startup and saved into the data folder of the Gerrit site. If |
| the Gerrit site is cleared or removed, a new id is generated and the multi-site |
| plugin will start consuming all events that have been previously produced. |
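
A minimal sketch of how such an identifier could be loaded or created at startup; the
file location is an assumption for illustration only.

```
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/**
 * Sketch of instance-id handling: read the UUID persisted under the Gerrit
 * site's data folder, or generate and save a new one at first start.
 */
final class InstanceId {
  static UUID loadOrCreate(Path dataDir) throws IOException {
    Path file = dataDir.resolve("multi-site").resolve("instance-id");
    if (Files.exists(file)) {
      return UUID.fromString(Files.readString(file, StandardCharsets.UTF_8).trim());
    }
    UUID id = UUID.randomUUID();
    Files.createDirectories(file.getParent());
    Files.writeString(file, id.toString(), StandardCharsets.UTF_8);
    return id;
  }
}
```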
| |
| The concept of the instance-id is very useful. Since other plugins could benefit |
| from it, it will be the first candidate to move into the Gerrit core, |
| generated and maintained with the rest of the configuration. Then it can be |
| included in **all** stream events, at which time the multi-site plugin's |
| "enveloping of events" will become redundant. |
| |
| ### Manage failures |
| |
The broker-based solutions improve the resilience and scalability of the system,
but there is still a point of failure: the availability of the broker itself. However,
using the broker does allow a high level of redundancy and a multi-master
/ multi-site configuration at the transport and storage level.
| |
At the moment, the acknowledge level for publication can be controlled via
configuration, which allows tuning the QoS of the publication process (see the sketch below).
Failures are explicitly not handled at the moment; they are just logged as errors.
There is no retry mechanism to handle temporary failures.
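
Assuming a Kafka-based broker implementation, the acknowledge level would typically
map onto the producer `acks` setting, as in the sketch below; topic names, addresses
and the log-only error handling mirror the behaviour described above and are illustrative.

```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Sketch of how the publication acknowledge level maps to producer configuration. */
class BrokerQosSketch {
  static KafkaProducer<String, String> newProducer(String bootstrapServers, String acks) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.ACKS_CONFIG, acks); // "all" favours durability, "1" or "0" favour latency
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
    return new KafkaProducer<>(props);
  }

  static void publish(KafkaProducer<String, String> producer, String topic, String json) {
    producer.send(
        new ProducerRecord<>(topic, json),
        (metadata, exception) -> {
          if (exception != null) {
            // Failures are currently only logged; there is no retry mechanism.
            System.err.println("publish to " + topic + " failed: " + exception.getMessage());
          }
        });
  }
}
```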
| |
| ### Avoid Split Brain |
| |
The current multi-site solution at Stage #7, with asynchronous replication,
risks that the system will reach a Split Brain situation (see
[issue #10554](https://bugs.chromium.org/p/gerrit/issues/detail?id=10554)).
| |
| #### The diagram below illustrates the happy path with crash recovery returning the system to a healthy state. |
| |
| ![Healthy Use Case](images/git-replication-healthy.png) |
| |
| In this case we are considering two different clients each doing a `push` on top of |
| the same reference. This could be a new commit in a branch or the change of an existing commit. |
| |
| At `t0`: both clients see the status of `HEAD` being `W0`. `Instance1` is the |
| RW node and will receive any `push` request. `Instance1` and `Instance2` are in sync |
| at `W0`. |
| |
| At `t1`: `Client1` pushes `W1`. The request is served by `Instance1` which acknowledges it |
| and starts the replication process (with some delay). |
| |
| At `t2`: The replication operation is completed. Both instances are in a consistent state |
| `W0 -> W1`. `Client1` shares that state but `Client2` is still behind. |
| |
| At `t3`: `Instance1` crashes. |
| |
| At `t4`: `Client2` pushes `W2` which is still based on `W0` (`W0 -> W2`). |
| The request is served by `Instance2` which detects that the client push operation was based |
| on an out-of-date starting state for the ref. The operation is refused. `Client2` |
| synchronises its local state (e.g. rebases its commit) and pushes `W0 -> W1 -> W2`. |
| That operation is now considered valid, acknowledged and put in the replication queue until |
| `Instance1` becomes available. |
| |
At `t5`: `Instance1` restarts and is replicated at `W0 -> W1 -> W2`.
| |
| #### The Split Brain situation is illustrated in the following diagram. |
| |
| ![Split Brain Use Case](images/git-replication-split-brain.png) |
| |
| In this case the steps are very similar except that `Instance1` fails after acknowledging the |
| push of `W0 -> W1` but before having replicated the status to `Instance2`. |
| |
When, at `t4`, `Client2` pushes `W0 -> W2` to `Instance2`, this is considered a valid operation.
It gets acknowledged and inserted in the replication queue.
| |
| At `t5` `Instance1` restarts. At this point both instances have pending replication |
| operations. They are executed in parallel and they bring the system to divergence. |
| |
| Root causes of the Split Brain problem: |
| - The RW node acknowledges a `push` operation before __all__ replicas are fully in sync. |
| - The other instances are not aware that they are out of sync. |
| |
| Two possible approaches to solve the Split Brain problem: |
| |
- **Synchronous replication**: In this case the system would behave essentially as in the
_happy path_ diagram shown above and would solve the problem by operating on the first of the causes,
at the expense of performance, availability and scalability. It is a viable and simple solution
for a two-node setup with an infrastructure allowing fast replication.
| |
- **Centralise the information about the latest status of mutable refs**: This would operate
on the second cause. That is, it would allow instances to realise that _they are
not in sync on a particular ref_ and refuse any write operation on that ref.
The system could operate normally on any other ref and would impose no
limitation on other functions, such as serving the GUI, supporting reads, and accepting new
changes or patch-sets on existing changes. This option is discussed in further
detail below.
| |
| **NOTE**: The two options are not exclusive. |
| |
| #### Prevent split brain thanks to Global Ref-DB |
| |
| The above scenario can be prevented by using an implementation of the Global Ref-DB |
| interface, which will operate as follows: |
| |
| ![Split Brain Prevented](images/git-replication-split-brain-detected.png) |
| |
The difference, with respect to the split brain use case, is that now, whenever a change of a
_mutable ref_ is requested, the Gerrit server verifies with the central Ref-DB that its
status __for this ref__ is consistent with the latest cluster status. If that is true,
the operation succeeds. The ref status is atomically compared and set to the new status
to prevent race conditions.
| |
In this case `Instance2` enters a Read Only mode for the specific branch
until the replication from `Instance1` is completed successfully. At that point write
operations on the reference can be resumed.
When `Client2` performs the `push` again against `Instance2`, the server recognises that
the client status needs an update; the client will `rebase` and `push` the correct status.
| |
__NOTE__:
This implementation prevents the cluster from entering split brain, but might result in a
set of refs in Read Only state across the whole cluster if the RW node fails after having
sent the request to the Ref-DB but before persisting this request into its `git` layer.
| |
#### Geo-located Gerrit master election
| |
Once you go multi-site and multi-master, you can improve the latency of your calls by
serving traffic from the server closest to your user.
| |
Whether you are running your infrastructure in the cloud or on premise, there are different solutions you can look at.
| |
| ##### AWS |
| |
Route53, the AWS DNS service, offers the ability to do [Geo Proximity](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geoproximity)
routing using [Traffic Flow](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/traffic-flow.html).

Traffic Flow is a tool which allows the definition of traffic policies and rules via a UI. Traffic rules come in different types, among them *Geoproximity* rules.
| |
When creating geoproximity rules for your resources, you can specify one of the following values for each rule:

* If you're using AWS resources, the AWS Region that you created the resource in.
* If you're using non-AWS resources, the latitude and longitude of the resource.

This allows you to have a hybrid cloud/on-premise infrastructure.
| |
You can define quite complex failover rules to ensure high availability of your system (see [here](https://pasteboard.co/ILFSd5Y.png) for an example).
| |
Overall, the service provided is pretty much a smart reverse proxy; if you want more
complex routing strategies you will still need a proper Load Balancer.
| |
| ##### GCE |
| |
GCE [doesn't offer](https://cloud.google.com/docs/compare/aws/networking#dns) geography-based routing, but it implicitly has geo-located DNS entries
when distributing your application among different zones.

The Load Balancer will balance the traffic to the [nearest available instance](https://cloud.google.com/load-balancing/docs/backend-service#backend_services_and_regions),
but this is not configurable and the app server has to be in Google Cloud.
| |
Hybrid architectures are supported but would make things more complicated,
hence this solution is probably worthwhile only when the Gerrit instances are running in Google Cloud.
| |
| ##### On premise |
| |
If you are going for an on-premise solution and using HAProxy as your Load Balancer,
it is easy to define static ACLs based on IP ranges and use them to route your traffic.
| |
| This [blogpost](https://icicimov.github.io/blog/devops/Haproxy-GeoIP/) explains how to achieve it. |
| |
On top of that, you will want to define a DNS entry per zone and use the ACLs you just defined to
redirect calls to the most appropriate zone.
| |
You will have to add your redirection strategy to your frontend definition, e.g.:
| |
| ``` |
| http-request redirect code 307 prefix https://review-eu.gerrithub.io if acl_EU |
| http-request redirect code 307 prefix https://review-am.gerrithub.io if acl_NA |
| ``` |
| |
| # Next steps in the roadmap |
| |
| ## Step-1: Fill the gaps in multi-site Stage #7 implementation: |
| |
- **Detection of a stale site**: The health check plugin has no awareness that a
site may be "too outdated" while still being technically "healthy". A
stale site needs to be taken out of the balancing, and all traffic needs to go
to the more up-to-date site.
| |
- **Web session replication**: This currently must be implemented at the filesystem level
using rsync across sites. This is problematic because of the delay it
introduces. Should a site fail, some users may lose their sessions
because the rsync has not been executed yet.
| |
- **Index rebuild in case of broker failure**: In the case of a catastrophic
failure at the broker level, the indexes of the two sites will be out of
sync. A mechanism is needed to recover from this situation
without requiring an offline reindex of both sites, since that could take
days for huge installations.
| |
- **Git/SSH redirection**: Local users who rely on the Git/SSH protocol are not able
to use the local site for serving their requests, because HAProxy is not
able to differentiate the type of traffic and is thus always forced to use the
RW site, even when the operation is RO.
| |
| ## Step-2: Move to multi-site Stage #8. |
| |
- Auto-reconfigure HAProxy rules based on the project sharding policy
| |
| - Serve RW/RW traffic based on the project name/ref-name. |
| |
| - Balance traffic with "locally-aware" policies based on historical data |