Merge branch 'stable-3.1' into stable-3.2

* stable-3.1:
  Update multi-site DESIGN.md

Change-Id: Ia0caf7c0a528ca85c26c6cc8e97add9467854fef

commit: 1d4772d058a46e4a3e6d9f9f583692bf28960db3
author: Luca Milanesio <luca.milanesio@gmail.com> Tue Sep 29 11:07:26 2020 +0100
committer: Luca Milanesio <luca.milanesio@gmail.com> Tue Sep 29 11:07:26 2020 +0100
tree: 7cdfead617fab876dd91bbf899c632873d918ca8
parent: 7f0b6864384aa65e749a22b0dd72a59c7725ffe6
parent: 63a11fd075761b5aab3529235c89d99c7875d81b
diff --git a/DESIGN.md b/DESIGN.md
index 270b7af..1bbe78f 100644
--- a/DESIGN.md
+++ b/DESIGN.md

@@ -95,8 +95,8 @@
 5. 2x masters (active RW/active RO) / single location - separate disks
 6. 2x masters (active RW/active RO) / active + disaster recovery location
 7. 2x masters (active RW/active RO) / two locations
-8. 2x masters (active RW/active RW) sharded / two locations
-9. 3x masters (active RW/active RW) sharded with auto-election / two locations
+8. 2x masters (active RW/active RW) / two locations
+9. 2 or more masters (active RW/active RW) sharded across 2 or more locations
 10. Multiple masters (active RW/active RW) with quorum / multiple locations
 
 The transition between steps requires not only an evolution of the Gerrit
@@ -109,43 +109,104 @@
 Google is currently running at Stage #10.  Qualcomm is at Stage #4 with the
 difference that both masters are serving RW traffic, which is possible because the specifics
 of their underlying storage, NFS and JGit implementation allows concurrent
-locking at the filesystem level.
+locking at the filesystem level. GerritHub is running at Stage #9, with 3 locations.
 
-## TODO: Synchronous replication
-Consider also synchronous replication for cases like 5, 6, 7... in which
-cases a write operation is only accepted if it is synchronously replicated to the
-other master node(s). This would provide 100% loss-less disaster recovery support. Without
-synchronous replication, when the RW master crashes, losing data, there could
-be no way to recover missed replications without soliciting users who pushed the commits
-in the first place to push them again. Further, with synchronous replication
-the RW site has to "degrade" to RO mode when the other node is not reachable and
-synchronous replications are not possible.
+## Projects sharding
 
-We must re-evaluate the useability of the replication plugin for supporting
-synchronous replication. For example, the replicationDelay doesn't make much
-sense in the synchronous case. Further, the rescheduling of a replication due
-to an in-flight push to the same remote URI also doesn't make much sense as we
-want the replication to happen immediately. Further, if the ref-update of the
-incoming push request has to be blocked until the synchronous replication
-finishes, the replication plugin cannot even start a replication as there is no
-ref-updated event yet. We may consider implementing the synchronous
-replication on a lower level. For example have an "pack-received" event and
-then simply forward that pack file to the other site. Similarly for the
-ref-updated events, instead of a real git push, we could just forward the
-ref-updates to the other site.
+Having all the repositories replicated to all sites could be, in some cases, not
+a great idea. The rationale can be explained with a simple example.
+
+### The Tango secret project
+
+Company FooCompany is developing a new huge and secret project code-named Tango
+with a software engineering team all geo-located in India.
+The Git repository is huge and contains millions of refs and packfiles for tens
+of GBytes. Project Tango also requires some medium-sized binaries in the
+Git repository.
+FooCompany has a multi-site deployment across the globe, covering Europe, USA,
+Australia and China, other than India, where the new project is developed.
+
+The teams in Europe and USA are involved in the project, from a code-review perspective.
+Their engineers are typically using the Gerrit UI for reviews and fetch individual
+patch-sets for local verification.
+
+### Tango secret project, without sharding
+
+All projects are replicated everywhere, including the Tango project.
+The replication creates a huge network overload across the globe.
+
+When an engineer is pushing a packfile in India, it gets replicated to all sites,
+causing congestion on the replication channel.
+When a software engineer in Europe reviews the changes of the Tango project, it
+creates modifications to the NoteDb meta ref that would be then replicated back
+to India with a non-negligible latency, due to the size of the repository and
+the huge refs advertisement phase implied in the replication.
+
+Software engineers around the globe do not need to see the Tango project, with
+the exception of the reviewers in Europe and USA. However, everyone is impacted
+and the servers and replication channels are overloaded.
+
+### Tango secret local project, with sharding
+
+The multi-site setup uses a sharding logic in which projects are replicated
+or not depending on how they are classified:
+
+1. Global projects: category of projects that need to be always replicated to
+   all sites. (Example: All-Projects and All-Users)
+2. Local projects: category of projects that may not be replicated to
+   all sites. (Example: the Tango project mentioned above)
+
+The Tango project is a _local project_ because it is mainly developed in one
+site: India.
+
+When an engineer is pushing a packfile in India, it does not get replicated to
+all sites, saving bandwidth for the global projects replication.
+When a software engineer in Europe opens a change associated with the Tango project,
+he gets silently redirected to the site in India where the project is located.
+
+All sessions are broadcasted across the sites, so he does not realise that he is
+in a different site. Gerrit assets are the same, CSS, JavaScript, Web components,
+across all sites: the only thing that he may notice is a slight delay in the underlying
+REST requests made by his browser.
+
+Reviewers commenting on changes of the Tango project create modifications to the NoteDb
+in India, which are immediately visible to the local software engineers, without
+a long replication lag.
+
+Software engineers around the globe do not need to see the Tango project, with
+the exception of the reviewers in Europe and USA. The Tango project is not visible
+and not replicated to the other sites, and the people not involved in the project
+are not impacted at all.
+
+## Pull replication, synchronous or asynchronous
+
+Consider also pull replication for cases like 5, 6, 7... which could be done
+also synchronously to the incoming write operation.
+In case a write operation fails to be replicated by the master node(s), it could be
+automatically rolled back and reported to the client for retry.
+This would provide 100% loss-less disaster recovery support.
+
+When running pull replication asynchronously, similarly to the replication plugin,
+an unrecoverable crash of the replication source would result in unnoticed data loss.
+The only way to recover the data would be telling the users who pushed the commits
+to push them again. However, someone needs to manually detect the issue in the
+replication log and get in touch with the user.
+
+The [pull-replication plugin](https://gerrit.googlesource.com/plugins/pull-replication)
+supports synchronous replication and has the structure to perform also the
+asynchronous variant in the future.
 
 ## History and maturity level of the multi-site plugin
 
 This plugin expands upon the excellent work on the high-availability plugin,
 introduced by Ericsson for implementing mutli-master at Stage #4. The git log history
 of this projects still shows the 'branching point' where it started.
+The v2.16.x (with NoteDb) of the multi-site plugin was at Stage #7.
 
-The current version of the multi-site plugin is at Stage #7, which is a pretty
-advanced stage in the Gerrit multi-master/multi-site configuration.
-
-Thanks to the multi-site plugin, it is now possible for Gerrit data to be
-available in two separate geo-locations (e.g. San Francisco and Bangalore),
-each serving local traffic through the local instances with minimum latency.
+The current version of the multi-site plugin is at Stage #9: it is now possible for
+Gerrit data to be available in two or more separate geo-locations
+(e.g. San Francisco, Frankfurt and Bangalore), each serving local traffic through
+the local instances with minimum latency.
 
 ### Why another plugin from a high availability fork?
 
@@ -161,60 +222,73 @@
 multi-site, allows us to have a simpler, more usable experience, both for developers
 of the plugin and for the Gerrit administrators using it.
 
+The high-availability and multi-site plugins are solutions to different problems.
+Two or more nodes on the same site are typically deployed to increase
+the reliability and scalability of a Gerrit setup; however, this doesn't provide any
+benefit in terms of data access across locations. Replicating the repositories
+to remote locations does not help the scalability of a Gerrit setup but is more
+focused on reducing the data transfer time between the client and the server, thanks
+to the higher bandwidth available in the local regions.
+
 ### Benefits
 
-There are some advantages in implementing multi-site at Stage #7:
+There are some advantages in implementing multi-site at Stage #9:
 
-- Optimal latency of the read-only operations on both sites, which constitutes around 90%
-  of the Gerrit traffic overall.
+- Optimal latency of the Git read/write operations on all sites, and significant
+  improvement of the Gerrit UI responsiveness, thanks to the reduction of the
+  network latency.
 
 - High SLA (99.99% or higher, source: GerritHub.io) can be achieved by
-  implementing both high availability inside each local site, and automatic
-  catastrophic failover between the two sites.
+  implementing network distribution across sites.
 
-- Access transparency through a single Gerrit URL entry-point.
+- Access transparency through a single Gerrit URL, thanks to a geo-location DNS
+  routing.
 
-- Automatic failover, disaster recovery, and leader re-election.
+- Automatic failover, disaster recovery, and failover to remote sites.
 
-- The two sites have local consistency, with eventual consistency globally.
+- All sites have local consistency, with the assurance of global eventual
+  consistency.
 
 ### Limitations
 
-The current limitations of Stage #7 are:
+The current limitations of Stage #9 are:
 
-- **Single RW site**: Only the RW site can accept modifications on the
-  Git repositories or the review data.
+- **Limited support for many sites**:
+  One could, potentially, support a very high number of sites, but the pull-replication
+  logic to all sites could have a serious consequence in the overall perceived latency.
+  Having to deal with a very high number of sites requires the implementation of a quorum on
+  all the nodes available for replication.
 
-- **Supports only two sites**:
-  One could, potentially, support more sites, but the configuration
-  and maintenance efforts are more than linear to the number of nodes.
-
-- **Single point of failure:** The switch between the RO to RW sites is managed by a unique decision point.
-
-- **Lack of transactionality**:
-  Data written to one site is acknowledged before its replication to the other location.
-
-- **Requires Gerrit v2.16 or later**: Data conisistency requires a server completely based on NoteDb.
+- **Requires Gerrit v3.0 or later**: Data consistency requires a server completely
+  based on NoteDb.
   If you are not familiar with NoteDb, please read the relevant
-  [section in the Gerrit documentation](https://gerrit-documentation.storage.googleapis.com/Documentation/2.16.5/note-db.html).
+  [section in the Gerrit documentation](https://gerrit-documentation.storage.googleapis.com/Documentation/3.0.12/note-db.html).
 
 ### Example of multi-site operations
 
-Let's suppose the RW site is San Francisco and the RO site Bangalore. The
-modifications of data will always come to San Francisco and flow to Bangalore
-with a latency that can be between seconds and minutes, depending on
-the network infrastructure between the two sites. A developer located in
-Bangalore will always see a "snapshot in the past" of the data, both from the
-Gerrit UI and on the Git repository served locally.  In contrast, a developer located in
-San Francisco will always see the "latest and greatest" of everything.
+Let's suppose you have two sites, in San Francisco and Bangalore. The
+modifications of data will flow from San Francisco to Bangalore and the other way round.
+
+Depending on the network infrastructure between the two sites, latency can range
+between seconds and minutes. The available bandwidth is low, so the Gerrit admin
+decides to use a traditional push replication (asynchronous) between the two sites.
+
+When a developer located in Bangalore accesses a repository for which most pushes
+originate from San Francisco, he may see a "snapshot in the past" of the data,
+both from the Gerrit UI and on the Git repository served locally.
+In contrast, a developer located in San Francisco will always see on his repository
+the "latest and greatest" of everything.
+Things are exactly the other way around for a repository that is mainly
+receiving pushes from developers in Bangalore.
 
 Should the central site in San Francisco become unavailable for a
-significant period of time, the Bangalore site will take over as the RW Gerrit
-site. The roles will then be inverted.
-People in San Francisco will be served remotely by the 
+significant period of time, the Bangalore site will still be able to serve all
+Gerrit repositories, including those where most pushes come from San Francisco.
+People in San Francisco can't access their local site anymore, because it is
+unavailable. All the Git and Gerrit UI requests will be served remotely by the
 Bangalore server while the local system is down. When the San Francisco site
-returns to service, and passes the "necessary checks", it will be re-elected as the
-main RW site.
+comes up again, and passes the "necessary checks", it
+will become the main site again for the users in the same geo-location.
 
 # Plugin design
 
@@ -249,7 +323,7 @@
   Sessions are stored by default on the local filesystem in an H2 table but can
   be externalized via plugins, like the WebSession Flatfile.
 
-To achieve a Stage #7 multi-site configuration, all the above information must
+To achieve a Stage #9 multi-site configuration, all the above information must
 be replicated transparently across sites.
 
 ## High-level architecture
@@ -270,11 +344,14 @@
 When no specific implementation is provided, then the [Global Ref-DB Noop implementation](#global-ref-db-noop-implementation)
 then libModule interfaces are mapped to internal no-ops implementations.
 
-- **replication plugin**: enables the replication of the _Git repositories_ across
-  sites.
+- **replication plugin**: enables asynchronous push replication of the _Git repositories_
+  across sites.
 
-- **web-session flat file plugin**: supports the storage of _active sessions_
-  to an external file that can be shared and synchronized across sites.
+- **pull replication plugin**: enables the synchronous replication of the _Git repositories_
+  across sites.
+
+- **web-session broker plugin**: supports the storage of _active sessions_
+  to a message broker topic, which is then broadcasted across sites.
 
 - **health check plugin**: supports the automatic election of the RW site based
   on a number of underlying conditions of the data and the systems.
@@ -288,6 +365,7 @@
 ## Implementation Details
 
 ### Multi-site libModule
+
 As mentioned earlier there are different components behind the overarching architecture
 of this solution of a distributed multi-site gerrit installation, each one fulfilling
 a specific goal. However, whilst the goal of each component is well-defined, the
@@ -355,6 +433,7 @@
 etcd, MySQL, Mongo, etc.
 
 #### Global Ref-DB Noop implementation
+
 The default `Noop` implementation provided by the `Multi-site` libModule accepts
 any refs without checking for consistency. This is useful for setting up a test environment
 and allows multi-site library to be installed independently from any additional
@@ -566,33 +645,18 @@
 
 # Next steps in the roadmap
 
-## Step-1: Fill the gaps in multi-site Stage #7 implementation:
-
-- **Detection of a stale site**: The health check plugin has no awareness that one
-  site that can be "too outdated" because it is still technically "healthy." A
-  stale site needs to be put outside the balancing and all traffic needs to go
-  to the more up-to-date site.
-
-- **Web session replication**: This currently must be implemented at the filesystem level
-  using rsync across sites.  This is problematic because of the delay it
-  introduces. Should a site fail, some of the users may lose their sessions
-  because the rsync was not executed yet.
-
-- **Index rebuild in case of broker failure**: In the case of a catastrophic
-  failure at the broker level, the indexes of the two sites will be out of
-  sync. A mechanism is needed to recover the situation
-  without requiring the reindex of both sites offline, since that could take
-  as much as days for huge installations.
-
-- **Git/SSH redirection**: Local users who rely on Git/SSH protocol are not able
-  to use the local site for serving their requests, because HAProxy is not
-  able to differentiate the type of traffic and, thus, is forced always to use the
-  RW site, even though the operation is RO.
-
-## Step-2: Move to multi-site Stage #8.
+## Move to multi-site Stage #10.
 
 - Auto-reconfigure HAProxy rules based on the projects sharding policy
 
-- Serve RW/RW traffic based on the project name/ref-name.
+- Implement more global-refdb storage layers (e.g. TiKV) and more cloud-native
+  message brokers (e.g. NATS)
 
-- Balance traffic with "locally-aware" policies based on historical data
+- Implement a quorum-based policy for accepting or rejecting changes in the pull-replication
+  plugin
+
+- Allow asynchronous pull-replication across sites, based on asynchronous events through
+  the message broker
+
+- Implement a "fast replication path" for NoteDb-only changes, instead of relying on the
+  Git protocol
\ No newline at end of file
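
Note on the "Projects sharding" section added by this change: the global/local classification it describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the multi-site plugin's API: the `ShardingPolicy` class, its method names, the site names, and the choice to replicate unclassified projects everywhere. Only `All-Projects` and `All-Users` as global projects and Tango as a local project come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Global projects named in DESIGN.md: always replicated to all sites.
GLOBAL_PROJECTS = {"All-Projects", "All-Users"}


@dataclass
class ShardingPolicy:
    """Hypothetical sharding policy; not part of the multi-site plugin."""

    all_sites: set[str]
    # Maps each local project to the sites that host it.
    local_projects: dict[str, set[str]] = field(default_factory=dict)

    def replication_targets(self, project: str) -> set[str]:
        """Return the sites a project should be replicated to."""
        if project in GLOBAL_PROJECTS:
            return set(self.all_sites)
        # Assumption: projects without a classification go everywhere.
        return set(self.local_projects.get(project, self.all_sites))

    def serving_site(self, project: str, user_site: str) -> str:
        """Return the site that serves a request coming from user_site."""
        targets = self.replication_targets(project)
        if user_site in targets:
            return user_site
        # The project is not hosted locally: redirect to a hosting site.
        # Picking the first name alphabetically is an arbitrary choice.
        return sorted(targets)[0]
```

With Tango hosted only in India, a push is replicated to no other site, and a reviewer in Europe who opens a Tango change is routed to India, as in the scenario described in the section.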