| = Gerrit Code Review - NoteDb Backend |
| |
| NoteDb is the next-generation Gerrit storage backend. It replaces the |
| traditional SQL backend for change and account metadata, storing that data in |
| the same repository as code changes. |
| |
| .Advantages |
| - *Simplicity*: All data is stored in one location in the site directory, rather |
| than being split between the site directory and a possibly external database |
| server. |
| - *Consistency*: Replication and backups can use a snapshot of the Git |
| repository refs, which will include both the branch and patch set refs, and |
| the change metadata that points to them. |
| - *Auditability*: Rather than storing mutable rows in a database, modifications |
| to changes are stored as a sequence of Git commits, automatically preserving |
| history of the metadata. + |
| There are no strict guarantees, and meta refs may be rewritten, but the |
| default assumption is that all operations are logged. |
| - *Extensibility*: Plugin developers can add new fields to metadata without the |
| core database schema having to know about them. |
| - *New features*: Enables simple federation between Gerrit servers, as well as |
| offline code review and interoperation with other tools. |
| |
| == Current Status |
| |
| - Storing change metadata is fully implemented in master, and is live on the |
| servers behind `googlesource.com`. In other words, if you use |
| link:https://gerrit-review.googlesource.com/[gerrit-review], you're already |
| using NoteDb. |
| - Storing some account data, e.g. user preferences, is implemented in releases |
| back to 2.13. |
| - Storing the rest of account data is a work in progress. |
| - Storing group data is a work in progress. |
| |
| To match the current configuration of `googlesource.com`, paste the following |
| config snippet in your `gerrit.config`: |
| |
| ---- |
| [noteDb "changes"] |
| write = true |
| read = true |
| primaryStorage = NOTE_DB |
| disableReviewDb = true |
| ---- |
| |
| |
| To see an example NoteDb change, fetch its meta ref and inspect the log: |
| ---- |
| git fetch https://gerrit.googlesource.com/gerrit refs/changes/70/98070/meta \ |
| && git log -p FETCH_HEAD |
| ---- |
| |
| == Configuration |
| |
| Account and group data is migrated to NoteDb automatically using the normal |
| schema upgrade process during updates. The remainder of this section details the |
| configuration options that control migration of the change data, which is mostly |
| but not fully implemented. |
| |
| Change migration state is configured in `gerrit.config` with options like |
| `noteDb.changes.*`. These options are undocumented outside of this file, and the |
| general approach has been to add one new option for each phase of the migration. |
| Assume that each config option in the following list requires all of the |
| previous options, unless otherwise noted. |
| |
| - `noteDb.changes.write=true`: During a ReviewDb write, the state of the change |
| in NoteDb is written to the `note_db_state` field in the `Change` entity. |
| After the ReviewDb write, this state is written into NoteDb, which effectively |
| doubles the time taken by write operations. NoteDb write errors are silently |
| dropped, and no attempt is made to read from ReviewDb or correct errors |
| (without additional configuration, below). + |
| This state allows for a rolling update in a multi-master setting, where some |
| servers can start reading from NoteDb, but older servers are still reading |
| only from ReviewDb. |
| - `noteDb.changes.read=true`: Change data is written to and read from NoteDb, |
| but ReviewDb is still the source of truth. On each read, the change is first |
| read from ReviewDb and its `note_db_state` is compared with what is in |
| NoteDb. If they don't match, the change is immediately "auto-rebuilt": data |
| is copied from ReviewDb to NoteDb and the result is returned. |
| - `noteDb.changes.primaryStorage=NOTE_DB`: New changes are written only to |
| NoteDb, but changes whose primary storage is ReviewDb are still supported. |
| Reads still go to ReviewDb first, as in the previous stage, but fall back to |
| NoteDb if the change is not in ReviewDb. + |
| Migration of existing changes is described in the link:#migration[Migration] |
| section below. + |
| Due to an implementation detail, writes to Changes or related tables still |
| result in write calls to the database layer, but they are inside a transaction |
| that is always rolled back. |
| - `noteDb.changes.disableReviewDb=true`: All access to Changes or related tables |
| is disabled; reads return no results, and writes are no-ops. Assumes the state |
| of all changes in NoteDb is accurate, and so is only safe once all changes are |
| NoteDb primary. Otherwise, reading changes only from NoteDb might result in |
| inaccurate results, and writing to NoteDb would compound the problem. + |
| Thus it is up to an admin of a previously-ReviewDb site to ensure |
| MigratePrimaryStorage has been run for all changes. Note that the current |
| implementation of the `rebuild-note-db` program does not do this. + |
| In this phase, it would be possible to delete the Changes tables out from |
| under a running server with no effect. |
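
The phases above can be stepped through one option at a time. The following is a minimal sketch, not an official procedure: it uses `git config --file` to edit a scratch file that stands in for a site's `gerrit.config` (the temporary path is hypothetical; on a real site you would point at the actual config file and restart or reload the server between phases as appropriate).

```shell
#!/bin/sh
set -eu

# Hypothetical scratch file standing in for a site's gerrit.config.
cfg="$(mktemp)"

# Phase 1: dual-write to NoteDb; ReviewDb stays the source of truth.
git config --file "$cfg" noteDb.changes.write true

# Phase 2: also read from NoteDb, auto-rebuilding stale changes.
git config --file "$cfg" noteDb.changes.read true

# Phase 3: new changes are written only to NoteDb.
git config --file "$cfg" noteDb.changes.primaryStorage NOTE_DB

# Phase 4: only safe once every change is NoteDb primary.
git config --file "$cfg" noteDb.changes.disableReviewDb true

# Show the resulting [noteDb "changes"] section.
cat "$cfg"
```

After the last step the file should contain the same four options as the `googlesource.com` snippet in the Current Status section. Applying the options in this order respects the rule that each option requires all of the previous ones.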
| |
| [[migration]] |
| == Migration |
| |
| Once configuration options are set, migration to NoteDb is primarily |
| accomplished by running the `rebuild-note-db` program. Currently, this program |
| bulk copies ReviewDb data into NoteDb, but leaves primary storage of these |
| changes in ReviewDb, so the site is runnable with |
| `noteDb.changes.{write,read}=true`, but ReviewDb is still required. |
| |
| Eventually, `rebuild-note-db` will set primary storage to NoteDb for all |
| changes by default, so a site will be able to stop using ReviewDb for changes |
| immediately after a successful run. |
| |
| There is code in `PrimaryStorageMigrator.java` to migrate individual changes |
| from NoteDb primary to ReviewDb primary. This code is not intended to be used |
| except in the event of a critical bug in NoteDb primary changes in production. |
| It will likely never be used by `rebuild-note-db`, and in fact it's not |
| recommended to run `rebuild-note-db` until the code is stable enough that the |
| reverse migration won't be necessary. |
| |
| === Zero-Downtime Multi-Master Migration |
| |
| Single-master Gerrit sites can use `rebuild-note-db` on an offline site to |
| rebuild NoteDb, but this doesn't work in a zero-downtime environment like |
| `googlesource.com`. |
| |
| Here, the migration process looks like: |
| |
| - Turn on `noteDb.changes.write=true` to start writing to NoteDb. |
| - Run a parallel link:https://research.google.com/pubs/pub35650.html[FlumeJava] |
| pipeline to write NoteDb data for all changes, and update all `note_db_state` |
| fields. (Sorry, this implementation is entirely closed-source.) |
| - Turn on `noteDb.changes.read=true` to start reading from NoteDb. |
| - Turn on `noteDb.changes.primaryStorage=NOTE_DB` to start writing new changes |
| to NoteDb only. |
| - Run a FlumeJava pipeline to migrate all existing changes to NoteDb primary. |
| (Also closed-source, but basically just a wrapper around |
| `PrimaryStorageMigrator`.) |
| - Turn on `noteDb.changes.disableReviewDb=true` to turn off access to the |
| ReviewDb changes tables. |
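
During a staged rollout like this, it can be useful to check which phase a given server's config has reached. The helper below is an illustrative sketch only: `notedb_get` and `notedb_phase` are hypothetical names, not part of Gerrit, and the script only recognizes the literal values `true` and `NOTE_DB` as written in this document.

```shell
#!/bin/sh
set -eu

# Print the value of noteDb.changes.$2 in config file $1, or nothing if unset.
notedb_get() {
  git config --file "$1" --get "noteDb.changes.$2" || true
}

# Print the furthest migration phase enabled in config file $1.
# Checks run from the last phase to the first, since each option
# is assumed to require all of the previous ones.
notedb_phase() {
  if [ "$(notedb_get "$1" disableReviewDb)" = "true" ]; then
    echo "ReviewDb disabled"
  elif [ "$(notedb_get "$1" primaryStorage)" = "NOTE_DB" ]; then
    echo "NoteDb primary for new changes"
  elif [ "$(notedb_get "$1" read)" = "true" ]; then
    echo "reading from NoteDb"
  elif [ "$(notedb_get "$1" write)" = "true" ]; then
    echo "writing to NoteDb"
  else
    echo "ReviewDb only"
  fi
}

# Usage against a scratch file standing in for gerrit.config.
cfg="$(mktemp)"
notedb_phase "$cfg"
git config --file "$cfg" noteDb.changes.write true
notedb_phase "$cfg"
```

The script reports only the highest phase whose option is set; it does not verify that the earlier options are also present, or that the corresponding data migration has actually been run.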