| ### What is notedb? |
| |
| Notedb is the successor to ReviewDb: a replacement database backend for Gerrit. |
| The goal is to read code review metadata from the same set of repositories that |
| store the data. This allows for improved atomicity, consistency of replication, |
| and the creation of new features like federated review and offline review. |
| |
| This document describes the state of migration (not necessarily completely up to |
| date), the tasks that remain, and notes on some of the challenges we've |
| encountered along the way. This document is **not** a full design document for |
| notedb; if you're that curious, bug dborowitz and he will help you out. |
| |
| Finally, this document is for core developers. If you are a casual user of |
| Gerrit looking for documentation, you've come to the wrong place. |
| |
| ## Root Tables |
| |
| While ReviewDb has a lot of tables, there are relatively few "root" tables, that |
| is, tables whose primary key's `get!ParentKey()` method returns null: |
| |
| * **Change**: subtables ChangeMessage, PatchSet, PatchSetApproval, |
| PatchSetAncestor, PatchLineComment, TrackingId |
| * **Account**: subtables AccountExternalId, AccountProjectWatch, |
| AccountSshKey, AccountDiffPreference, StarredChange |
| * **AccountGroup**: subtables AccountGroupMember |
| |
| TODO(dborowitz): Document other minor tables, audits, etc. |
| |
| For each root entity in each of these tables, there is one DAG describing all |
| modifications that have been applied to the change over time. Entities DAGs are |
| stored in a repository corresponding to their function: |
| |
| * **Change**: stored in `refs/changes/YZ/XYZ/meta` in the destination repository |
| * **Account**: stored in `refs/accounts/YZ/XYZ/meta` (TBD) in `All-Users` |
| * **AccountGroup**: stored in `TBD` in `All-Projects` |
| |
| Most of this document focuses on Change entities, partly because it's the most |
| complex, but also because that's where most effort to date has been focused. |
| |
| ## Changes: What's Done |
| |
| Current progress, along with some possibly-interesting implementation notes. |
| |
| * ChangeMessages: Stored in commit message body. Currently the subject of the |
| commit message contains a machine-generated-but-readable summary like |
| "Updated patch set 3"; we might decide to eliminate this and just use the |
| ChangeMessage. |
| * PatchSetApprovals: Stored as a footer "Label: Label-Name=Foo". Instead of |
| storing an implicit 0 review for reviewers, include them explicitly in a |
| separate Reviewer footer. Freeze labels at submit time along with the full |
| submit rule evaluator results using "Submitted-with". |
| * PatchLineComments: The only thing thus far actually stored in a note, on the |
| commit to which it applies. Drafts are stored in a per-user ref in |
| All-Users. |
| * TrackingId: Killed this long ago and use the secondary index. (Just reminded |
| myself I need to rip out the gwtorm entity.) |
| * Change: Started storing obvious fields: owner, project, branch, topic. |
| * PatchSet: Storing IDs and created on timestamps right now, not reading from |
| it yet. |
| |
| ## Changes: What Needs to Be Done |
| |
| * PatchSetAncestors: Replace with a persistent cache, which should probably be |
| rebuilt in RebuildNotedb. |
| * PatchSet: Draft field. (Someday I think we should replace Draft with WIP, |
| but I digress.) |
| * Change: Kill more fields. Actually implement reading from changes; see |
| challenges section. |
| * Some sort of batch parser. If we get 100 changes back from a search, |
| sequentially reading the DAG for each of those might take a while. |
| * Benchmark and optimize the heck out of the parsers. Let's say a target is |
| 1ms per change DAG. Related, we may also need to disable (eager) parsing of |
| certain fields, if we can prove with benchmarks that they are problematic. |
| (We already do this for PatchLineComments, to avoid having to read anything |
| but commits in the common case.) |
| |
| ## JGit Changes |
| |
| * Teach JGit to pack notedb refs in a separate pack file from everything else. |
| We don't want to hurt locality within the main pack by interleaving metadata |
| commits. |
| * Teach JGit to better handle many small, separate commit graphs in the |
| packer. Ordered chronologically, notedb metadata will be spread across a |
| large number of separate DAGs. We will get better kernel buffer cache |
| locality by clustering all commits in each disconnect DAG together. (But |
| this may also hurt batch parsing; see above.) |
| |
| ## Challenges |
| |
| TODO |