blob: 493d74db8e9de2f0b15e96e178403b986ae51f5f [file] [log] [blame] [view]
### What is notedb?
Notedb is the successor to ReviewDb: a replacement database backend for Gerrit.
The goal is to read code review metadata from the same set of repositories that
store the data. This allows for improved atomicity, consistency of replication,
and the creation of new features like federated review and offline review.
This document describes the state of migration (not necessarily completely up to
date), the tasks that remain, and notes on some of the challenges we've
encountered along the way. This document is **not** a full design document for
notedb; if you're that curious, bug dborowitz and he will help you out.
Finally, this document is for core developers. If you are a casual user of
Gerrit looking for documentation, you've come to the wrong place.
## Root Tables
While ReviewDb has a lot of tables, there are relatively few "root" tables, that
is, tables whose primary key's `get!ParentKey()` method returns null:
* **Change**: subtables ChangeMessage, PatchSet, PatchSetApproval,
PatchSetAncestor, PatchLineComment, TrackingId
* **Account**: subtables AccountExternalId, AccountProjectWatch,
AccountSshKey, AccountDiffPreference, StarredChange
* **AccountGroup**: subtables AccountGroupMember
TODO(dborowitz): Document other minor tables, audits, etc.
For each root entity in each of these tables, there is one DAG describing all
modifications that have been applied to the change over time. Entities DAGs are
stored in a repository corresponding to their function:
* **Change**: stored in `refs/changes/YZ/XYZ/meta` in the destination repository
* **Account**: stored in `refs/accounts/YZ/XYZ/meta` (TBD) in `All-Users`
* **AccountGroup**: stored in `TBD` in `All-Projects`
Most of this document focuses on Change entities, partly because it's the most
complex, but also because that's where most effort to date has been focused.
## Changes: What's Done
Current progress, along with some possibly-interesting implementation notes.
* ChangeMessages: Stored in commit message body. Currently the subject of the
commit message contains a machine-generated-but-readable summary like
"Updated patch set 3"; we might decide to eliminate this and just use the
ChangeMessage.
* PatchSetApprovals: Stored as a footer "Label: Label-Name=Foo". Instead of
storing an implicit 0 review for reviewers, include them explicitly in a
separate Reviewer footer. Freeze labels at submit time along with the full
submit rule evaluator results using "Submitted-with".
* PatchLineComments: The only thing thus far actually stored in a note, on the
commit to which it applies. Drafts are stored in a per-user ref in
All-Users.
* TrackingId: Killed this long ago and use the secondary index. (Just reminded
myself I need to rip out the gwtorm entity.)
* Change: Started storing obvious fields: owner, project, branch, topic.
* PatchSet: Storing IDs and created on timestamps right now, not reading from
it yet.
## Changes: What Needs to Be Done
* PatchSetAncestors: Replace with a persistent cache, which should probably be
rebuilt in RebuildNotedb.
* PatchSet: Draft field. (Someday I think we should replace Draft with WIP,
but I digress.)
* Change: Kill more fields. Actually implement reading from changes; see
challenges section.
* Some sort of batch parser. If we get 100 changes back from a search,
sequentially reading the DAG for each of those might take a while.
* Benchmark and optimize the heck out of the parsers. Let's say a target is
1ms per change DAG. Related, we may also need to disable (eager) parsing of
certain fields, if we can prove with benchmarks that they are problematic.
(We already do this for PatchLineComments, to avoid having to read anything
but commits in the common case.)
## JGit Changes
* Teach JGit to pack notedb refs in a separate pack file from everything else.
We don't want to hurt locality within the main pack by interleaving metadata
commits.
* Teach JGit to better handle many small, separate commit graphs in the
packer. Ordered chronologically, notedb metadata will be spread across a
large number of separate DAGs. We will get better kernel buffer cache
locality by clustering all commits in each disconnect DAG together. (But
this may also hurt batch parsing; see above.)
## Challenges
TODO