Expand documentation. Add design doc.

Change-Id: Ieafc33436f5de14aa59f440b63068e9844fd03ef
diff --git a/CONTRIBUTING b/CONTRIBUTING
index cd0c2db..655bb1f 100644
--- a/CONTRIBUTING
+++ b/CONTRIBUTING
@@ -1,4 +1,6 @@
-For contributions, we need a CL, and generally follow the process
-documented at
+For contributions, we need a change in Gerrit, and generally follow
+the process documented at
 
   https://gerrit-review.googlesource.com/Documentation/dev-contributing.html
+
+As a prerequisite, we also need a signed Google Contributor License Agreement (CLA) on file.
diff --git a/README.md b/README.md
index daeccd4..b36c4f9 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,35 @@
 
-This is a FUSE filesystem that provides light-weight, read-only checkouts of
-Android.
+This is a FUSE filesystem that provides light-weight, read-only checkouts of Git
+repositories. It is intended for use with Android.
 
 
 How to use
 ==========
 
-To start,
+To start the file system:
 
-    go install github.com/google/gitfs/cmd/gitfs-{multifs,expand-manifest}
-    gitfs-expand-manifest --gitiles https://android.googlesource.com/ \
-       > /tmp/m.xml
+    go install github.com/google/gitfs/cmd/gitfs-multifs
     mkdir /tmp/mnt
     gitfs-multifs -cache /tmp/cache -gitiles https://android.googlesource.com/  /tmp/mnt &
 
-then, in another terminal, execute
+To create a workspace "ws" corresponding to the latest manifest version:
 
+    go install github.com/google/gitfs/cmd/gitfs-expand-manifest
+    gitfs-expand-manifest --gitiles https://android.googlesource.com/ \
+       > /tmp/m.xml &&
     ln -s /tmp/m.xml /tmp/mnt/config/ws
 
-To create a workspace "ws" corresponding to the manifest in m.xml.
+To populate a checkout, with frameworks/base as a regular Git clone and symlinks into the FUSE snapshot for the rest:
+
+    go install github.com/google/gitfs/cmd/gitfs-populate
+    mkdir -p checkout/frameworks
+    cd checkout/frameworks
+    git clone https://android.googlesource.com/platform/frameworks/base
+    cd ../
+    gitfs-populate -ro /tmp/mnt/ws .
+
+The filesystem daemon uses an on-disk cache, which by default is stored under
+~/.cache/gitfs; the -cache flag, as used above, overrides this location.
 
 
 Configuring
@@ -47,6 +58,7 @@
      {"File": ".*mk$", "Clone": true}]
 
 
+
 DISCLAIMER
 ==========
 
diff --git a/design.md b/design.md
new file mode 100644
index 0000000..cf51acb
--- /dev/null
+++ b/design.md
@@ -0,0 +1,166 @@
+
+Goal
+====
+Minimize source control overhead for Android developers.
+
+
+Background
+==========
+
+Android uses Git repositories stitched together with ‘repo’. There are several
+hundred repositories, and new ones are added very frequently. Managing them
+(syncing, checking out) wastes a significant amount of time. Much of this cost
+is avoidable, because the tree contains a large amount of data that any given
+developer does not use: files for other host platforms, other target
+architectures and devices, and old history.
+
+Requirements
+============
+
+* Released as open source
+* Runs on Linux and Mac
+* Usable in both automated and manual workflows
+* Supports building the Android tree, and creating CLs for it, with minimal overhead
+* Easily deployable
+
+Idea
+====
+* Provide a lightweight mechanism to create a read-only snapshot of the Android tree.
+* Provide tooling to check out selected repositories as regular Git repositories,
+  and to create a symlink forest into the read-only snapshot for the rest.
+
+Implementation
+==============
+
+Overview:
+
+1. We provide a FUSE file system with R/O snapshots of the tree.
+2. We provide tooling to complete a partial checkout with symlinks to an R/O
+   snapshot.
+3. We provide tooling to sync a partial checkout, updating timestamps so that
+   the build system rebuilds what changed.
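+
+Step 2 could look roughly like the following Go sketch. It assumes the snapshot
+is mounted at a path such as /tmp/mnt/ws and that the user has already cloned
+some repositories into the checkout. The function `populate` and its rules
+(keep anything that exists, descend into plain directories, skip directories
+containing `.git`) are hypothetical, not the actual behavior of gitfs-populate.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"io/fs"
+	"os"
+	"path/filepath"
+)
+
+// populate fills checkout with symlinks into the read-only snapshot.
+// Existing entries are kept: a plain directory that is not a Git checkout
+// is descended into, so that a clone such as frameworks/base can sit next
+// to symlinks for the rest of frameworks/.
+func populate(snapshot, checkout string) error {
+	entries, err := os.ReadDir(snapshot)
+	if err != nil {
+		return err
+	}
+	for _, e := range entries {
+		src := filepath.Join(snapshot, e.Name())
+		dst := filepath.Join(checkout, e.Name())
+		fi, err := os.Lstat(dst)
+		if errors.Is(err, fs.ErrNotExist) {
+			// Nothing checked out here: link to the snapshot.
+			if err := os.Symlink(src, dst); err != nil {
+				return err
+			}
+			continue
+		}
+		if err != nil {
+			return err
+		}
+		if !fi.IsDir() || !e.IsDir() {
+			continue // keep the user's file or symlink as-is
+		}
+		if _, err := os.Stat(filepath.Join(dst, ".git")); err == nil {
+			continue // a real Git checkout: leave it alone
+		}
+		if err := populate(src, dst); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func main() {
+	snap, _ := os.MkdirTemp("", "snap")
+	co, _ := os.MkdirTemp("", "checkout")
+	defer os.RemoveAll(snap)
+	defer os.RemoveAll(co)
+	for _, d := range []string{"build", "frameworks/av", "frameworks/base"} {
+		os.MkdirAll(filepath.Join(snap, d), 0o755)
+	}
+	// Simulate "git clone .../frameworks/base" inside the checkout.
+	os.MkdirAll(filepath.Join(co, "frameworks/base/.git"), 0o755)
+	if err := populate(snap, co); err != nil {
+		panic(err)
+	}
+	for _, p := range []string{"build", "frameworks/av", "frameworks/base"} {
+		fi, _ := os.Lstat(filepath.Join(co, p))
+		fmt.Println(p, "symlink:", fi.Mode()&fs.ModeSymlink != 0)
+	}
+}
+```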
+
+FUSE filesystem overview
+------------------------
+
+Provide a FUSE file system for the R/O snapshot.
+
+   * The FS is read-only, except for timestamps.
+   * The FS uses hardlinks for blobs with the same content.
+   * The FS populates metadata from the Gitiles JSON REST API.
+   * When a file from a repository is opened for reading, the FS fetches its content. A heuristic decides between:
+      * Full clone (useful for code repositories)
+      * Shallow clone (prebuilts: history doesn’t matter, but the full tree is needed)
+      * Individual object downloads from Gitiles (when the full tree is not needed)
+   * The FS runs as the user
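+
+The heuristic is not specified further here. As an illustration only, a
+hypothetical Go version might classify by repository path and file name. The
+`prebuilts/` prefix rule, the `chooseStrategy` name, and the use of a file-name
+pattern (mirroring the `.*mk$` example in the README configuration) are all
+assumptions.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// FetchStrategy is how the FS obtains file content for a repository.
+type FetchStrategy int
+
+const (
+	IndividualBlobs FetchStrategy = iota // single objects from Gitiles
+	ShallowClone                         // full tree, no history
+	FullClone                            // full tree and history
+)
+
+func (s FetchStrategy) String() string {
+	return [...]string{"individual blobs", "shallow clone", "full clone"}[s]
+}
+
+// chooseStrategy is a hypothetical heuristic. Prebuilt repositories need
+// the whole tree but not its history; otherwise, opening a file matching
+// cloneFiles (e.g. build files) triggers a full clone, and anything else
+// is fetched object by object.
+func chooseStrategy(repoPath, file string, cloneFiles *regexp.Regexp) FetchStrategy {
+	switch {
+	case strings.HasPrefix(repoPath, "prebuilts/"):
+		return ShallowClone
+	case cloneFiles.MatchString(file):
+		return FullClone
+	default:
+		return IndividualBlobs
+	}
+}
+
+func main() {
+	cloneFiles := regexp.MustCompile(`.*mk$`)
+	fmt.Println(chooseStrategy("prebuilts/sdk", "current/android.jar", cloneFiles))
+	fmt.Println(chooseStrategy("build/make", "core/main.mk", cloneFiles))
+	fmt.Println(chooseStrategy("frameworks/base", "README.md", cloneFiles))
+}
+```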
+
+The FS is configured by symlinking a manifest.xml file, or a reference to a submodule commit, into the config directory:
+   * `ln -s ~/android/.repo/manifest.xml /fuse/config/WORKSPACE-NAME`
+   * `ln -s ~/submod-android:master /fuse/config/WORKSPACE-NAME`
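+
+When a manifest.xml is symlinked into the config directory, the FS has to turn
+it into (path, repository, revision) entries. The following is a minimal sketch
+with Go's encoding/xml, assuming only `<default>` and `<project>` matter;
+remotes, includes and other manifest features are ignored, and
+`parseManifest`/`Checkout` are hypothetical names. It applies repo's defaults:
+a project's path defaults to its name, and its revision to the `<default>`
+revision.
+
+```go
+package main
+
+import (
+	"encoding/xml"
+	"fmt"
+)
+
+// Manifest holds the subset of repo's manifest.xml needed for a workspace.
+type Manifest struct {
+	Default struct {
+		Revision string `xml:"revision,attr"`
+	} `xml:"default"`
+	Projects []struct {
+		Name     string `xml:"name,attr"`
+		Path     string `xml:"path,attr"`
+		Revision string `xml:"revision,attr"`
+	} `xml:"project"`
+}
+
+// Checkout is one repository mounted into the workspace.
+type Checkout struct {
+	Repo, Revision string
+}
+
+// parseManifest maps each checkout path to its repository and revision.
+func parseManifest(data []byte) (map[string]Checkout, error) {
+	var m Manifest
+	if err := xml.Unmarshal(data, &m); err != nil {
+		return nil, err
+	}
+	out := make(map[string]Checkout, len(m.Projects))
+	for _, p := range m.Projects {
+		path, rev := p.Path, p.Revision
+		if path == "" {
+			path = p.Name // repo's default: path = name
+		}
+		if rev == "" {
+			rev = m.Default.Revision // inherit <default revision=...>
+		}
+		out[path] = Checkout{Repo: p.Name, Revision: rev}
+	}
+	return out, nil
+}
+
+func main() {
+	// Example manifest; the SHA1 is a placeholder.
+	const manifest = `<manifest>
+  <default revision="master" remote="aosp"/>
+  <project path="build/make" name="platform/build"/>
+  <project path="frameworks/base" name="platform/frameworks/base"
+           revision="0123456789abcdef0123456789abcdef01234567"/>
+</manifest>`
+	ws, err := parseManifest([]byte(manifest))
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(ws["build/make"]) // {platform/build master}
+	fmt.Println(ws["frameworks/base"].Revision)
+}
+```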
+
+Optional optimizations:
+
+   * Periodically pre-fetch Git data from the FS, to minimize waiting time when the user runs the sync command.
+   * Initialize from an existing repo checkout.
+   * Keep an on-disk content-addressable store (CAS) for source files, to minimize the time spent unpacking data on startup.
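+
+A CAS can be keyed by the Git blob ID, which Gitiles tree listings already
+report, so the FS can check the store before downloading anything. The ID is
+the SHA1 of `blob <size>\0` followed by the content. The sketch below computes
+it; the two-level directory layout (as in .git/objects) and the `casPath`
+helper are assumptions.
+
+```go
+package main
+
+import (
+	"crypto/sha1"
+	"encoding/hex"
+	"fmt"
+	"path/filepath"
+	"strconv"
+)
+
+// blobID computes the Git object ID of a blob: SHA1 over the header
+// "blob <size>\x00" followed by the content.
+func blobID(content []byte) string {
+	h := sha1.New()
+	h.Write([]byte("blob " + strconv.Itoa(len(content)) + "\x00"))
+	h.Write(content)
+	return hex.EncodeToString(h.Sum(nil))
+}
+
+// casPath is where a blob would live in a hypothetical on-disk CAS laid
+// out like .git/objects: the first two hex digits name a subdirectory.
+func casPath(root, id string) string {
+	return filepath.Join(root, id[:2], id[2:])
+}
+
+func main() {
+	id := blobID([]byte("hello\n"))
+	fmt.Println(id) // ce013625030ba8dba906f756967f9e9ca394464a
+	fmt.Println(casPath("/tmp/cache/cas", id))
+}
+```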
+
+
+Tooling overview
+----------------
+* Provide tooling to calculate the diff between two trees, and call “touch” on the files whose blobs changed.
+* Provide tooling to populate a workspace with a symlink forest pointing to the FUSE snapshot, checking out individual subrepositories as normal Git repositories.
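+
+A sketch of the timestamp tool, assuming each tree has already been flattened
+into a map from file path to blob ID (for example from Gitiles listings at the
+old and new revisions). `changedFiles` and `touch` are hypothetical names;
+`touch` uses os.Chtimes, i.e. it changes timestamps from the outside.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"sort"
+	"time"
+)
+
+// Tree maps paths (relative to the workspace root) to Git blob IDs.
+type Tree map[string]string
+
+// changedFiles returns, sorted, the paths whose blob ID differs between
+// the trees, plus paths that only exist in newTree. Deleted paths are
+// not returned, since there is nothing left to touch.
+func changedFiles(oldTree, newTree Tree) []string {
+	var out []string
+	for path, id := range newTree {
+		if oldID, ok := oldTree[path]; !ok || oldID != id {
+			out = append(out, path)
+		}
+	}
+	sort.Strings(out)
+	return out
+}
+
+// touch sets the access and modification times of the given files, so a
+// make-style build system sees them as newer than their outputs.
+func touch(root string, paths []string, now time.Time) error {
+	for _, p := range paths {
+		if err := os.Chtimes(filepath.Join(root, p), now, now); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func main() {
+	// Shortened placeholder blob IDs.
+	oldTree := Tree{"a.c": "e69de29b", "b.c": "ce013625", "gone.c": "aaaa"}
+	newTree := Tree{"a.c": "e69de29b", "b.c": "0000ffff", "new.c": "bbbb"}
+	changed := changedFiles(oldTree, newTree)
+	fmt.Println(changed) // [b.c new.c]
+
+	root, _ := os.MkdirTemp("", "ws")
+	defer os.RemoveAll(root)
+	for _, p := range changed {
+		os.WriteFile(filepath.Join(root, p), nil, 0o644)
+	}
+	if err := touch(root, changed, time.Now()); err != nil {
+		fmt.Println("touch:", err)
+	}
+}
+```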
+
+
+Motivation
+==========
+
+Why OSX
+-------
+
+This was requested by the Android team. It makes providing a good solution
+more difficult, because OSX and OSXFUSE lack several features:
+
+   * No attribute cache control
+   * No kernel UnionFS or bind mounts
+   * OSXFUSE is buggy
+   * OSXFUSE lacks performance optimizations (e.g. readdirplus, zero-roundtrip reads)
+
+Why FUSE?
+---------
+
+   * FUSE is the only way to avoid downloading data for unused files
+   * Trees can also be created cheaply by hard- or soft-linking files from a CAS directly. However, users can then easily edit files in the CAS by accident, which breaks builds.
+   * This could be prevented by having the CAS owned by a different UID, but that is cumbersome to set up.
+
+Why readonly?
+-------------
+
+Write access goes through the normal file system; we deliberately avoid
+surfacing a writable tree in FUSE, because:
+
+   * Git performance would suffer if routed through FUSE
+   * A writable Git repo would have to be backed by data on disk. A normal Git repository is the easiest backing store to troubleshoot and for users to understand, but the standard POSIX interface, which addresses files by name, is ill-suited for implementing a POSIX-correct file system on top of it. We sidestep this problem by not offering a writable tree.
+   * Disallowing writes avoids race conditions in the FS.
+   * For an R/O FS, we can set infinite timeouts on attributes and entries,
+     minimizing kernel roundtrips.
+
+Why writable timestamps?
+------------------------
+
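+We must support incremental builds, so a sync must update the timestamps of the files it changes.
+
+   * In OSXFUSE, the FS can’t invalidate the kernel's cached attributes, so it is better to change timestamps from the outside (e.g. by running touch on the mounted files), so that the kernel sees the change.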
+
+Why hardlinks for blobs?
+------------------------
+
+   * Multiple checkouts of similar trees can share kernel page cache memory for
+     identical files.
+   * Reading through FUSE is expensive; sharing blobs amortizes the reading cost.
+   * Disadvantage: because blobs are shared, setting timestamps in one
+     workspace also affects other workspaces, leading to spurious rebuilds
+     in those workspaces.
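+
+A minimal sketch of blob sharing, assuming the FS assigns inode numbers itself
+(the `inodeTable` type is hypothetical): every path whose content is the same
+blob maps to one inode, so the kernel treats those files as hardlinks and keeps
+one copy in the page cache. Because timestamps belong to the inode, touching one
+of those paths changes them all, which is the disadvantage listed above.
+
+```go
+package main
+
+import "fmt"
+
+// inodeTable hands out inode numbers keyed by Git blob ID, so identical
+// content in any workspace resolves to the same inode.
+type inodeTable struct {
+	next   uint64
+	byBlob map[string]uint64
+	nlink  map[uint64]uint32
+}
+
+func newInodeTable() *inodeTable {
+	// Inode 1 is reserved for the FUSE root, so blobs start at 2.
+	return &inodeTable{next: 2, byBlob: map[string]uint64{}, nlink: map[uint64]uint32{}}
+}
+
+// lookup returns the inode for a blob and records one more link to it.
+func (t *inodeTable) lookup(blobID string) uint64 {
+	ino, ok := t.byBlob[blobID]
+	if !ok {
+		ino = t.next
+		t.next++
+		t.byBlob[blobID] = ino
+	}
+	t.nlink[ino]++
+	return ino
+}
+
+func main() {
+	t := newInodeTable()
+	a := t.lookup("ce013625030ba8dba906f756967f9e9ca394464a") // ws1/README
+	b := t.lookup("ce013625030ba8dba906f756967f9e9ca394464a") // ws2/README
+	c := t.lookup("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391") // ws2/empty
+	fmt.Println(a == b, a != c, t.nlink[a]) // true true 2
+}
+```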
+
+Why run FUSE as the user?
+-------------------------
+
+   * Simplifies credential management, in case the FS must contact authenticated services.
+   * The FS doesn’t have to reason about the permissions and owner of the process opening a file.
+   * Simplifies deployment: users do not need root privileges.
+
+
+Why use the Gitiles JSON API and the Git wire protocol?
+-------------------------------------------------------
+
+   * Gitiles and the Git wire protocol are already available with open-source Gerrit, so there is zero deployment overhead for external users.
+   * The Git wire protocol is battle-tested and well optimized for bulk transfers.
+   * The Gitiles JSON API is sufficient for our needs.
+
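+Gitiles serves JSON when `?format=JSON` is appended to a URL of the form
+https://android.googlesource.com/<repo>/+/<revision>/<path>, and prefixes the
+body with `)]}'` to prevent cross-site script inclusion. Below is a minimal
+decoding sketch with encoding/json. The struct and function names are
+hypothetical; the entry fields mirror the listing's `mode`, `type`, `id` and
+`name` keys, and `size` is the field proposed under Implementation steps.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+)
+
+// xssiPrefix precedes Gitiles JSON responses and must be stripped.
+const xssiPrefix = ")]}'"
+
+// TreeEntry is one entry of a Gitiles tree listing. Size is nil when the
+// listing does not include the (proposed) size field.
+type TreeEntry struct {
+	Mode uint32 `json:"mode"`
+	Type string `json:"type"` // e.g. "blob" or "tree"
+	ID   string `json:"id"`
+	Name string `json:"name"`
+	Size *int64 `json:"size"`
+}
+
+// GitilesTree is a decoded tree listing.
+type GitilesTree struct {
+	ID      string      `json:"id"`
+	Entries []TreeEntry `json:"entries"`
+}
+
+func parseTree(body []byte) (*GitilesTree, error) {
+	body = bytes.TrimPrefix(body, []byte(xssiPrefix))
+	var t GitilesTree
+	if err := json.Unmarshal(body, &t); err != nil {
+		return nil, err
+	}
+	return &t, nil
+}
+
+func main() {
+	// Placeholder IDs; mode 33188 is 0100644 (file), 16384 is 040000 (tree).
+	body := []byte(")]}'\n" + `{
+  "id": "1111111111111111111111111111111111111111",
+  "entries": [
+    {"mode": 33188, "type": "blob", "id": "2222222222222222222222222222222222222222", "name": "Android.bp", "size": 1024},
+    {"mode": 16384, "type": "tree", "id": "3333333333333333333333333333333333333333", "name": "core"}
+  ]
+}`)
+	tree, err := parseTree(body)
+	if err != nil {
+		panic(err)
+	}
+	for _, e := range tree.Entries {
+		if e.Size != nil {
+			fmt.Printf("%s %s %d bytes\n", e.Type, e.Name, *e.Size)
+		} else {
+			fmt.Printf("%s %s (size unknown)\n", e.Type, e.Name)
+		}
+	}
+}
+```
+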
+Open questions
+==============
+
+* How should we integrate this into existing tooling? (repo?)
+* Does ‘repo’ get confused when part of the checkout is a symlink forest?
+* Do we replace (reimplement) repo, or extend it?
+
+Implementation steps
+====================
+
+* Gitiles:
+   * Add support for a ‘size’ field in tree listings. File sizes are needed to answer FUSE attribute requests without fetching file content.
+   * Add support for recursive tree traversals. This saves roundtrips.
+
+* FUSE daemon:
+   * Add support for serving trees based on Gitiles JSON data
+   * Add support for lazily cloning .git repositories in gitfs
+   * Add support for chtimes
+   * Add support for sharing blobs
+   * Add support for a CAS
+   * Add support for surfacing manifest.xml with SHA1s.
+   * Investigate bazil.org/fuse, which has better community support.
+
+* Tooling:
+   * Generate a composite workspace of symlink forest and Git repos
+   * Generate chtimes() calls based on two manifest.xml files (with SHA1s) or two submodule SHA1s.
+   * Write a sync command
+   * Add a ‘checkout’ command