org.eclipse.jgit.storage.dht/README - jgit - Git at Google

 JGit Storage on DHT
 -------------------

 This implementation still has some pending issues:

 * DhtInserter must skip existing objects

   DirCache writes all trees to the ObjectInserter, letting the
   inserter figure out which trees we already have, and which are new.
   DhtInserter should buffer trees into a chunk, then before writing
   the chunk to the DHT do a batch lookup to find the existing
   ObjectInfo (if any).  If any exist, the chunk should be compacted to
   eliminate these objects, and if there is room in the chunk for more
   objects, it should go back to the DhtInserter to be filled further
   before flushing.

   This implies the DhtInserter needs to work on multiple chunks at
   once, and may need to combine chunks together when there is more
   than one partial chunk.

 * DhtPackParser must check for collisions

   Because ChunkCache blindly assumes any copy of an object is an OK
   copy of an object, DhtPackParser needs to validate all new objects
   at the end of its importing phase, before it links the objects into
   the ObjectIndexTable.  Most objects won't already exist, but some
   may, and those that do must either be removed from their chunk, or
   have their content byte-for-byte validated.

   Removal from a chunk just means deleting it from the chunk's local
   index, and not writing it to the global ObjectIndexTable.  This
   creates a hole in the chunk which is wasted space, and that isn't
   very useful.  Fortunately objects that fit fully within one chunk
   may be easy to inflate and double check, as they are small.  Objects
   that are big span multiple chunks, and the new chunks can simply be
   deleted from the ChunkTable, leaving the original chunks.

   Deltas can be checked quickly by inflating the delta and checking
   only the insertion point text, comparing that to the existing data
   in the repository.  Unfortunately the repository is likely to use a
   different delta representation, which means at least one of them
   will need to be fully inflated to check the delta against.

 * DhtPackParser should handle small-huge-small-huge

   Multiple chunks need to be open at once, in case we get a bad
   pattern of small-object, huge-object, small-object, huge-object.  In
   this case the small-objects should be put together into the same
   chunk, to prevent having too many tiny chunks.  This is tricky to do
   with OFS_DELTA.  A long OFS_DELTA requires all prior chunks to be
   closed out so we know their lengths.

 * RepresentationSelector performance bad on Cassandra

   The 1.8 million batch lookups done for linux-2.6 kills Cassandra, it
   cannot handle this read load.

 * READ_REPAIR isn't fully accurate

   There are a lot of places where the generic DHT code should be
   helping to validate the local replica is consistent, and where it is
   not, help the underlying storage system to heal the local replica by
   reading from a remote replica and putting it back to the local one.
   Most of this should be handled in the DHT SPI layer, but the generic
   DHT code should be giving better hints during get() method calls.

 * LOCAL / WORLD writes

   Many writes should be done locally first, before they replicate to
   the other replicas, as they might be backed out on an abort.

   Likewise some writes must take place across sufficient replicas to
   ensure the write is not lost... and this may include ensuring that
   earlier local-only writes have actually been committed to all
   replicas.  This committing to replicas might be happening in the
   background automatically after the local write (e.g. Cassandra will
   start to send writes made by one node to other nodes, but doesn't
   promise they finish).  But parts of the code may need to force this
   replication to complete before the higher level git operation ends.

 * Forks/alternates

   Forking is common, but we should avoid duplicating content into the
   fork if the base repository has it.  This requires some sort of
   change to the key structure so that chunks are owned by an object
   pool, and the object pool owns the repositories that use it.  GC
   proceeds at the object pool level, rather than the repository level,
   but might want to take some of the reference namespace into account
   to avoid placing forked less-common content near primary content.
	JGit Storage on DHT
	-------------------

	This implementation still has some pending issues:

	* DhtInserter must skip existing objects

	DirCache writes all trees to the ObjectInserter, letting the
	inserter figure out which trees we already have, and which are new.
	DhtInserter should buffer trees into a chunk, then before writing
	the chunk to the DHT do a batch lookup to find the existing
	ObjectInfo (if any). If any exist, the chunk should be compacted to
	eliminate these objects, and if there is room in the chunk for more
	objects, it should go back to the DhtInserter to be filled further
	before flushing.

	This implies the DhtInserter needs to work on multiple chunks at
	once, and may need to combine chunks together when there is more
	than one partial chunk.

	* DhtPackParser must check for collisions

	Because ChunkCache blindly assumes any copy of an object is an OK
	copy of an object, DhtPackParser needs to validate all new objects
	at the end of its importing phase, before it links the objects into
	the ObjectIndexTable. Most objects won't already exist, but some
	may, and those that do must either be removed from their chunk, or
	have their content byte-for-byte validated.

	Removal from a chunk just means deleting it from the chunk's local
	index, and not writing it to the global ObjectIndexTable. This
	creates a hole in the chunk which is wasted space, and that isn't
	very useful. Fortunately objects that fit fully within one chunk
	may be easy to inflate and double check, as they are small. Objects
	that are big span multiple chunks, and the new chunks can simply be
	deleted from the ChunkTable, leaving the original chunks.

	Deltas can be checked quickly by inflating the delta and checking
	only the insertion point text, comparing that to the existing data
	in the repository. Unfortunately the repository is likely to use a
	different delta representation, which means at least one of them
	will need to be fully inflated to check the delta against.

	* DhtPackParser should handle small-huge-small-huge

	Multiple chunks need to be open at once, in case we get a bad
	pattern of small-object, huge-object, small-object, huge-object. In
	this case the small-objects should be put together into the same
	chunk, to prevent having too many tiny chunks. This is tricky to do
	with OFS_DELTA. A long OFS_DELTA requires all prior chunks to be
	closed out so we know their lengths.

	* RepresentationSelector performance bad on Cassandra

	The 1.8 million batch lookups done for linux-2.6 kills Cassandra, it
	cannot handle this read load.

	* READ_REPAIR isn't fully accurate

	There are a lot of places where the generic DHT code should be
	helping to validate the local replica is consistent, and where it is
	not, help the underlying storage system to heal the local replica by
	reading from a remote replica and putting it back to the local one.
	Most of this should be handled in the DHT SPI layer, but the generic
	DHT code should be giving better hints during get() method calls.

	* LOCAL / WORLD writes

	Many writes should be done locally first, before they replicate to
	the other replicas, as they might be backed out on an abort.

	Likewise some writes must take place across sufficient replicas to
	ensure the write is not lost... and this may include ensuring that
	earlier local-only writes have actually been committed to all
	replicas. This committing to replicas might be happening in the
	background automatically after the local write (e.g. Cassandra will
	start to send writes made by one node to other nodes, but doesn't
	promise they finish). But parts of the code may need to force this
	replication to complete before the higher level git operation ends.

	* Forks/alternates

	Forking is common, but we should avoid duplicating content into the
	fork if the base repository has it. This requires some sort of
	change to the key structure so that chunks are owned by an object
	pool, and the object pool owns the repositories that use it. GC
	proceeds at the object pool level, rather than the repository level,
	but might want to take some of the reference namespace into account
	to avoid placing forked less-common content near primary content.