tree 86de8fa87c75f114780a6d69415742dd9dfc2869
parent 94a465e0989ff8124aca3dca8e200aeb870cc9dd
author Martin Fick <mfick@codeaurora.org> 1571093855 -0600
committer Martin Fick <mfick@codeaurora.org> 1572034535 -0600

Fix potential loss of persisted replication task

There was a race window between the check to see if any new updates were
available for the completed update and the deletion of the persistent
update. If an update occurred in that window it could leave the event
missing from the persisted task store. If the server went down before
this update completed, the update would be missed entirely.

Eliminate the race by separating the running tasks from the waiting
tasks in the persistent store, placing each group in its own
subdirectory. Place new waiting tasks in a "waiting" directory and
move them to the "running" directory once they are running. This allows
new updates to be persisted to the "waiting" directory while a similar
update is running without the waiting task getting deleted when the
persisted running task is deleted. On startup, reset all running tasks
by moving them back to the waiting directory to ensure that they are
retried.
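
The layout described above can be pictured with the following minimal
sketch. It is an illustration only, not the code in this change: the
class name TaskStoreSketch, its method names, and the use of a plain
string key as the file name are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one file per task, kept in "waiting" or "running".
class TaskStoreSketch {
  private final Path waiting;
  private final Path running;

  TaskStoreSketch(Path root) throws IOException {
    waiting = Files.createDirectories(root.resolve("waiting"));
    running = Files.createDirectories(root.resolve("running"));
  }

  // New updates are always persisted under "waiting".
  void persist(String key, String payload) throws IOException {
    Files.write(waiting.resolve(key), payload.getBytes(StandardCharsets.UTF_8));
  }

  // A task that begins running moves from "waiting" to "running".
  void start(String key) throws IOException {
    Files.move(
        waiting.resolve(key), running.resolve(key), StandardCopyOption.REPLACE_EXISTING);
  }

  // Completion deletes only the "running" copy, so a similar update
  // persisted to "waiting" in the meantime is left untouched.
  void finish(String key) throws IOException {
    Files.deleteIfExists(running.resolve(key));
  }

  // Return every running task to "waiting" (for example on startup)
  // so that it is retried.
  void resetAll() throws IOException {
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(running)) {
      for (Path file : stream) {
        files.add(file);
      }
    }
    for (Path file : files) {
      Files.move(
          file, waiting.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
    }
  }

  boolean isWaiting(String key) {
    return Files.exists(waiting.resolve(key));
  }

  boolean isRunning(String key) {
    return Files.exists(running.resolve(key));
  }
}
```

In this sketch, calling persist() for a key while the same key is
running, and then finish(), leaves the newly persisted file in
"waiting", which is the property the race fix relies on.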

Reset "running" tasks (return them to the "waiting" directory) when a
retry is rescheduled. This allows the retry to be consolidated with the
new run, and helps ensure that the persistence store better reflects
what is actually happening.

Bug: Issue 11672
Change-Id: Ia31329e8d939f8e5cb1e7455de69744431f34d66
