48ff6f398941572243a0fd7aa26e07d6c95696a1 - plugins/replication

commit	48ff6f398941572243a0fd7aa26e07d6c95696a1
author	Martin Fick <mfick@codeaurora.org>	Mon Oct 14 16:57:35 2019 -0600
committer	Martin Fick <mfick@codeaurora.org>	Fri Oct 25 14:15:35 2019 -0600
tree	86de8fa87c75f114780a6d69415742dd9dfc2869
parent	94a465e0989ff8124aca3dca8e200aeb870cc9dd

Fix potential loss of persisted replication task

There was a race window between the check for new updates matching the
completed update and the deletion of the persisted update. An update
that occurred in that window could be left missing from the persisted
task store. If the server then went down before that update completed,
the update would be missed entirely.

Eliminate the race by separating the running tasks from the waiting
tasks in the persistent store, placing each group in its own
subdirectory. Place new waiting tasks in a "waiting" directory and
move them to the "running" directory once they are running. This allows
new updates to be persisted to the "waiting" directory while a similar
update is running without the waiting task getting deleted when the
persisted running task is deleted. On startup, reset all running tasks
by moving them back to the waiting directory to ensure that they are
retried.
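
The directory layout described above can be sketched roughly as follows.
This is a minimal illustration, not the plugin's actual code: the class
name TaskStoreSketch, its method names, and the use of a plain string
key as the file name are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; names are illustrative, not the plugin's API.
class TaskStoreSketch {
  private final Path waiting;
  private final Path running;

  TaskStoreSketch(Path root) throws IOException {
    waiting = Files.createDirectories(root.resolve("waiting"));
    running = Files.createDirectories(root.resolve("running"));
  }

  // New updates are always persisted under "waiting", even if a
  // similar update is currently running.
  void persist(String key, String payload) throws IOException {
    Files.write(waiting.resolve(key), payload.getBytes(StandardCharsets.UTF_8));
  }

  // Once a task starts, move its file from "waiting" to "running".
  void start(String key) throws IOException {
    Files.move(
        waiting.resolve(key), running.resolve(key), StandardCopyOption.REPLACE_EXISTING);
  }

  // Completion deletes only the "running" copy, so a waiting task
  // with the same key survives.
  void finish(String key) throws IOException {
    Files.deleteIfExists(running.resolve(key));
  }

  // On startup, return every running task to "waiting" for retry.
  void resetAll() throws IOException {
    try (DirectoryStream<Path> tasks = Files.newDirectoryStream(running)) {
      for (Path task : tasks) {
        Files.move(
            task, waiting.resolve(task.getFileName()), StandardCopyOption.REPLACE_EXISTING);
      }
    }
  }

  boolean isWaiting(String key) {
    return Files.exists(waiting.resolve(key));
  }

  boolean isRunning(String key) {
    return Files.exists(running.resolve(key));
  }
}
```

Because finish() only ever touches the "running" directory, an update
persisted while a similar task is in flight cannot be deleted by that
task's completion.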

Reset "running" tasks (return them to the "waiting" directory) when a
retry is rescheduled. This allows the retry to be consolidated with the
new run, and helps ensure that the persistent store more accurately
reflects what is actually happening.
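
The consolidation effect of resetting on retry can be illustrated with
an in-memory model. This is only a sketch under assumptions: the class
RetryResetSketch, its methods, and the idea that tasks are identified
by a single string key are hypothetical and stand in for whatever the
plugin really uses.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of the "waiting" and "running" sets.
class RetryResetSketch {
  final Set<String> waiting = new LinkedHashSet<>();
  final Set<String> running = new LinkedHashSet<>();

  // A new update always enters the waiting set.
  void schedule(String key) {
    waiting.add(key);
  }

  // Starting a task moves it from waiting to running.
  boolean start(String key) {
    if (waiting.remove(key)) {
      running.add(key);
      return true;
    }
    return false;
  }

  // Rescheduling a retry returns the task to waiting. If an update
  // with the same key is already waiting, the two collapse into one
  // entry, so the retry and the new run are consolidated.
  void rescheduleRetry(String key) {
    if (running.remove(key)) {
      waiting.add(key);
    }
  }
}
```

After a reset, nothing remains in the running set for a task that is
not actually executing, which is the sense in which the store tracks
reality more closely.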

Bug: Issue 11672
Change-Id: Ia31329e8d939f8e5cb1e7455de69744431f34d66

5 files changed
