Distributed Node Migration by Effective Fault Tolerance

This project proposes an adaptive programming model for fault-tolerant distributed computing that provides upper-layer applications with process-state information matched to the current level of system synchrony (or QoS). The synchronous distributed computing model provides processes with bounds on processing time and message transfer delay.

These bounds, explicitly known by the processes, can be used to detect process crashes safely and, consequently, allow the non-crashed processes to progress with safe views of the system state. Synchronous systems are attractive because they allow system designers to solve many problems, such as consensus, that have no deterministic solution in purely asynchronous systems where processes may crash. The price to be paid is that the time bounds must be known a priori: if they are violated, the upper-layer protocols may no longer be able to guarantee their safety properties.
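To illustrate how known bounds allow crashes to be detected safely, here is a minimal Python sketch of a timeout-based crash detector. It assumes that each live process sends a heartbeat every `period` time units, that local processing delays the next heartbeat by at most `max_processing`, and that message transfer takes at most `max_delay`. The class `TimeoutCrashDetector` and all of its parameter names are hypothetical illustrations, not part of the project's model.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutCrashDetector:
    """Hypothetical timeout-based crash detector for a synchronous system.

    Assumed bounds (illustrative names, not from the project):
      period         -- interval between heartbeats sent by a live process
      max_processing -- maximum extra local delay before a heartbeat is sent
      max_delay      -- maximum message transfer delay
    """
    period: int
    max_processing: int
    max_delay: int
    last_heard: dict = field(default_factory=dict)  # pid -> time of last heartbeat

    def timeout(self) -> int:
        # Worst-case gap between receiving two consecutive heartbeats from a
        # live process: the next one is sent at most period + max_processing
        # later and may take up to max_delay to arrive.
        return self.period + self.max_processing + self.max_delay

    def on_heartbeat(self, pid: str, now: int) -> None:
        # Record when a heartbeat from `pid` was received.
        self.last_heard[pid] = now

    def crashed(self, pid: str, now: int) -> bool:
        # If the bounds hold, silence longer than the timeout can only mean
        # that `pid` has crashed. `pid` must already have been heard from.
        return now - self.last_heard[pid] > self.timeout()


if __name__ == "__main__":
    d = TimeoutCrashDetector(period=10, max_processing=1, max_delay=5)
    d.on_heartbeat("p1", 100)
    print(d.crashed("p1", 115))  # False: still within the 16-unit timeout
    print(d.crashed("p1", 117))  # True: the bound has been exceeded
```

The sketch also shows why the bounds must hold. If a live process's heartbeat is sent at time 110 but takes 8 units to arrive, exceeding the assumed `max_delay` of 5, the detector declares that process crashed at time 117 even though it is still running. Any upper-layer protocol that relies on this verdict then loses its safety guarantee.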