Calsoft Whitepaper - A Comparison of Two Approaches to Build Reliable Distributed File Servers

Several existing distributed file systems provide reliability by replicating servers. The servers implement protocols that ensure the consistency and coherence of the replicas. This approach has several advantages: fast recovery from failure, flexibility, the ability to distribute file reads among replicas, and resistance to site disasters. These advantages come at the expense of increased network and CPU load. In addition, reintegrating an obsolete replica into the system may be expensive.

An alternative approach is to use dual-ported disks accessible to both a server and a backup. The server stores information about its volatile state in a log on disk. During normal operation, the backup neither accesses the disks nor maintains information about the server’s internal state. If the server fails, the backup reconstructs the server’s lost state from the log and impersonates the server while maintaining its own identity. Because of impersonation, recovery from failure is transparent: clients experience only a short period of unavailability while the backup reconstructs the state of the failed server. Since the backup is itself a server of another file system, operation continues with a potential reduction in performance.
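
The log-and-takeover idea described above can be illustrated with a minimal Python sketch. It is not the HA-NFS implementation: the class names (`LoggingServer`, `Backup`), the JSON record format, and the use of a temporary directory to stand in for the dual-ported disk are all assumptions made for illustration only.

```python
import json
import os
import tempfile


class LoggingServer:
    """Hypothetical server that records its volatile state in a log on shared disk."""

    def __init__(self, name, log_path):
        self.name = name
        self.log_path = log_path
        self.state = {}  # volatile state, lost if the server crashes

    def update(self, key, value):
        # Force the record to disk before changing in-memory state, so the
        # log always covers everything a client may have observed.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state[key] = value


class Backup:
    """Hypothetical backup that stays idle until the server fails."""

    def __init__(self, name):
        self.identities = {name}  # names this machine currently answers to
        self.recovered = {}       # reconstructed state, per failed server

    def take_over(self, failed_name, log_path):
        # Replay the log in order; later records overwrite earlier ones.
        state = {}
        with open(log_path) as log:
            for line in log:
                record = json.loads(line)
                state[record["key"]] = record["value"]
        self.recovered[failed_name] = state
        # Impersonate the failed server while keeping our own identity.
        self.identities.add(failed_name)
        return state


def demo():
    # A temporary directory stands in for the dual-ported disk (assumption).
    with tempfile.TemporaryDirectory() as shared_disk:
        log_path = os.path.join(shared_disk, "server_a.log")
        server = LoggingServer("server_a", log_path)
        server.update("lock:/export/file1", "client1")
        server.update("lock:/export/file1", "client2")
        server.update("lock:/export/file2", "client1")
        del server  # simulate a crash: the in-memory state is gone
        backup = Backup("server_b")
        state = backup.take_over("server_a", log_path)
        return state, backup.identities
```

Calling `demo()` returns the state rebuilt from the log together with the set of identities the backup now answers to. In this sketch the backup reads nothing until `take_over` is called, which mirrors the claim that it maintains no information about the server during normal operation.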

This paper compares the two approaches. Specifically, the comparison addresses availability, overhead during normal operation, flexibility, and the cost of reintegrating failed servers after recovery. We compare the two approaches at an abstract level, by comparing the concepts and trade-offs that drive each, and at a concrete level, by comparing an implementation of each. The concrete comparison highlights the issues that affect the design and implementation of a reliable file server in practice.

  • Introduction
  • Deceit
  • HA-NFS
  • Performance Comparison
  • Availability
  • Other Comparison Issues

