Fault Tolerance


Upon system power failure, the RMS-250 switches to auxiliary power supplied by on-board ultracapacitors, and the Flush-to-Flash firmware transfers data held in volatile DRAM to persistent NAND memory.  Once in NAND memory, the data is persistent and is not subject to the 72-hour retention limit common to battery-based architectures, which hold data in volatile DRAM in a self-refresh mode.

Radian’s Flush-to-Flash firmware is built on transactional semantics to preserve data integrity even if a failure occurs during the flush process.  Extensive monitoring and component checks run continually during normal operation to detect anomalies that predict failures before they occur.  NAND Flash memory is regularly scanned for potential errors (bad blocks), and ultracapacitor health is monitored continuously.

If a failure does occur during the flush process, such as insufficient power to complete the data transfer, the Flush-to-Flash system ensures that the data transferred up to that point is written correctly and can be identified as partial upon restore.  A hardware ECC engine in the controller provides error correction and, combined with the firmware implementation, protects data against NAND page or block errors.  On restore, all data is checked against extensive metadata and error-checking information to ensure correctness.
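The flush and restore behavior described above can be illustrated with a minimal sketch. It is not Radian's firmware: the record layout, the commit record, the tiny page size, and every name in it (PageRecord, flush, restore, pages_of_power) are assumptions made for illustration, and a CRC32 stands in for error detection only, not for the hardware ECC engine. The idea shown is that each page carries its own metadata, and a commit record written last lets restore distinguish a complete flush from a partial one.

```python
import zlib
from dataclasses import dataclass

PAGE_SIZE = 8  # hypothetical; real NAND pages are far larger


@dataclass
class PageRecord:
    seq: int        # position of this page within the flush
    payload: bytes  # DRAM contents copied into this page
    crc: int        # CRC32 over the sequence number and payload


def _crc(seq: int, payload: bytes) -> int:
    return zlib.crc32(seq.to_bytes(4, "little") + payload)


def flush(dram: bytes, pages_of_power: int):
    """Copy DRAM into a simulated NAND log, one page at a time.

    pages_of_power models how many pages the remaining hold-up energy
    can write. Returns (log, commit); commit is None when power ran
    out before the final commit record could be written.
    """
    chunks = [dram[i:i + PAGE_SIZE] for i in range(0, len(dram), PAGE_SIZE)]
    log = []
    for seq, chunk in enumerate(chunks):
        if seq >= pages_of_power:
            return log, None  # power lost mid-flush: no commit record
        log.append(PageRecord(seq, chunk, _crc(seq, chunk)))
    # The commit record is written last, so its presence marks completion.
    commit = {"page_count": len(chunks), "total_crc": zlib.crc32(dram)}
    return log, commit


def restore(log, commit):
    """Validate per-page metadata and classify the flush.

    Returns ("complete" | "partial", data), where data is the longest
    prefix of pages that passed their sequence and CRC checks.
    """
    valid = []
    for expected_seq, rec in enumerate(log):
        if rec.seq != expected_seq or rec.crc != _crc(rec.seq, rec.payload):
            break  # stop at the first page that fails validation
        valid.append(rec.payload)
    data = b"".join(valid)
    complete = (
        commit is not None
        and len(valid) == commit["page_count"]
        and zlib.crc32(data) == commit["total_crc"]
    )
    return ("complete" if complete else "partial"), data
```

In this sketch, a flush interrupted after three pages restores as "partial" with exactly those three pages of data, while an uninterrupted flush restores as "complete" only if every page and the whole-image checksum validate.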

The overall Flush-to-Flash system and underlying NAND array are based on a fault-tolerant architecture that overprovisions resources such as ultracapacitor power and NAND capacity to handle events such as repeated system power blackouts and brownouts.  The architecture and design verification test processes also address these conditions during operations such as concurrent host atomic writes, supporting enterprise-level reliability.
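To make the idea of overprovisioned ultracapacitor power concrete, the following sketch estimates a hold-up energy margin. It relies only on the standard capacitor energy relation, E = 1/2 * C * (V_start^2 - V_cutoff^2). The function names and all numeric values in the usage note are hypothetical and are not RMS-250 specifications.

```python
def usable_energy_joules(capacitance_f: float, v_start: float, v_cutoff: float) -> float:
    """Energy released as a capacitor discharges from v_start to v_cutoff."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_cutoff ** 2)


def flush_energy_joules(data_bytes: float, write_bytes_per_s: float, power_watts: float) -> float:
    """Energy needed to flush data_bytes at a steady write rate and power draw."""
    flush_seconds = data_bytes / write_bytes_per_s
    return power_watts * flush_seconds


def flush_margin(capacitance_f: float, v_start: float, v_cutoff: float,
                 data_bytes: float, write_bytes_per_s: float, power_watts: float) -> float:
    """Ratio of available to required energy; above 1.0 means spare capacity."""
    available = usable_energy_joules(capacitance_f, v_start, v_cutoff)
    required = flush_energy_joules(data_bytes, write_bytes_per_s, power_watts)
    return available / required
```

With purely illustrative inputs (10 F, 5.0 V down to 3.0 V, 8 GB flushed at 400 MB/s while drawing 2 W), the capacitor releases 80 J against a 40 J requirement, a margin of 2.0. A margin above 1.0 is what leaves room for capacitor aging or a second power event before a full recharge.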