Journal: International Journal of Reconfigurable Computing
Print ISSN: 1687-7195
Electronic ISSN: 1687-7209
Publication year: 2011
Volume: 2011
DOI: 10.1155/2011/962062
Publisher: Hindawi Publishing Corporation
Abstract: We introduce a specialized self-checking hardware journal used as a
centerpiece in our design strategy to build a processor tolerant to transient
faults. Fault tolerance here relies on the use of error detection techniques in
the processor core together with journalization and rollback execution to
recover from erroneous situations. Effective rollback recovery is possible
thanks to the hardware journal and to choosing a stack computing architecture
for the processor core instead of the usual RISC or CISC. The main objective of
journalization and of the self-checking hardware journal is to prevent data not
yet validated from being sent to the main memory and to allow fast rollback
when a fault is detected. The main memory, assumed to be fault-secure in
our model, only contains valid (uncorrupted) data obtained from fault-free
computations. Error control coding techniques are used both in the processor
core to detect errors and in the HW journal to protect the temporarily stored
data from possible changes induced by transient faults. Implementation results
on an FPGA of the Altera Stratix-II family clearly show the relevance of the
approach, both in terms of performance/area tradeoff and fault tolerance
effectiveness, even for high error rates.
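
The journalization and rollback idea described in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the class name `Journal`, its method names, the dictionary standing in for main memory, and the single parity bit standing in for the error control code are all assumptions made for illustration only.

```python
def parity(value: int) -> int:
    """Even-parity bit of a non-negative integer (hypothetical stand-in for an error control code)."""
    return bin(value).count("1") & 1


class Journal:
    """Illustrative write journal: holds unvalidated writes away from main memory."""

    def __init__(self, memory: dict):
        self.memory = memory   # assumed fault-secure main memory
        self.pending = []      # unvalidated entries: (address, value, parity bit)

    def write(self, addr: int, value: int) -> None:
        # Writes go to the journal first, never directly to main memory.
        self.pending.append((addr, value, parity(value)))

    def read(self, addr: int) -> int:
        # The most recent unvalidated write wins; otherwise fall back to memory.
        for a, v, _ in reversed(self.pending):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def self_check(self) -> bool:
        # Detect corruption of data while it was stored in the journal.
        return all(parity(v) == p for _, v, p in self.pending)

    def rollback(self) -> None:
        # Discard unvalidated data; main memory still holds the last valid state.
        self.pending.clear()

    def validate(self) -> bool:
        # Commit pending writes only if the journal contents pass the self-check.
        if not self.self_check():
            self.rollback()
            return False
        for a, v, _ in self.pending:
            self.memory[a] = v
        self.pending.clear()
        return True
```

In this sketch, a caller would invoke `validate()` at the end of an instruction sequence that completed without a detected error and `rollback()` otherwise, so main memory only ever receives validated data.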