EC-TPU: Continuously Corrected Arithmetic
MaiTRIX is proud to announce new breakthroughs in the detection and correction of arithmetic errors! MaiTRIX’s advanced error-correcting Tensor Processor Unit (EC-TPU) is the first in a series of high-performance AI processors that will be offered with our advanced error-correcting capability. MaiTRIX expects prototypes to be released for IP testing and evaluation soon.
The EC-TPU combines the power of our standard TPU product with the unprecedented capability to detect and correct errors that occur in complex arithmetic routines, such as matrix multiplication! The advanced error correction occurs seamlessly and continuously, without incurring additional delay when an arithmetic result is corrected. There is no need to reset the processor when an arithmetic error is detected, and there is normally no need to re-synchronize the processor during error events.
MaiTRIX’s TPU and its advanced error detection and correction IP work together in unexpected ways to decrease vulnerability to many single and multiple error events that would cripple other processors! The EC-TPU is capable of detecting and correcting faulty arithmetic due to clocking errors, high-energy neutron strikes, ionizing gamma radiation, metastability of CMOS logic, transient and static configuration RAM errors in FPGAs (known as SEUs), and many other sources of single and multiple error events.
MaiTRIX’s EC-TPU paves the way to fault-tolerant AI computation in space
Classical arithmetic error detection and correction schemes such as triple modular redundancy (TMR) require more than 300% of the resources of a single uncorrected ALU. MaiTRIX’s new error detection and correction technology requires only 20% more resources than a single ALU without correction. Even so, the new error correction techniques for residue number system (RNS) arithmetic are far more effective: they reject many more simultaneous errors, an impossible feat for triplication schemes, which struggle with more than a single error event at a time.
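For readers unfamiliar with error correction in RNS arithmetic, the general idea can be illustrated with a small sketch. This is a textbook redundant residue number system, not MaiTRIX's design: the moduli, function names and the simple "drop one channel and retry" decoder are all assumptions chosen for illustration, and the sketch does not reproduce the 20% resource figure quoted above. Each value is carried as independent residues, two extra (redundant) channels are added, and a result that falls outside the legitimate range reveals a faulty channel, which is then identified and ignored.

```python
from math import prod

# Hypothetical moduli chosen for illustration only (pairwise coprime).
INFO_MODULI = (7, 11, 13)        # legitimate range: 7 * 11 * 13 = 1001
REDUNDANT_MODULI = (17, 19)      # extra channels, larger than any info modulus
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def crt(residues, moduli):
    """Reconstruct the integer in [0, prod(moduli)) via the Chinese Remainder Theorem."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total


def dot_rns(a, b):
    """Dot product computed independently in every residue channel."""
    acc = [0] * len(MODULI)
    for x, y in zip(a, b):
        for i, m in enumerate(MODULI):
            acc[i] = (acc[i] + (x % m) * (y % m)) % m
    return acc


def decode_with_correction(residues):
    """Return (value, faulty_channel); faulty_channel is None if no error was seen.

    Assumes at most one faulty channel and a true result below LEGIT_RANGE.
    """
    x = crt(residues, MODULI)
    if x < LEGIT_RANGE:
        return x, None
    # Drop one channel at a time; only dropping the faulty one yields a legal value.
    for skip in range(len(MODULI)):
        rs = [r for i, r in enumerate(residues) if i != skip]
        ms = [m for i, m in enumerate(MODULI) if i != skip]
        candidate = crt(rs, ms)
        if candidate < LEGIT_RANGE:
            return candidate, skip
    raise ValueError("more errors than this sketch can correct")


residues = dot_rns([3, 5, 2], [4, 6, 7])         # true dot product is 56
residues[1] = (residues[1] + 3) % MODULI[1]      # inject a fault in channel 1
value, faulty = decode_with_correction(residues)
```

In this toy configuration, the corrected value is recovered from the remaining channels without recomputing the dot product, which is the property the paragraph above contrasts with triplication. A production design would use wider moduli and a hardware-friendly decoder.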
The EC-TPU targets AI processors and other high-performance matrix workloads in applications that demand extreme performance and high reliability, such as space-based satellites, Space Force avionics and Deep Space AI.