RNS Tensor Processor Unit (RNS TPU)

Our RNS TPU performs 32.32 fixed-point matrix multiplication 7 to 9 times more efficiently than a binary matrix multiplier and achieves even higher speed and efficiency at lower precision, such as the precision adequate for convolutional neural network applications.
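
The 32.32 format referred to above is a 64-bit fixed-point word with 32 integer bits and 32 fractional bits. The short Python sketch below illustrates the format only; the helper names and rounding choice are ours for illustration and do not describe the RNS TPU's internal hardware.

    FRAC_BITS = 32
    SCALE = 1 << FRAC_BITS  # 2**32: one unit in the last fractional place

    def to_fixed(x: float) -> int:
        """Encode a real number as a 32.32 fixed-point integer (illustrative)."""
        return int(round(x * SCALE))

    def to_float(q: int) -> float:
        """Decode a 32.32 fixed-point integer back to a float."""
        return q / SCALE

    a = to_fixed(3.141592653589793)
    b = to_fixed(2.0)
    # The raw product of two 32.32 values is a 64.64 value; shift right to renormalize.
    product = (a * b) >> FRAC_BITS
    print(to_float(product))  # approximately 6.283185307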

The RNS-based matrix multiplier

MaiTRIX’s RNS-TPU matrix multiplier technology provides the most accurate and most efficient matrix multiplication processing available. Even advanced double-precision floating-point hardware cannot match the high accuracy, high speed, and low power of our technology.

Why? Because all calculations are performed in extended precision, no numeric error is introduced until the final normalization; rounding error is therefore held to a single rounding unit for every dot product.
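
As a concrete picture of that claim, the Python sketch below (a toy model in software, not the RNS-TPU pipeline) accumulates every partial product of a fixed-point dot product exactly and rounds only once, at the final normalization, so the result carries at most a single rounding unit of error.

    FRAC_BITS = 32
    SCALE = 1 << FRAC_BITS

    def fixed_dot(a, b):
        """Dot product of two vectors of 32.32 fixed-point integers (illustrative).

        Each product is a 64.64 value; Python integers are unbounded, so the
        accumulation is exact. One rounding shift at the end returns a 32.32
        result with at most a single unit of rounding error.
        """
        acc = 0
        for x, y in zip(a, b):
            acc += x * y  # exact extended-precision accumulation, no intermediate rounding
        return (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS  # single final rounding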

Combine this with our unique carry-free operation and it’s no wonder our technology is a significant breakthrough in computation!  Yet the RNS-TPU is simple to use; it can be configured to accept single- or double-precision floating-point operands as input and to produce single- or double-precision floating-point results as output.
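
“Carry-free” refers to the residue number system itself: a value is stored as independent residues modulo a set of pairwise-coprime moduli, and addition or multiplication acts on each residue separately, with no carries propagating between digits. The Python sketch below is a toy illustration of that principle using an example modulus set of our own choosing; it is not the RNS-TPU’s modulus set or conversion hardware.

    from math import prod

    MODULI = (251, 241, 239, 233)   # example pairwise-coprime moduli (illustrative)
    M = prod(MODULI)                # dynamic range of this toy RNS

    def to_rns(x):
        return tuple(x % m for m in MODULI)

    def rns_add(a, b):
        # digit-wise, carry-free addition
        return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

    def rns_mul(a, b):
        # digit-wise, carry-free multiplication
        return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

    def from_rns(r):
        # Chinese Remainder Theorem reconstruction back to a binary integer
        total = 0
        for x, m in zip(r, MODULI):
            Mi = M // m
            total += x * Mi * pow(Mi, -1, m)
        return total % M

    a, b = 1234, 5678
    assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M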

 

Technologies for implementation

The RNS-TPU can be deployed immediately using high-end FPGA devices, providing a significant performance increase for hardware matrix multiplication.  MaiTRIX’s IP is highly flexible and scalable, providing even higher relative throughput when synthesized with conventional ASIC or custom IC technologies.  Key parameters of the RNS TPU can be adjusted; for example, the RNS digit width can be tuned to target hard or soft multipliers with operands from 7 to 18 bits wide while retaining extreme numeric precision!
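
As a rough picture of that digit-width trade-off, the Python sketch below greedily assembles pairwise-coprime moduli just under a chosen digit width and reports the resulting dynamic range. The selection method and example widths are ours for illustration and do not represent the RNS-TPU’s actual configuration.

    from math import gcd, log2, prod

    def coprime_moduli(digit_bits, count):
        """Greedily collect pairwise-coprime moduli just below 2**digit_bits (illustrative)."""
        chosen = []
        candidate = (1 << digit_bits) - 1
        while len(chosen) < count and candidate > 1:
            if all(gcd(candidate, m) == 1 for m in chosen):
                chosen.append(candidate)
            candidate -= 1
        return chosen

    # More, narrower digits (soft multipliers) versus fewer, wider digits (hard multipliers)
    for width, n in [(8, 9), (18, 4)]:
        mods = coprime_moduli(width, n)
        print(f"{width}-bit digits x {n}: dynamic range ~ {log2(prod(mods)):.1f} bits")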

For off-the-shelf solutions, our RNS-TPU can be integrated into Arria 10 or Stratix 10 FPGA development boards from Intel and Terasic.  MaiTRIX also supports high-end FPGA devices from Xilinx, including the Virtex and Kintex device families, as well as many off-the-shelf accelerator cards from Intel and Xilinx, providing a wide range of solutions for applications that demand high-capacity processing, high-bandwidth memory, and high-speed communication interfaces.

Applications for the RNS TPU

The RNS TPU can help accelerate, increase the accuracy of, and reduce the power consumption of convolutional neural networks, convolutional image processing, high-speed radar, autonomous vehicles, space-based applications, scientific processing, weather and turbulence modeling, and many more applications!  For deep space and other high-reliability applications requiring an unprecedented level of arithmetic error correction, see our TPU-ec!

 

Because of the carry-free nature of modular computation, our technology is ideally suited for implementation in 3-D IC technologies and advanced quantum computing applications!  For extreme-reliability applications, such as deep space exploration, please see our error-correcting TPU-ec!  These technologies may be leveraged or simulated in the development of advanced hybrid computers of the future!

 

RNS TPU cloud access:  Coming soon!

Access our public RNS-TPU research papers

Access our RNS-TPU via the cloud!

Preliminary Specifications for Arria 10 based RNS TPU 1.0

See Erica, our Electronic Residue Integration and Computation Accelerator!

*The RNS matrix multiplier and RNS-TPU are inventions of MaiTRIX LLC and are protected by US patent #10,387,122 and other patents pending in the US and abroad.

* RNS-TPU is a trademark of MaiTRIX, LLC