Artificial Intelligence – the killer app
Artificial Intelligence (AI) is a trendy buzzword these days, and yet most of us don’t really know exactly what it is. In practice, AI is a collection of algorithms and data sets which mimic human intelligence, in most cases within a limited scope. While there are many algorithms in AI, neural network algorithms have come to the forefront because they have shown significant ability in tackling problems that were difficult for earlier algorithms. Neural network algorithms mimic the human brain by simulating multiple layers of neurons which have been trained to recognize features and other aspects of large data sets, such as images and voice waveforms.
The basic idea of the neural network is that once it has been trained and a data stimulus is provided, the network will respond and output a corresponding result. If everything goes as planned, the output will reflect the answer expected from the training experience provided to the network. If we look closer at the algorithms which implement neural network simulation, especially the inference (or retrieval) stage of AI, we find that much of the workload consists of matrix and vector operations. Moreover, we find that many of the low-level calculations are summations and weighted product summations.
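To make the weighted product summation concrete, here is a minimal Python sketch of a single layer at inference time. The layer shape, the weights and the inputs are hypothetical values chosen for illustration, and the function name layer_output is not taken from any MaiTRIX material.

```python
# Hypothetical 3-input, 2-neuron layer; every value here is made up.
weights = [
    [2, -1, 3],   # weights feeding neuron 0
    [0, 4, -2],   # weights feeding neuron 1
]
inputs = [1, 5, 2]

def layer_output(weights, inputs):
    """Return one weighted product summation per neuron.

    Taken together, the per-neuron sums form a matrix-vector product.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

outputs = layer_output(weights, inputs)
print(outputs)
```

Each output is nothing more than a sum of products, which is why the bulk of inference reduces to matrix and vector arithmetic.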
When the size of the data sets is very large, the corresponding number of neurons is also large in a typical neural network. As of this date, simulated neural networks employing over 100 billion neurons have been developed. As the number of neurons grows, the number of connections between neurons also grows, so the number of calculations grows very quickly. If the neural network is to be operated in a real-time mode, then response time is paramount. In these cases, specialized hardware designed to perform the required calculations is generally required.
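As a rough illustration of how quickly the calculation count grows, the sketch below counts the multiply-accumulate operations for one pass through a stack of fully connected layers. It assumes every neuron in one layer connects to every neuron in the next, which is only one possible network layout, and the layer sizes are hypothetical.

```python
def mac_count(layer_sizes):
    """Multiply-accumulate operations for one pass through fully connected layers.

    Each pair of adjacent layers contributes (neurons in) * (neurons out).
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

small = mac_count([100, 100, 100])       # 300 neurons in total
large = mac_count([1000, 1000, 1000])    # 3000 neurons in total
print(small, large, large // small)
```

Under this assumption, ten times as many neurons means one hundred times as many product summations.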
But other problems exist, including the amount of power required to operate the large data centers which employ AI to process incoming voice streams and video images. The processing load is staggering, and by any estimate it will continue to grow and outpace available solutions for years to come; this has prompted large companies like Google to develop their own AI hardware and data infrastructure. So naturally a need arises to perform an enormous number of calculations, mostly product summations, as efficiently and as quickly as possible. This is where the technology of MaiTRIX comes into play.
Because MaiTRIX has pioneered so many new techniques enabling general purpose computation in residues, including RNS fractional representations and their normalization, the long-sought ability to break up large word-size summations into a series of smaller summations, without carry, is now possible. AI algorithms represent a near perfect case for modular computation, since the need to normalize may often be delayed so that product summation can continue without carry, and each digit summation can be performed in parallel. The residue calculations are therefore very precise, since the summation is performed in an extended word format, but without carry from digit to digit. This means the RNS TPU is incredibly fast, efficient and accurate.
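The carry-free product summation described above can be sketched with textbook residue arithmetic. This is a minimal illustration and not the MaiTRIX implementation: the moduli, the function names and the sample data are assumptions, it handles non-negative integers only, and it leaves out the fractional representation and normalization mentioned in the text.

```python
from math import prod

# Assumed pairwise coprime moduli; their product (about 2**32) is the usable range.
MODULI = (251, 253, 255, 256)

def to_rns(x):
    """Convert a non-negative integer into one small residue digit per modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """Return acc + a*b digit by digit; no carry passes between digits."""
    return tuple((s + x * y) % m for s, x, y, m in zip(acc, a, b, MODULI))

def rns_dot(xs, ys):
    """Product summation of two integer vectors, kept entirely in residue form."""
    acc = to_rns(0)
    for x, y in zip(xs, ys):
        acc = rns_mac(acc, to_rns(x), to_rns(y))
    return acc

def from_rns(digits):
    """Convert back to an ordinary integer with the Chinese Remainder Theorem."""
    M = prod(MODULI)
    total = 0
    for r, m in zip(digits, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # modular inverse, Python 3.8+
    return total % M

weights = [12, 7, 30, 5]      # hypothetical data
inputs = [100, 250, 3, 99]
print(from_rns(rns_dot(weights, inputs)))
```

Each digit of the accumulator depends only on its own modulus, so in hardware the four digit summations could run side by side. The result is exact as long as the true sum stays below the product of the moduli.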
The RNS TPU is also capable of iterative operation; that is, the resulting output matrix may be fed back as an argument for the next iteration. In AI, the output result is further processed by an activation function, and this result is fed back into the matrix multiplier. This ability implies that general purpose processing is achieved in the new RNS arithmetic: it is a form of multiplicative iteration, in which the output is used to feed the input. That’s not trivial in a number system like RNS. Re-using RNS results reduces the need for data conversion, and allows highly accurate results to remain in the RNS domain until they are needed. When matrix results are complete, they are converted to floating-point format and processed by conventional or high-performance binary processors.
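The idea of feeding results back without leaving the residue domain can be sketched as a repeated matrix-vector product. This is an illustrative sketch under stated assumptions: the moduli, the function names and the small test matrix are hypothetical, only non-negative integers are used, and the activation function and fractional normalization are omitted.

```python
from math import prod

MODULI = (7, 11, 13, 15)   # assumed pairwise coprime moduli; range is 15015

def to_rns(x):
    """Convert a non-negative integer into one residue digit per modulus."""
    return tuple(x % m for m in MODULI)

def from_rns(digits):
    """Convert residue digits back to an integer (Chinese Remainder Theorem)."""
    M = prod(MODULI)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(digits, MODULI)) % M

def rns_matvec(A, v):
    """Multiply residue matrix A by residue vector v, digit by digit."""
    out = []
    for row in A:
        acc = [0] * len(MODULI)
        for a, x in zip(row, v):
            acc = [(s + p * q) % m for s, p, q, m in zip(acc, a, x, MODULI)]
        out.append(tuple(acc))
    return out

# Hypothetical 2x2 matrix [[1, 1], [1, 0]] and start vector [1, 0].
A = [[to_rns(1), to_rns(1)], [to_rns(1), to_rns(0)]]
v = [to_rns(1), to_rns(0)]

for _ in range(10):
    v = rns_matvec(A, v)   # output is fed straight back in, still in residue form

result = [from_rns(d) for d in v]   # a single conversion, at the very end
print(result)
```

Only one conversion out of the residue domain is needed, after the final iteration, provided every intermediate value stays below the product of the moduli.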
While the basics of matrix multiplication in both fixed-point and complex fixed-point formats are now straightforward in RNS, future research will focus on developing pipelined activation and pooling functions to enhance the ability of the RNS TPU to perform neural network retrieval processing. Moreover, the RNS matrix multiplier may find application in many other areas, including the processing of matrix algorithms such as Google’s PageRank algorithm.