Multiplier Efficiency of Modular Computation
There are many advantages to modular computation, but one advantage has been under-appreciated for far too long, and that is the issue of multiplier efficiency!
Multipliers are complex, and there are many methods to multiply two values, and let’s face it, most people don’t care much about these things. But if you’re an FPGA developer, or even an ASIC or custom IC developer, then the area consumed by your multipliers is of significant importance, especially if you plan to pack as many multipliers one a single die as possible!
But how can one compare the cost in terms of resources required by a large multiplier circuit? One way is to compile multipliers of various sizes using an FPGA device. The best FPGA devices for this analysis are the early generation FPGA devices, such as Cyclone-IV or Stratix-IV from Intel. The reason is these FPGA’s have small ‘unit multipliers’, namely 9×9 and 18×18 multipliers that are connected together to form larger multipliers. (The later FPGA devices have hidden complexity in their DSP blocks, and therefore counting resources is not as easy.)
In fact, it becomes apparent the number of unit multipliers required grows as the square of the precision for a binary multiplier. For example, a single 9-bit unit multiplier is required for a 9-bit binary multiplier, but for an 18-bit binary multiplier, four 9-bit unit multipliers are required. For a 36-bit binary multiplier, as many as sixteen 9-bit unit multipliers are required. The reason is the number of partial products that must be calculated grows as the square of the precision. But in a residue multiplier, partial products are not generated since there is no carry from digit to digit. Therefore, multiplier utilization is nearly linear with respect to the precision of the final multiplier.
For example, an 8-bit modular multiplier will require two 9-bit multipliers, a 16-bit modular multiplier will require four 9-bit multipliers, and a 32-bit modular multiplier will require eight 9-bit multipliers. One can see the resource growth of unit multipliers for the modular multiplier increase in a linear fashion with respect to bit width.
There are some subtle differences we won’t go into here, but the allocation of multiplier resources when using modular arithmetic is basically linear (n), but with binary multipliers, it’s O(n2). This fact implies the power consumption inherent in residue processing is significantly less, and that many more modular multipliers versus binary multipliers may be integrated on a single die!
There’s always more to the story when it comes to modular arithmetic!