Modular Multiplication Without Overflow
May 28, 2013
We discussed modular arithmetic in a previous exercise. In a comment on that exercise, Matthew asked what you can do when you have a fixed size integer datatype (say 64-bit unsigned integers) and a modulus that is close to the maximum. That’s a good question, which we will answer today. The answer is straight forward if the modulus is at least one bit less than the maximum, and rather trickier if it’s not. We’ll look at multiplication, but the other operations are all similar.
As long as the modulus is at least one bit less than the maximum, the solution is to split the numbers in low-bit and high-bit halves, then perform the arithmetic piecemeal, kind of like grade-school multiplication where you multiply by the one’s digit, then shift the sum and multiply by the ten’s digit, and so on, except that the “digits” are the size of the square root of the maximum number than can be represented in the integer datatype.
If m is within one bit of the maximum for the integer data type, things get messier; the basic answer is to split into three parts, compute the intermediate products, and recombine. But we won’t bother with that.
Your task is to write a function that performs modular multiplication without overflow. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
This seems to work for arbitrary 64-bit values, and for me it runs a little faster than splitting into two 32-bit parts. If you have a recent gcc, you can cheat with:
On x86_64 there is also 4 lines of asm that will do it quickly. There are faster methods if one has lots of operations with the same modulus.
I see that Mathew’s question was about addition. With unsigned types in C I use:
to return (a + b) mod n. This requires that a and b are already modulo n, hence n > a,b (if you don’t know, just mod them. but often you know they’re in range). If n-a > b, then there won’t be overflow so we just add; otherwise we do a+b-n. Since this is an unsigned type, it wraps ok around the max integer size, and because a and b were modulo n when we started, one subtract will be enough.
On my previous post, the “UV” in the mulmod macro should be “unsigned long”. The comment about removing the mods is indicating performance — they don’t hurt anything other than wasting time if you already know the values are mod m.
FORTH is tricky since the support for 64 bit math in general is very limited. This is specific to x86 with 80 bit IEEE floating point math. OTOH, 32 bit modular multiplication can be done easily without overflow due to words that generate 64 bit intermediate result.