Modular Multiplication Without Overflow

May 28, 2013

We discussed modular arithmetic in a previous exercise. In a comment on that exercise, Matthew asked what you can do when you have a fixed size integer datatype (say 64-bit unsigned integers) and a modulus that is close to the maximum. That’s a good question, which we will answer today. The answer is straight forward if the modulus is at least one bit less than the maximum, and rather trickier if it’s not. We’ll look at multiplication, but the other operations are all similar.

As long as the modulus is at least one bit less than the maximum, the solution is to split the numbers in low-bit and high-bit halves, then perform the arithmetic piecemeal, kind of like grade-school multiplication where you multiply by the one’s digit, then shift the sum and multiply by the ten’s digit, and so on, except that the “digits” are the size of the square root of the maximum number than can be represented in the integer datatype.

If m is within one bit of the maximum for the integer data type, things get messier; the basic answer is to split into three parts, compute the intermediate products, and recombine. But we won’t bother with that.

Your task is to write a function that performs modular multiplication without overflow. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

3 Comments »

3 Responses to “Modular Multiplication Without Overflow”

danaj said

May 29, 2013 at 5:20 AM

unsigned long mulmod(unsigned long a, unsigned long b, unsigned long m) {
    unsigned long r = 0;
    /* Remove these mods if the caller can ensure a and b are in range */
    a %= m;
    b %= m;
    while (b > 0) {
        if (b & 1)  r = ((m-r) > a) ? r+a : r+a-m;    /* r = (r + a) % m */
        b >>= 1;
        if (b)      a = ((m-a) > a) ? a+a : a+a-m;    /* a = (a + a) % m */
    }
    return r;
}

This seems to work for arbitrary 64-bit values, and for me it runs a little faster than splitting into two 32-bit parts. If you have a recent gcc, you can cheat with:

#define mulmod(a,b,m)  (UV)(((__uint128_t)(a)*(__uint128_t)(b)) % ((__uint128_t)(m)))

On x86_64 there is also 4 lines of asm that will do it quickly. There are faster methods if one has lots of operations with the same modulus.

danaj said
May 30, 2013 at 6:08 AM
I see that Mathew’s question was about addition. With unsigned types in C I use:
```
#define addmod(a,b,n) ((((n)-(a)) > (b))  ?  ((a)+(b))  :  ((a)+(b)-(n)))
```
to return (a + b) mod n. This requires that a and b are already modulo n, hence n > a,b (if you don’t know, just mod them. but often you know they’re in range). If n-a > b, then there won’t be overflow so we just add; otherwise we do a+b-n. Since this is an unsigned type, it wraps ok around the max integer size, and because a and b were modulo n when we started, one subtract will be enough.

On my previous post, the “UV” in the mulmod macro should be “unsigned long”. The comment about removing the mods is indicating performance — they don’t hurt anything other than wasting time if you already know the values are mod m.

David said

June 2, 2013 at 9:52 PM

FORTH is tricky since the support for 64 bit math in general is very limited. This is specific to x86 with 80 bit IEEE floating point math. OTOH, 32 bit modular multiplication can be done easily without overflow due to words that generate 64 bit intermediate result.

requires fpmath

: *mod ( n1 n2 n3 -- n )   \ modular multiplication w/o overflow, 32 bits
   >r M* r> fm/mod drop ;

: d/mod  ( d1 d2 -- drem dquot )   \ use FP math; exact for |n| <= 2^63-1
   d>f d>f fswap f2dup f/ floor fdup f>d F* F- f>d 2swap ;

: d/    d/mod 2nip ;
: dmod  d/mod 2drop ;

: 4dup  ( d1 d2 -- d1 d2 d1 d2 )
    2over 2over ;

: d+mod ( d1 d2 d3 -- d )
   2rot 2rot D+ 2swap 4dup D< IF
      2drop
   ELSE
       D-
   THEN ;

: dshift-mod
   >r 2swap r>
   0 DO
      D2*
      4dup 1, d+ d< IF  2over D-  THEN
   LOOP 2nip ;

: d*mod ( d1 d2 d3 -- d )
   locals| m2 m1 b2 b1 a2 a1 |
   a1 b1 UM* m1 m2 dmod
   a1 b2 UM* m1 m2 dmod m1 m2 32 dshift-mod
   a2 b1 UM* m1 m2 dmod m1 m2 32 dshift-mod
   a2 b2 UM* m1 m2 dmod m1 m2 64 dshift-mod
   m1 m2 d+mod
   m1 m2 d+mod
   m1 m2 d+mod ;

3,721,994,338 9,333,744,221,888 100,000 d*mod d. 70144  ok

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Programming Praxis