## Modular Multiplication Without Overflow

### May 28, 2013

Let’s begin with an example, calculating 56 * 37 modulo 100 using 8-bit arithmetic, so no intermediate total may be 256 or greater. We start by representing a = 56 = 3 * 16 + 8 and b = 37 = 2 * 16 + 5, so:

```a1 = 8 a2 = 3 b1 = 5 b2 = 2```

Then the four intermediate products with their shifts are:

```p11 = 8 * 5 = 40 p12 = 8 * 2 = 16 > 32 > 64 > 128 (28) > 56 p21 = 3 * 5 = 15 > 30 > 60 > 120 (20) > 40 p22 = 3 * 2 = 6 > 12 > 24 > 48 > 96 > 192 (92) > 184 (84) > 168 (68) > 136 (36)```

We’re using binary arithmetic, so each number doubles as it is shifted, taking it modulo 100 as we go. The product of two low-half numbers is not shifted, the product of a low-half and high-half number is shifted 4 times (since log2 16 = 4), and the product of two high-half numbers is shifted 8 times. Then the intermediate products are summed, again removing m each time an intermediate sum exceeds m:

```s = 40 + 56 = 96 s = 96 + 40 = 136 (36) s = 36 + 36 = 72```

And that’s the final answer: 56 * 37 = 2072, which is 72 (mod 100).

Here’s the code:

```(define (mod-mul a b m) ; assumes a,b < 2^64 and m < 2^63   (define (shift x k)     (let loop ((k k) (x x))       (if (zero? k) x         (let* ((x (* x 2))                (x (if (< x m) x (- x m))))           (loop (- k 1) x)))))   (let* ((two32 (expt 2 32))          (a1 (remainder a two32))          (a2 (quotient a two32))          (b1 (remainder b two32))          (b2 (quotient b two32))          (p11 (modulo (* a1 b1) m))          (p12 (shift (modulo (* a1 b2) m) 32))          (p21 (shift (modulo (* a2 b1) m) 32))          (p22 (shift (modulo (* a2 b2) m) 64))          (s (+ p11 p12))          (s (if (< s m) s (- s m)))          (s (+ s p21))          (s (if (< s m) s (- s m)))          (s (+ s p22))          (s (if (< s m) s (- s m))))     s))```

And here’s a somewhat larger example:

```> (define a 17259738289493410580109721600) > (define b 10327523882682224844906430464) > (define m 8631375519702822467321987072) > (mod-mul a b m) 7869126927168251407166865408```

You can run the program at http://programmingpraxis.codepad.org/psbpa4MC.

Pages: 1 2

### 3 Responses to “Modular Multiplication Without Overflow”

1. danaj said
```unsigned long mulmod(unsigned long a, unsigned long b, unsigned long m) {
unsigned long r = 0;
/* Remove these mods if the caller can ensure a and b are in range */
a %= m;
b %= m;
while (b > 0) {
if (b & 1)  r = ((m-r) > a) ? r+a : r+a-m;    /* r = (r + a) % m */
b >>= 1;
if (b)      a = ((m-a) > a) ? a+a : a+a-m;    /* a = (a + a) % m */
}
return r;
}
```

This seems to work for arbitrary 64-bit values, and for me it runs a little faster than splitting into two 32-bit parts. If you have a recent gcc, you can cheat with:

```#define mulmod(a,b,m)  (UV)(((__uint128_t)(a)*(__uint128_t)(b)) % ((__uint128_t)(m)))
```

On x86_64 there is also 4 lines of asm that will do it quickly. There are faster methods if one has lots of operations with the same modulus.

2. danaj said

I see that Mathew’s question was about addition. With unsigned types in C I use:

```#define addmod(a,b,n) ((((n)-(a)) > (b))  ?  ((a)+(b))  :  ((a)+(b)-(n)))
```

to return (a + b) mod n. This requires that a and b are already modulo n, hence n > a,b (if you don’t know, just mod them. but often you know they’re in range). If n-a > b, then there won’t be overflow so we just add; otherwise we do a+b-n. Since this is an unsigned type, it wraps ok around the max integer size, and because a and b were modulo n when we started, one subtract will be enough.

On my previous post, the “UV” in the mulmod macro should be “unsigned long”. The comment about removing the mods is indicating performance — they don’t hurt anything other than wasting time if you already know the values are mod m.

3. David said

FORTH is tricky since the support for 64 bit math in general is very limited. This is specific to x86 with 80 bit IEEE floating point math. OTOH, 32 bit modular multiplication can be done easily without overflow due to words that generate 64 bit intermediate result.

```requires fpmath

: *mod ( n1 n2 n3 -- n )   \ modular multiplication w/o overflow, 32 bits
>r M* r> fm/mod drop ;

: d/mod  ( d1 d2 -- drem dquot )   \ use FP math; exact for |n| <= 2^63-1
d>f d>f fswap f2dup f/ floor fdup f>d F* F- f>d 2swap ;

: d/    d/mod 2nip ;
: dmod  d/mod 2drop ;

: 4dup  ( d1 d2 -- d1 d2 d1 d2 )
2over 2over ;

: d+mod ( d1 d2 d3 -- d )
2rot 2rot D+ 2swap 4dup D< IF
2drop
ELSE
D-
THEN ;

: dshift-mod
>r 2swap r>
0 DO
D2*
4dup 1, d+ d< IF  2over D-  THEN
LOOP 2nip ;

: d*mod ( d1 d2 d3 -- d )
locals| m2 m1 b2 b1 a2 a1 |
a1 b1 UM* m1 m2 dmod
a1 b2 UM* m1 m2 dmod m1 m2 32 dshift-mod
a2 b1 UM* m1 m2 dmod m1 m2 32 dshift-mod
a2 b2 UM* m1 m2 dmod m1 m2 64 dshift-mod
m1 m2 d+mod
m1 m2 d+mod
m1 m2 d+mod ;

3,721,994,338 9,333,744,221,888 100,000 d*mod d. 70144  ok
```