## Modular Multiplication Without Overflow

### May 28, 2013

We discussed modular arithmetic in a previous exercise. In a comment on that exercise, Matthew asked what you can do when you have a fixed size integer datatype (say 64-bit unsigned integers) and a modulus that is close to the maximum. That’s a good question, which we will answer today. The answer is straight forward if the modulus is at least one bit less than the maximum, and rather trickier if it’s not. We’ll look at multiplication, but the other operations are all similar.

As long as the modulus is at least one bit less than the maximum, the solution is to split the numbers in low-bit and high-bit halves, then perform the arithmetic piecemeal, kind of like grade-school multiplication where you multiply by the one’s digit, then shift the sum and multiply by the ten’s digit, and so on, except that the “digits” are the size of the square root of the maximum number than can be represented in the integer datatype.

If m is within one bit of the maximum for the integer data type, things get messier; the basic answer is to split into three parts, compute the intermediate products, and recombine. But we won’t bother with that.

Your task is to write a function that performs modular multiplication without overflow. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

### 3 Responses to “Modular Multiplication Without Overflow”

1. danaj said
```unsigned long mulmod(unsigned long a, unsigned long b, unsigned long m) {
unsigned long r = 0;
/* Remove these mods if the caller can ensure a and b are in range */
a %= m;
b %= m;
while (b > 0) {
if (b & 1)  r = ((m-r) > a) ? r+a : r+a-m;    /* r = (r + a) % m */
b >>= 1;
if (b)      a = ((m-a) > a) ? a+a : a+a-m;    /* a = (a + a) % m */
}
return r;
}
```

This seems to work for arbitrary 64-bit values, and for me it runs a little faster than splitting into two 32-bit parts. If you have a recent gcc, you can cheat with:

```#define mulmod(a,b,m)  (UV)(((__uint128_t)(a)*(__uint128_t)(b)) % ((__uint128_t)(m)))
```

On x86_64 there is also 4 lines of asm that will do it quickly. There are faster methods if one has lots of operations with the same modulus.

2. danaj said

I see that Mathew’s question was about addition. With unsigned types in C I use:

```#define addmod(a,b,n) ((((n)-(a)) > (b))  ?  ((a)+(b))  :  ((a)+(b)-(n)))
```

to return (a + b) mod n. This requires that a and b are already modulo n, hence n > a,b (if you don’t know, just mod them. but often you know they’re in range). If n-a > b, then there won’t be overflow so we just add; otherwise we do a+b-n. Since this is an unsigned type, it wraps ok around the max integer size, and because a and b were modulo n when we started, one subtract will be enough.

On my previous post, the “UV” in the mulmod macro should be “unsigned long”. The comment about removing the mods is indicating performance — they don’t hurt anything other than wasting time if you already know the values are mod m.

3. David said

FORTH is tricky since the support for 64 bit math in general is very limited. This is specific to x86 with 80 bit IEEE floating point math. OTOH, 32 bit modular multiplication can be done easily without overflow due to words that generate 64 bit intermediate result.

```requires fpmath

: *mod ( n1 n2 n3 -- n )   \ modular multiplication w/o overflow, 32 bits
>r M* r> fm/mod drop ;

: d/mod  ( d1 d2 -- drem dquot )   \ use FP math; exact for |n| <= 2^63-1
d>f d>f fswap f2dup f/ floor fdup f>d F* F- f>d 2swap ;

: d/    d/mod 2nip ;
: dmod  d/mod 2drop ;

: 4dup  ( d1 d2 -- d1 d2 d1 d2 )
2over 2over ;

: d+mod ( d1 d2 d3 -- d )
2rot 2rot D+ 2swap 4dup D< IF
2drop
ELSE
D-
THEN ;

: dshift-mod
>r 2swap r>
0 DO
D2*
4dup 1, d+ d< IF  2over D-  THEN
LOOP 2nip ;

: d*mod ( d1 d2 d3 -- d )
locals| m2 m1 b2 b1 a2 a1 |
a1 b1 UM* m1 m2 dmod
a1 b2 UM* m1 m2 dmod m1 m2 32 dshift-mod
a2 b1 UM* m1 m2 dmod m1 m2 32 dshift-mod
a2 b2 UM* m1 m2 dmod m1 m2 64 dshift-mod
m1 m2 d+mod
m1 m2 d+mod
m1 m2 d+mod ;

3,721,994,338 9,333,744,221,888 100,000 d*mod d. 70144  ok
```