Montgomery Multiplication

July 29, 2014

We will work with 64-bit unsigned integers in C rather than unlimited-precision integers in Scheme, because the use of unlimited-precision integers makes the exercise trivial. We are following the text and code of Henry Warren’s description of Montgomery multiplication. We assume that long long integers are 64 bits, and select r = 2⁶⁴, which isn’t representable in a 64-bit integer, though that doesn’t matter:

typedef unsigned long long ull; typedef signed long long sll;

We begin with the gcd calculation. We studied the binary gcd algorithm in a previous exercise, but we can simplify and speed that algorithm because we know that r is a power of 2 and m is odd, so there are no powers of 2 to eliminate. Note that r is half what it “should” be, so this function computes rinv and mprime such that (2r) * rinv − m * mprime = 1 rather than the expected r * rinv + m * mprime = 1. Here’s the code:

void gcd(ull r, ull m, ull *r_inv, ull *m_prime) { ull r_saved, m_saved, u, v; r_saved = r; m_saved = m; u = 1; v = 0; while (r > 0) { r = r >> 1; if ((u & 1) == 0) { u = u >> 1; v = v >> 1; } else { u = ((u ^ m_saved) >> 1) + (u & m_saved); v = (v >> 1) + r_saved; } } *r_inv = u; *m_prime = v; return; }

Montgomery multiplication needs the 128-bit product of two 64-bit numbers. We keep the high bits (most significant bits) and the low bits (least significant bits) of the product separate, using the grade-school multiplication algorithm on 32-bit “digits” in an algorithm given at Knuth 4.3.1 M:

void mul_ull(ull x, ull y, ull *xy_hi, ull *xy_lo) { ull x0, x1, y0, y1, t, p0, p1, p2; x1 = x >> 32; x0 = x & 0xFFFFFFFF; y1 = y >> 32; y0 = y & 0xFFFFFFFF;


    t = x0 * y0;

    p0 = t & 0xFFFFFFFF;
    t = x1 * y0 + (t >> 32);

    p1 = t & 0xFFFFFFFF;

    p2 = t >> 32;
    t = x0 * y1 + p1;
    *xy_hi = x1 * y1 + p2 + (t >> 32);

return; }

The modulus operator divides (x || y) by the modulus m, returning the remainder. This function assumes x < m, which is enforced by the mulmod_ull function given below. When the loop finished, y is the quotient and x is the remainder:

ull mod_ull(ull x, ull y, ull m) { sll i, t; for (i = 1; i <= 64; i++) { t = (sll) x >> 63; x = (x << 1) | (y >> 63); y = y << 1; if ((x | t) >= m) { x = x - m; y = y + 1; } } return x; }

We are ready now for the Montgomery step, which takes ā, b̄, m and m′ and returns a b r mod m. The code follows the three-step process described above:

ull mont_mul(ull a_bar, ull b_bar, ull m, ull m_prime) { ull t_hi, t_lo, tm, tmm_hi, tmm_lo, u_hi, u_lo, ov; mul_ull(a_bar, b_bar, &t_hi, &t_lo); tm = t_lo * m_prime; mul_ull(tm, m, &tmm_hi, &tmm_lo); u_lo = t_lo + tmm_lo; u_hi = t_hi + tmm_hi; if (u_lo < t_lo) u_hi = u_hi + 1; ov = (u_hi < t_hi) | ((u_hi == t_hi) & (u_lo < t_lo)); u_lo = u_hi; u_hi = 0; u_lo = u_lo - (m & -(ov | (u_lo >= m))); return u_lo; }

Finally we are ready to specify the modular multiplication, following the four-step algorithm given above; we return 1 if the operation succeeded or 0 if its arguments are out of bounds, and return the product in the fourth argument:

int mulmod_ull(ull a, ull b, ull m, ull *p) { ull half_r = 0x8000000000000000LL; ull r_inv, m_prime, a_bar, b_bar, p_hi, p_lo;


    if ((m & 1) == 0) { return 0; } /* must be odd */

    if (m <= a) { return 0; } /* must be less than m */

    if (m <= b) { return 0; } /* must be less than m */

gcd(half_r, m, &r_inv, &m_prime); a_bar = mod_ull(a, 0, m); b_bar = mod_ull(b, 0, m); *p = mont_mul(a_bar, b_bar, m, m_prime); mul_ull(*p, r_inv, &p_hi, &p_lo); *p = mod_ull(p_hi, p_lo, m); return 1; }

Now the payoff for all our hard work. Given modular multiplication, it is easy to perform modular exponentiation using the square-and-multiply algorithm. The expensive calculations of r_inv and m_prime and the translations to and from Montgomery space are performed only once, and only Montgomery steps are taken inside the loop:

int powmod_ull(ull x, ull y, ull m, ull *x_to_y) { ull half_r = 0x8000000000000000LL; ull r_inv, m_prime, x_bar, hi, lo;


    if ((m & 1) == 0) { return 0; } /* must be odd */

    if (m <= x) { return 0; } /* must be less than m */

gcd(half_r, m, &r_inv, &m_prime); x_bar = mod_ull(x, 0, m); *x_to_y = mod_ull(1, 0, m); while (y > 0) { if ((y & 1) == 1) { *x_to_y = mont_mul(*x_to_y, x_bar, m, m_prime); y = y - 1; } else { x_bar = mont_mul(x_bar, x_bar, m, m_prime); y = y / 2; } } mul_ull(*x_to_y, r_inv, &hi, &lo); *x_to_y = mod_ull(hi, lo, m); return 1; }

Here is an example:

#include <stdio.h/gt;


int main(void) {

    ull z; int t;

    ull x = 34721908534901LL;

    ull y = 72193687003295LL;

    ull m = 9412345678901731LL;
    t = mulmod_ull(x, y, m, &z);

    if (t) { printf("%lld\n", z); }

    else { printf("ERROR"); }

t = powmod_ull(x, y, m, &z); if (t) { printf("%lld\n", z); } else { printf("ERROR"); } return 0; }

If the program is stored in file monty.c, compiling and running the program looks like this; both results shown below are confirmed by Wolfram|Alpha:

> cc -o monty monty.c > chmod +x monty > ./monty 3751384291706939 7001634529421238

You can run the program at http://programmingpraxis.codepad.org/A5ePEDGe.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Montgomery Multiplication”

Jussi Piitulainen said
July 29, 2014 at 2:17 PM
Re bar: One theory seems to be to put a code called a combining
diacritic in Unicode after the character. A bar above would be
̄. The following characters looked somewhat usable
locally in w3m and firefox: ā, b̄, c̄,
d̄, ē.

(The above text is copied and pasted from my local test document source code. In the comment editor, at the moment of this writing, I see the hex codes as HTML character entities or whatever they are called.)
Jussi Piitulainen said
July 29, 2014 at 2:24 PM
Hm. The code entity above should have been ampersand hash x 0304 semicolon. The software interpreted it as the actual code. The sample characters look to me as they should.

Programming Praxis