## Packed Ascii

### May 6, 2014

Packed ascii is a method for compressing a useful subset of ascii characters in 6 bits, used in HART-enabled devices. The characters that may be transmitted are the space character, the 26 upper-case letters, the 10 digits, and the following punctuation characters: `! " # $ % & ' ( ) * + , - . / : ; ? @ [ \ ] ^ _`

. Omitted are the 26 lower-case letters, the rubout character, and the following punctuation characters: `` { | } ~`

.

Compression is achieved by keeping only the six low-order bits of each ascii character. Compressed characters are expanded to their original character by adding a high-order bit that is the complement of the compressed high-order bit. For instance, the string “PRAXIS” is compressed as the six 6-bit binary numbers 010000 010010 000001 011000 001001 010011.

A string of characters is compressed to 6-bit characters, then transmitted as 8-bit bytes by packing the bits from high-order to low-order, so that four characters are transmitted in three bytes, achieving a 25% compression. If the message length is not an even multiple of four, space characters are added to the end of the string as padding. Thus, the six 6-bit binary numbers representing “PRAXIS” would be padded with two space characters and transmitted as the six 8-bit binary numbers 01000001 00100000 01011000 00100101 00111000 00100000, which corresponds to the string “A X%8 “.

Your task is to write functions that compress and expand strings in packed ascii format. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Binary Reflected Gray Code

### May 2, 2014

An *n*-bit Gray sequence is a sequence of 2^{n} values, starting from 0, in which each differs from its predecessor by a single bit. There are always 2^{n−1} *n*-bit Gray sequences; for instance, the two 2-bit Gray sequences are 0, 1, 3, 2 and 0, 2, 3, 1. It is easier to see the Gray sequences when they are written in binary; the two 2-bit Gray sequences written in binary are 00, 01, 11, 10 and 0,0 10, 11, 01, where it is clear that each element of the sequence differs from the previous one by only one bit.

Although there are many possible Gray sequences for any given number of bits, there is one Gray sequence, known as the *binary reflected gray code* BRGC, that is almost always the Gray sequence that is being discussed. Such sequences are generated recursively from the next-smaller sequence by reversing the sequence, prefixing the entries of the original sequence with a 0-bit, prefixing the entries of the reversed sequence with a 1-bit, then concatenating the two sequences. For instance, given the 2-bit Gray sequence 00, 01, 11, 10, its reversal is 10, 11, 01, 00, adding a 0-bit to the original gives 000, 001, 011, 010, adding a 1-bit to the reversal gives 110, 111, 101, 100, and concatenating the two gives 000, 001, 011, 010, 110, 111, 101, 100, which is 0, 1, 3, 2, 6, 7, 5, 4.

Your task is to write a function that generates an *n*-bit binary reflected Gray sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Solovay-Strassen Primality Testing

### April 29, 2014

We have seen the Miller-Rabin and Baillie-Wagstaff primality tests in previous exercises. Today we look at a test developed by Robert M. Solovay and Volker Strassen in 1977. The test is no longer used, having been superseded by the other two tests, but was historically of great importance in the development of the RSA cryptosystem.

The test is based on *Euler’s Criterion*, which states that *a*^{(p−1)/2} ≡ (*a* / *p*) (mod *p*) for any odd prime number *p* and any integer *a* on the range 2 to *p* − 1 with gcd(*a*, *p*) = 1, where (*a* / *p*) is the Legendre symbol. If a number *n* being tested is prime, the test will indicate that *n* is prime; if a number *n* being tested is composite, the test will indicate that it is either prime or composite with equal likelihood, a coin flip. Thus, the Solovay-Strassen test chooses *k* different witnesses *a*; if any of them indicate *n* is composite, then it must be composite, but if none of them indicate *n* is composite, it is presumed prime with odds 2^{−k}.

If you are willing to assume the truth of the Extended Riemann Hypothesis, the Solovay-Strassen test can be made an absolute proof of primality by testing all prime *a* in the range 2 to min(*n *− 1, log^{2} *n*); if none of the *a* indicate the compositeness of *n*, then *n* is prime on the ERH.

The Solovay-Strassen test is no longer used, for three reasons: First, it takes more work to implement, as it involves computing the Jacobi symbol as well as modular exponentiation, whereas the Miller-Rabin test involves only modular exponentiation. Second, the Miller-Rabin test has a better error bound, *k*^{−4} compared to *k*^{−2}. Third, all Euler witnesses are also Miller-Rabin strong witnesses to the compositeness of *n*. The presence of the absolute proof of primality on the ERH is also not a bar to the use of the Miller-Rabin test, as there is a similar proof of primality based on Miller-Rabin strong witnesses.

Your task is to write two versions of the Solovay-Strassen primality test, one probabilistic and one on the ERH. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Assembler, Part 3

### April 23, 2014

We studied a simple assembler for a hypothetical computer in the two previous exercises. Today we finish our look at the assembler by writing a program that produces neat listings, like this one for our sample program:

# print sum of input numbers (terminated by zero) 000: 03010 ld zero # initialize sum to zero 001: 04011 st sum 002: 01000 loop get # read a number 003: 08007 jz done # no more input if number is zero 004: 05011 add sum # add input to accumulated sum 005: 04011 st sum # store new value back in sum 006: 09002 j loop # go back and read another number 007: 03011 done ld sum # print sum 008: 02000 put 009: 10000 halt 010: 00000 zero const 0 011: 00000 sum const

The listing shows the memory address in the first column, the contents of memory at the beginning of the program in the second column, the mnemonic label in the third column, the opcode in the fourth column, the object of the operation in the fifth column, and the comment in the sixth column.

Your task is to write a program that produces assembler listings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Assembler, Part 2

### April 18, 2014

In the previous exercise we built an assembler for a hypothetical computer. Today we write a simulator for that hypothetical computer. The basic idea is simple: Start with a program counter at location 0, either perform the action at the location and advance the program counter to the next location, or set the program counter to the object of a jump instruction.

Your task is to write the simulator. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Assembler, Part 1

### April 15, 2014

In this exercise and the next one we will write a simple assembler for a hypothetical computer; we are following the assembler in the book *The Awk Programming Langauge* by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger. Our computer has a memory of a thousand five-digit words, a single accumulator, and eleven opcodes:

OPCODE | INSTRUCTION | DESCRIPTION |
---|---|---|

`00` |
`const C` |
assembler pseudo-operator to define a constant `C` |

`01` |
`get` |
read a number from the input to the accumulator |

`02` |
`put` |
write the number in the accumulator to the output |

`03` |
`ld M` |
load accumulator with contents of memory location `M` |

`04` |
`st M` |
store contents of accumulator in memory location `M` |

`05` |
`add M` |
add contents of memory location `M` to the accumulator |

`06` |
`sub M` |
subtract contents of memory location `M` from the accumulator |

`07` |
`jpos M` |
jump to memory location `M` if the accumulator is positive |

`08` |
`jz M` |
jump to memory location `M` if the accumulator is zero |

`09` |
`j` |
jump to memory location `M` , unconditionally |

`10` |
`halt` |
stop program execution |

An assembly-language program is a series of blank lines and statements consisting of up to four fields: The first field, if it exists, is a label; it must start at the first position on the line and may not start with a digit. The second field, which is mandatory, is the opcode; it follows the optional label and mandatory spaces. The third field, which is used only with some opcodes, is the object; if it is present, it follows the opcode and mandatory spaces. The fourth field, which is optional, is a comment; everything on the line following a hash-sign is ignored. Here is a sample assembly-language program that prints the sum of a series of integers entered by the user, with the end of input marked by a 0:

# print sum of input numbers (terminated by zero) ld zero # initialize sum to zero st sum loop get # read a number jz done # no more input if number is zero add sum # add input to accumulated sum st sum # store new value back in sum j loop # go back and read another number done ld sum # print sum put halt zero const 0 sum const

The contents of memory after loading the sample program are shown on the next page.

Your task is to write a program that assembles a program written in our simple assembly language and loads the program into memory; we’ll write a simulator for our hypothetical computer in the next exercise. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Plotter

### April 11, 2014

Programs that create graphs and charts are an important tool in many commercial and technical endeavors; for instance, a programmer might look at a graph of memory usage during a debugging run of his program to identify hot spots that need to be trimmed.

A simple type of graph is a scatter graph, sometimes called an x/y plot. The input is a list of points on an x,y-grid; the output is a picture with the points plotted on a grid. A simple input file appears on the next page.

Your task is to write a program to plot scatter graphs. When you are finished, you are welcome to read or run a suggested solution, or to post your own program or discuss the exercise in the comments below.

## Formatted Output

### April 8, 2014

The only output functions provided by standard Scheme are `display`

, for plain output, and `write`

, for system-formatted output. That’s rather limiting. At the other extreme, Lisp provides the `format`

function, which has more options than you can dream about (before I discarded it in favor of the HyperSpec, the spine of my printed copy of CLtL2 was broken in two places, `format`

and `loop`

). Many languages provide some kind of formatted output — does anyone remember PIC in COBOL? In today’s exercise we will implement the `printf`

function popularized by C and used in several languages.

There are three functions in the `printf`

family: `(sprintf`

*fmt* *expr* …`)`

returns a string formatted according to the specification given by *format* and containing the values *expr* …; (printf *fmt* *expr* …`)`

displays a string formatted similarly to `sprintf`

to the current output port, and `(fprintf`

*port* *fmt* *expr* …`)`

displays a string formatted similarly to `sprintf`

to the indicated *port*. The *fmt* and *port* arguments are always required.

The *fmt* argument is a string that contains literal text, escape sequences, and specifications of how the *expr*essions should be formatted. A specification consists of a literal percent-sign `%`

, zero or more modifiers, and a single-character specifier. The single-character specifiers that we will support are:

`c`

ascii character

`d`

decimal integer

`f`

floating-point number

`o`

octal integer

`s`

string

`x`

hexadecimal integer

`%`

literal percent sign

There should be as many *expr*essions as there are format specifiers in the *fmt* string, except that a `%`

literal percent sign specifier does not consume an *expr*ession. As many as four modifiers may appear between the literal percent sign `%`

that starts a specifier and the single-character specifier that ends it:

`-`

left-justify theexpression in its field; if not given, theexpression is right-justified

`0`

left-pad with leading zeros instead of spaces

widthpad field to this width using spaces (or zeros)

`.`

precdigits to right of decimal point, or maximum string length

The modifiers must appear in the order shown above. The *width* and *prec* arguments are unsigned decimal integers.

Escape sequences are introduced by a backslash; the following escape sequences are supported:

`\b`

backspace

`\f`

formfeed

`\n`

newline

`\r`

carriage return

`\t`

horizontal tab

`\`

dddcharacter with octal valueddd, wheredddis 1 to 3 digits between 0 and 7 inclusive

`\`

cany other charactercliterally, for instance`\\`

for backslash or`\"`

for quotation mark

Depending on the environment, a newline may (or may not) imply a carriage return, or vice-versa. Several examples are given on the next page.

Your task is to write a function that provides formatted output for your language, using the definition of `printf`

given above or some other specification that is suitable for your language and your aspirations. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Factoring With Bicycle Chains

### April 4, 2014

As penance for teasing you on April Fool’s Day, I have another exercise on factoring. Today we will simulate a machine invented by Derrick H Lehmer (the son of the famous father-son-daughter-in-law trio of mathematicians named Lehmer) that uses bicycle chains to factor integers. Unlike the previous exercise, this factoring method is real; in fact, a variant of this method was the best available method to factor integers from the mid-1930s to the mid-1960s, and Lehmer’s machine found many valuable factorizations for the Cunningham Project. We’ll discuss a simplified version of the machine today, as described by Uli Meyer, and the full version of the machine in a future exercise. I am unable to find a copyright-free picture of Lehmer’s machine, but if you’re interested you could visit the Computer History Museum, or see Lehmer himself describe the machine.

The machine is based on Fermat’s method of factoring by a difference of squares: if *n* = *x*^{2} − *y*^{2} then the factors of *n* are (*x* − *y*) and (*x* + *y*). Fermat’s method starts with *y* = ⌈ √*n* ⌉, computes *y*^{2} − *n*, and stops when the result is a square.

Lehmer’s machine identifies likely squares by looking at quadratic residues. An integer *a* is a quadratic residue modulo *n* if there exists some *x* such that *x*^{2} ≡ *a* (mod *n*). For a number *y*^{2} − *n* to be a square in Fermat’s algorithm, it must be a quadratic residue to many small bases. Lehmer’s machine represents quadratic residues as “open” points on a bicycle chain, then spins many bicycle chains of different lengths along a common axis and stops when all are quadratic residues; in many but not all cases, such a number will will indicate a factor of *n*.

Uli Meyer gives a better description, shows a useful diagram, and provides photos of his Lego sieve machine.

Your task is to write a program that simulates Lehmer’s machine to factor integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Factoring By Digital Coding

### April 1, 2014

A few days ago, an announcement was made in /r/crypto of a new factoring algorithm that runs in time O(log^4 *n*). A paper describes the algorithm. And even better, coding for the new algorithm is simple:

1)Set

m←n.

2) Expressmas a list of its decimal digits.

3) Express each decimal digit as a list of binary bits.

4) Append the individual lists of binary bits.

5) Convert the resulting list of binary bits into a decimal number. Call itd.

6) If the greatest common divisor ofnanddis between 1 andn, it is a factor ofn. Report the factor and halt.

7) Setm←dand go to Step 1.

Steps 1 through 4 form the *digital code* of a number. For example, 88837 forms the bit-list 1 0 0 0, 1 0 0 0, 1 0 0 0, 1 1, 1 1 1, where we used commas to delineate the decimal digits. That bit-list is 69919 in decimal, which is the digital code of 88837. To find a factor of 88837, compute successive digital codes until a gcd is greater than 1; the chain runs 88837, 69919, 54073, 2847, 2599, 5529, 2921, 333, at which point gcd(88837, 333) = 37 and we have found a factor: 88837 = 7^{4} × 37. Factoring 96577 is even faster: the digital code of 96577 is 40319, and gcd(96577, 40319) = 23, which is a factor of 96577 = 13 × 17 × 19 × 23. But trying to factor 65059 = 17 × 43 × 89 causes an infinite loop. The paper gives a heuristic solution to that problem, but we’ll stop there and accept that sometimes a factorization fails.

Your task is to write a function that factors by digital coding. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.