Assembler, Part 3

April 23, 2014

[ Oops! We've had bad few days at work — a big project for the Department of Education — and I didn't have time to write yesterday's exercise. I set the scheduler to publish it today, on the theory that I would finish the exercise last night, but I came home last night and went to bed, and forgot to push back the schedule. I'll get this fixed later today. Sorry for the confusion. Phil ]

[ Later: It's done now. Sorry for the confusion. Since this is late, there will be no exercise on Friday. ]

We studied a simple assembler for a hypothetical computer in the two previous exercises. Today we finish our look at the assembler by writing a program that produces neat listings, like this one for our sample program:

# print sum of input numbers (terminated by zero)

000:  03010        ld    zero # initialize sum to zero          
001:  04011        st    sum                                    
002:  01000   loop get        # read a number                   
003:  08007        jz    done # no more input if number is zero 
004:  05011        add   sum  # add input to accumulated sum    
005:  04011        st    sum  # store new value back in sum     
006:  09002        j     loop # go back and read another number 

007:  03011   done ld    sum  # print sum                       
008:  02000        put                                          
009:  10000        halt                                         

010:  00000   zero const 0                                      
011:  00000   sum  const

The listing shows the memory address in the first column, the contents of memory at the beginning of the program in the second column, the mnemonic label in the third column, the opcode in the fourth column, the object of the operation in the fifth column, and the comment in the sixth column.

Your task is to write a program that produces assembler listings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Assembler, Part 2

April 18, 2014

In the previous exercise we built an assembler for a hypothetical computer. Today we write a simulator for that hypothetical computer. The basic idea is simple: Start with a program counter at location 0, either perform the action at the location and advance the program counter to the next location, or set the program counter to the object of a jump instruction.

Your task is to write the simulator. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Assembler, Part 1

April 15, 2014

In this exercise and the next one we will write a simple assembler for a hypothetical computer; we are following the assembler in the book The Awk Programming Langauge by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger. Our computer has a memory of a thousand five-digit words, a single accumulator, and eleven opcodes:

OPCODE INSTRUCTION DESCRIPTION
00 const C assembler pseudo-operator to define a constant C
01 get read a number from the input to the accumulator
02 put write the number in the accumulator to the output
03 ld M load accumulator with contents of memory location M
04 st M store contents of accumulator in memory location M
05 add M add contents of memory location M to the accumulator
06 sub M subtract contents of memory location M from the accumulator
07 jpos M jump to memory location M if the accumulator is positive
08 jz M jump to memory location M if the accumulator is zero
09 j jump to memory location M, unconditionally
10 halt stop program execution

An assembly-language program is a series of blank lines and statements consisting of up to four fields: The first field, if it exists, is a label; it must start at the first position on the line and may not start with a digit. The second field, which is mandatory, is the opcode; it follows the optional label and mandatory spaces. The third field, which is used only with some opcodes, is the object; if it is present, it follows the opcode and mandatory spaces. The fourth field, which is optional, is a comment; everything on the line following a hash-sign is ignored. Here is a sample assembly-language program that prints the sum of a series of integers entered by the user, with the end of input marked by a 0:

# print sum of input numbers (terminated by zero)

     ld    zero   # initialize sum to zero
     st    sum
loop get          # read a number
     jz    done   # no more input if number is zero
     add   sum    # add input to accumulated sum
     st    sum    # store new value back in sum
     j     loop   # go back and read another number

done ld    sum    # print sum
     put
     halt

zero const 0
sum  const

The contents of memory after loading the sample program are shown on the next page.

Your task is to write a program that assembles a program written in our simple assembly language and loads the program into memory; we’ll write a simulator for our hypothetical computer in the next exercise. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Plotter

April 11, 2014

Programs that create graphs and charts are an important tool in many commercial and technical endeavors; for instance, a programmer might look at a graph of memory usage during a debugging run of his program to identify hot spots that need to be trimmed.

A simple type of graph is a scatter graph, sometimes called an x/y plot. The input is a list of points on an x,y-grid; the output is a picture with the points plotted on a grid. A simple input file appears on the next page.

Your task is to write a program to plot scatter graphs. When you are finished, you are welcome to read or run a suggested solution, or to post your own program or discuss the exercise in the comments below.

Pages: 1 2 3

Formatted Output

April 8, 2014

The only output functions provided by standard Scheme are display, for plain output, and write, for system-formatted output. That’s rather limiting. At the other extreme, Lisp provides the format function, which has more options than you can dream about (before I discarded it in favor of the HyperSpec, the spine of my printed copy of CLtL2 was broken in two places, format and loop). Many languages provide some kind of formatted output — does anyone remember PIC in COBOL? In today’s exercise we will implement the printf function popularized by C and used in several languages.

There are three functions in the printf family: (sprintf fmt expr) returns a string formatted according to the specification given by format and containing the values expr …; (printf fmt expr) displays a string formatted similarly to sprintf to the current output port, and (fprintf port fmt expr) displays a string formatted similarly to sprintf to the indicated port. The fmt and port arguments are always required.

The fmt argument is a string that contains literal text, escape sequences, and specifications of how the expressions should be formatted. A specification consists of a literal percent-sign %, zero or more modifiers, and a single-character specifier. The single-character specifiers that we will support are:

c ascii character
d decimal integer
f floating-point number
o octal integer
s string
x hexadecimal integer
% literal percent sign

There should be as many expressions as there are format specifiers in the fmt string, except that a % literal percent sign specifier does not consume an expression. As many as four modifiers may appear between the literal percent sign % that starts a specifier and the single-character specifier that ends it:

- left-justify the expression in its field; if not given, the expression is right-justified
0 left-pad with leading zeros instead of spaces
width pad field to this width using spaces (or zeros)
.prec digits to right of decimal point, or maximum string length

The modifiers must appear in the order shown above. The width and prec arguments are unsigned decimal integers.

Escape sequences are introduced by a backslash; the following escape sequences are supported:

\b backspace
\f formfeed
\n newline
\r carriage return
\t horizontal tab
\ddd character with octal value ddd, where ddd is 1 to 3 digits between 0 and 7 inclusive
\c any other character c literally, for instance \\ for backslash or \" for quotation mark

Depending on the environment, a newline may (or may not) imply a carriage return, or vice-versa. Several examples are given on the next page.

Your task is to write a function that provides formatted output for your language, using the definition of printf given above or some other specification that is suitable for your language and your aspirations. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

As penance for teasing you on April Fool’s Day, I have another exercise on factoring. Today we will simulate a machine invented by Derrick H Lehmer (the son of the famous father-son-daughter-in-law trio of mathematicians named Lehmer) that uses bicycle chains to factor integers. Unlike the previous exercise, this factoring method is real; in fact, a variant of this method was the best available method to factor integers from the mid-1930s to the mid-1960s, and Lehmer’s machine found many valuable factorizations for the Cunningham Project. We’ll discuss a simplified version of the machine today, as described by Uli Meyer, and the full version of the machine in a future exercise. I am unable to find a copyright-free picture of Lehmer’s machine, but if you’re interested you could visit the Computer History Museum, or see Lehmer himself describe the machine.

The machine is based on Fermat’s method of factoring by a difference of squares: if n = x2y2 then the factors of n are (xy) and (x + y). Fermat’s method starts with y = ⌈ √n ⌉, computes y2n, and stops when the result is a square.

Lehmer’s machine identifies likely squares by looking at quadratic residues. An integer a is a quadratic residue modulo n if there exists some x such that x2a (mod n). For a number y2n to be a square in Fermat’s algorithm, it must be a quadratic residue to many small bases. Lehmer’s machine represents quadratic residues as “open” points on a bicycle chain, then spins many bicycle chains of different lengths along a common axis and stops when all are quadratic residues; in many but not all cases, such a number will will indicate a factor of n.

Uli Meyer gives a better description, shows a useful diagram, and provides photos of his Lego sieve machine.

Your task is to write a program that simulates Lehmer’s machine to factor integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

A few days ago, an announcement was made in /r/crypto of a new factoring algorithm that runs in time O(log^4 n). A paper describes the algorithm. And even better, coding for the new algorithm is simple:

1)Set mn.
2) Express m as a list of its decimal digits.
3) Express each decimal digit as a list of binary bits.
4) Append the individual lists of binary bits.
5) Convert the resulting list of binary bits into a decimal number. Call it d.
6) If the greatest common divisor of n and d is between 1 and n, it is a factor of n. Report the factor and halt.
7) Set md and go to Step 1.

Steps 1 through 4 form the digital code of a number. For example, 88837 forms the bit-list 1 0 0 0, 1 0 0 0, 1 0 0 0, 1 1, 1 1 1, where we used commas to delineate the decimal digits. That bit-list is 69919 in decimal, which is the digital code of 88837. To find a factor of 88837, compute successive digital codes until a gcd is greater than 1; the chain runs 88837, 69919, 54073, 2847, 2599, 5529, 2921, 333, at which point gcd(88837, 333) = 37 and we have found a factor: 88837 = 74 × 37. Factoring 96577 is even faster: the digital code of 96577 is 40319, and gcd(96577, 40319) = 23, which is a factor of 96577 = 13 × 17 × 19 × 23. But trying to factor 65059 = 17 × 43 × 89 causes an infinite loop. The paper gives a heuristic solution to that problem, but we’ll stop there and accept that sometimes a factorization fails.

Your task is to write a function that factors by digital coding. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Brent’s Root Finder

March 28, 2014

[ Today's exercise was written by guest author Paul Hofstra, who holds a PhD in Theoretical Nuclear Physics from the Free University in Amsterdam and is now retired from a major oil company. Guest authors are always welcome; contact me at the link in the menu bar if you are interested in writing for Programming Praxis. ]

We discussed various algorithms for finding the roots of a function in two previous exercises. In the first of those exercises, we looked at the bisection method, which is “slow and steady” as it always finds a solution, and the secant method, which is sometimes very fast but also sometimes very slow. In the second of those two exercises, we looked at an algorithm of Theodorus Dekker that combines the bisection and secant methods in a way that retains the speed of the secant method where possible but retains the steadiness of the bisection method where necessary. Later, Richard Brent (the same Brent that improved Pollard’s rho algorithm for finding factors), observed that there are still some functions for which Dekker’s algorithm performs poorly, and he developed the method that is most commonly used today.

Brent augmented the linear interpolation of the secant method with a quadratic interpolation that tends to get closer to the root more quickly than the secant method, and he made conditions that force a bisection step at least every three steps, so his method can never be worse than three times the bisection method, and usually much better. Brent calculates a new point using both the secant and bisection methods, as does Dekker, but instead of simply accepting the secant method when it is better, he calculates another potential point using the quadratic method and takes that instead if it is better than the secant point. You can read Brent’s description of his algorithm at http://maths-people.anu.edu.au/~brent/pd/rpb005.pdf.

Your task is to implement Brent’s root finder. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Factoring RSA Moduli

March 24, 2014

We studied RSA encryption in a previous exercise. To recap: select two prime numbers p and q and make their product n = p q the modulus of the crypto system. Choose an encryption key e and a decryption key d such that d e ≡ 1 (mod φ(p q)), where the totient φ(p q) = (p − 1) × (q − 1). Then encrypt a message m as c = me (mod n) and decrypt the cipher test c as m = cd.

In our implementation we chose p and q randomly and kept them secret, even from the user who knows the decryption key. Of course, it is hard for someone who knows who does not know d to factor n and thus compute d; that is the basis of the encryption. But it is fairly easy to recover p and q if you know n, e and d, as Boneh showed (page 205, left column, Fact 1):

  1. [Initialize] Set k d e − 1.
  2. [Try a random g] Choose g at random from {2, …, n − 1} and set tk.
  3. [Next t] If t is divisible by 2, set tt / 2 and xg t (mod n). Otherwise go to Step 2.
  4. [Finished?] If x > 1 and y = gcd(x −1, n) > 1 then set py and q &larrow; n / y, output p and q and terminate the algorithm. Otherwise go to Step 3.

For instance, given n = 25777, e = 3 and d = 16971, we calculate p = 149 and q = 173 when g = 5, noting that 149 × 173 = 25777; on average, it takes no more than two different g to complete the algorithm. Knowledge of p and q, though not necessary, is useful, because a more efficient decryption is possible using p, q and the Chinese remainder theorem, which does arithmetic with numbers the size of p and q instead of the size of their product.

Your task is to write a program that factors RSA moduli. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Root Finding, Again

March 21, 2014

[ Today's exercise was written by guest author Paul Hofstra, who holds a PhD in Theoretical Nuclear Physics from the Free University in Amsterdam and is now retired from a major oil company. Guest authors are always welcome; contact me at the link in the menu bar if you are interested in writing for Programming Praxis. ]

We studied root finding by the bisection and regula falsi methods in a previous exercise. Both methods have linear convergence, meaning that the error is reduced by a constant factor at each iteration; for example, for bisection the error reduces by half at each iteration. Of the two methods, the regula falsi method can be fast some times and slow other times, so the more consistent bisection method is generally preferred. In today’s exercise we look at two other methods, the secant method and Dekker’s method, which both have super-linear convergence.

The secant method is similar to regula falsi, but whereas the regula falsi method holds one endpoint constant, the secant method can move either endpoint, using the last two iterates to calculate the new point. That means the secant method can wander off outside the original interval in its attempt to find a root.

Theodorus Dekker’s 1969 method combines the root bracketing property of the bisection method with the super-linear convergence of the secant method. Dekker uses three points, a and b which bracket the solutions (so f(a) and f(b) have opposite signs) and c which tracks the last position of b. Every iteration does:

  1. If f(b is not the smallest of the three points (in an absolute sense), then swap a and b and make c equal to a.
  2. Calculate the bisections step x = (ab) / 2.
  3. If f(b) is zero or |x| is less than the desired tolerance return b.
  4. Calculate the secant step s, making sure not to divide by zero.
  5. Set c to the current value of b. If s is in the interval [0, x] then set b = b + s, otherwise set b = b + x.
  6. If the new b has the opposite sign as the old b, set a to c.

Step 5 contains the main logic. If the secant step is between b and (ab) / 2, then take the secant step, otherwise take the bisection step. Dekker’s method is simple, very powerful, fast, and works on any continuous function.

Your task is to write programs that implement the secant and Dekker root finders. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Follow

Get every new post delivered to your Inbox.

Join 574 other followers