Assembler, Part 1
April 15, 2014
In this exercise and the next one we will write a simple assembler for a hypothetical computer; we are following the assembler in the book The Awk Programming Langauge by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger. Our computer has a memory of a thousand five-digit words, a single accumulator, and eleven opcodes:
OPCODE | INSTRUCTION | DESCRIPTION |
---|---|---|
00 |
const C |
assembler pseudo-operator to define a constant C |
01 |
get |
read a number from the input to the accumulator |
02 |
put |
write the number in the accumulator to the output |
03 |
ld M |
load accumulator with contents of memory location M |
04 |
st M |
store contents of accumulator in memory location M |
05 |
add M |
add contents of memory location M to the accumulator |
06 |
sub M |
subtract contents of memory location M from the accumulator |
07 |
jpos M |
jump to memory location M if the accumulator is positive |
08 |
jz M |
jump to memory location M if the accumulator is zero |
09 |
j |
jump to memory location M , unconditionally |
10 |
halt |
stop program execution |
An assembly-language program is a series of blank lines and statements consisting of up to four fields: The first field, if it exists, is a label; it must start at the first position on the line and may not start with a digit. The second field, which is mandatory, is the opcode; it follows the optional label and mandatory spaces. The third field, which is used only with some opcodes, is the object; if it is present, it follows the opcode and mandatory spaces. The fourth field, which is optional, is a comment; everything on the line following a hash-sign is ignored. Here is a sample assembly-language program that prints the sum of a series of integers entered by the user, with the end of input marked by a 0:
# print sum of input numbers (terminated by zero) ld zero # initialize sum to zero st sum loop get # read a number jz done # no more input if number is zero add sum # add input to accumulated sum st sum # store new value back in sum j loop # go back and read another number done ld sum # print sum put halt zero const 0 sum const
The contents of memory after loading the sample program are shown on the next page.
Your task is to write a program that assembles a program written in our simple assembly language and loads the program into memory; we’ll write a simulator for our hypothetical computer in the next exercise. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Solution in Racket (https://gist.github.com/adolfopa/11063222#file-asm-1-notests-rkt)
Both one and two pass assemblers in Python.
(def C
{:M (make-array String 1000)
:acc 0
:opc (apply hash-map
(mapcat list
[“const” “get” “put” “ld” “st” “add” “sub” “jpos” “jz” “j” “halt”]
(map #(format “%02d” %) (range))))})
(defn reader
(let [re #”^(\w\S*)?\s+(\w+)\s*(\w*)”
whitespace #”^(#.*)?$”
code-lines (remove #(re-matches whitespace %)
(clojure.string/split code #”\n”))]
(map #(apply vector (rest (re-find re %))) code-lines)))
(defn label-def? [line] (first line))
(defn get-labels
(reduce
(fn [r [[label opc arg] pos]]
(assoc r label pos))
{“” 0}
(filter #(label-def? (first %))
(map list code (range)))))
(defn translate [opcodes source-code]
(let
(reduce
(fn [r [_ opc obj]]
(conj r
(if (= opc “const”) obj
(str (opcodes opc)
(format “%03d” (labels obj))))))
[]
code)))
(defn load-asm [computer code]
(doseq [[pos word] (map list (range) (translate (:opc C) code))]
(aset (:M computer) pos word))
computer)
; invoke with (load-asm C
)
PS: sorry for the previous comment