Assembler, Part 1
April 15, 2014
This exercise is trickier than it looks, since we must write a two-pass assembler to handle labels that appear after their first reference; the first pass reads the program and saves the line numbers referred to by labels, and the second pass writes a memory image containing the assembled program.
We keep the line numbers referred to by labels in a global a-list, and define an a-list of opcodes:
(define labels (list)) (define opcodes '(("const" . 0) ("get" . 1) ("put" . 2) ("ld" . 3) ("st" . 4) ("add" . 5) ("sub" . 6) ("jpos" . 7) ("jz" . 8) ("j" . 9) ("halt" . 10)))
The first pass reads the program, writes an intermediate form of the program, and collects the a-list of label/line pairs:
(define (asm1 file-name) ; first-pass (set! labels (list)) (with-input-from-file file-name (lambda () (let loop ((k 0) (line (read-line)) (lines (list))) (if (eof-object? line) (reverse lines) (if (or (string=? line "") (char=? (string-ref line 0) #\#)) (loop k (read-line) lines) (call-with-values (lambda () (split (string->list line))) (lambda (lbl opc obj cmt) (when (not (string=? lbl "")) (set! labels (cons (cons lbl k) labels))) (loop (+ k 1) (read-line) (cons (vector k lbl opc obj cmt) lines))))))))))
The output of the first pass is a list of five-slot vectors: line number, label, opcode, object, comment. Everything at this point remains in symbolic form; we keep the comment because we intend to use it in a future exercise. The split
function extracts up to four fields from each line:
(define (split line) (define (skip-blanks) (let loop () (if (or (null? line) (not (char-whitespace? (car line)))) #f (begin (set! line (cdr line)) (loop))))) (define (get-string) (let loop ((str (list))) (if (or (null? line) (char-whitespace? (car line))) (list->string (reverse str)) (let ((c (car line))) (set! line (cdr line)) (loop (cons c str)))))) (let* ((lbl (get-string)) (_ (skip-blanks)) (opc (get-string)) (_ (skip-blanks)) (obj (get-string)) (_ (skip-blanks))) (values lbl opc obj (list->string line))))
The split
function runs through the input line
, extracting fields as it sees them and discarding the parts of the line that it has already processed. Our function isn’t perfect — ill-formed input can make it crash — but our exercise is about assembling, not parsing, and our function does work properly on well-formed input. The list output by the first pass is fed to the second pass, which writes a memory image:
(define (asm2 lines) ; second pass (let ((mem (make-vector 1000 0))) (do ((lines lines (cdr lines))) ((null? lines) mem) (let ((num (vector-ref (car lines) 0)) (opc (vector-ref (car lines) 2)) (obj (vector-ref (car lines) 3))) (vector-set! mem num (+ (* (cdr (assoc opc opcodes)) 1000) (if (assoc obj labels) (cdr (assoc obj labels)) (if (not (string=? obj "")) (string->number obj) 0))))))))
The last line (which is spread over four physical lines) that performs the vector-set!
operation is rather intense. One of three things is stored in the memory location. If obj
is a label, memory is set with the numeric value of the opcode in the two high-order digits and the label location in the three low-order digits. If obj
is a number, it is necessarily the object of a const
opcode, and is written literally to memory; the numeric value of the const
opcode is 0, so the two high-order digits are not special, and are just part of the constant. Otherwise, for those opcodes that have no object, the numeric value of the opcode is written to the two high-order digits and the three low-order digits are set to 0.
If our sample program is stored in file program.asm
, it is assembled like this:
> (asm2 (asm1 "program.asm")) #1000(3010 4011 1000 8007 5011 4011 9002 3011 2000 10000 0)
Note that Scheme elides the trailing zeros from the vector.
You can run the program at http://programmingpraxis.codepad.org/3KFpWaoQ. We’ll write a simulator for our hypothetical computer in the next exercise.
Solution in Racket (https://gist.github.com/adolfopa/11063222#file-asm-1-notests-rkt)
Both one and two pass assemblers in Python.
(def C
{:M (make-array String 1000)
:acc 0
:opc (apply hash-map
(mapcat list
[“const” “get” “put” “ld” “st” “add” “sub” “jpos” “jz” “j” “halt”]
(map #(format “%02d” %) (range))))})
(defn reader
(let [re #”^(\w\S*)?\s+(\w+)\s*(\w*)”
whitespace #”^(#.*)?$”
code-lines (remove #(re-matches whitespace %)
(clojure.string/split code #”\n”))]
(map #(apply vector (rest (re-find re %))) code-lines)))
(defn label-def? [line] (first line))
(defn get-labels
(reduce
(fn [r [[label opc arg] pos]]
(assoc r label pos))
{“” 0}
(filter #(label-def? (first %))
(map list code (range)))))
(defn translate [opcodes source-code]
(let
(reduce
(fn [r [_ opc obj]]
(conj r
(if (= opc “const”) obj
(str (opcodes opc)
(format “%03d” (labels obj))))))
[]
code)))
(defn load-asm [computer code]
(doseq [[pos word] (map list (range) (translate (:opc C) code))]
(aset (:M computer) pos word))
computer)
; invoke with (load-asm C
)
PS: sorry for the previous comment