Assembler, Part 1

April 15, 2014

This exercise is trickier than it looks, since we must write a two-pass assembler to handle labels that appear after their first reference; the first pass reads the program and saves the line numbers referred to by labels, and the second pass writes a memory image containing the assembled program.

We keep the line numbers referred to by labels in a global a-list, and define an a-list of opcodes:

(define labels (list))

(define opcodes '(("const" . 0) ("get" . 1) ("put" . 2)
  ("ld" . 3) ("st" . 4) ("add" . 5) ("sub" . 6) ("jpos" . 7)
  ("jz" . 8) ("j" . 9) ("halt" . 10)))

The first pass reads the program, writes an intermediate form of the program, and collects the a-list of label/line pairs:

(define (asm1 file-name) ; first-pass
  (set! labels (list))
  (with-input-from-file file-name (lambda ()
    (let loop ((k 0) (line (read-line)) (lines (list)))
      (if (eof-object? line) (reverse lines)
        (if (or (string=? line "") (char=? (string-ref line 0) #\#))
           (loop k (read-line) lines)
            (call-with-values
              (lambda () (split (string->list line)))
              (lambda (lbl opc obj cmt)
                (when (not (string=? lbl ""))
                  (set! labels (cons (cons lbl k) labels)))
                (loop (+ k 1) (read-line)
                      (cons (vector k lbl opc obj cmt) lines))))))))))

The output of the first pass is a list of five-slot vectors: line number, label, opcode, object, comment. Everything at this point remains in symbolic form; we keep the comment because we intend to use it in a future exercise. The split function extracts up to four fields from each line:

(define (split line)
  (define (skip-blanks)
    (let loop ()
      (if (or (null? line) (not (char-whitespace? (car line)))) #f
        (begin (set! line (cdr line)) (loop)))))
  (define (get-string)
    (let loop ((str (list)))
      (if (or (null? line) (char-whitespace? (car line)))
          (list->string (reverse str))
          (let ((c (car line)))
            (set! line (cdr line))
            (loop (cons c str))))))
  (let* ((lbl (get-string)) (_ (skip-blanks))
         (opc (get-string)) (_ (skip-blanks))
         (obj (get-string)) (_ (skip-blanks)))
    (values lbl opc obj (list->string line))))

The split function runs through the input line, extracting fields as it sees them and discarding the parts of the line that it has already processed. Our function isn’t perfect — ill-formed input can make it crash — but our exercise is about assembling, not parsing, and our function does work properly on well-formed input. The list output by the first pass is fed to the second pass, which writes a memory image:

(define (asm2 lines) ; second pass
  (let ((mem (make-vector 1000 0)))
    (do ((lines lines (cdr lines))) ((null? lines) mem)
      (let ((num (vector-ref (car lines) 0))
            (opc (vector-ref (car lines) 2))
            (obj (vector-ref (car lines) 3)))
        (vector-set! mem num (+ (* (cdr (assoc opc opcodes)) 1000)
          (if (assoc obj labels) (cdr (assoc obj labels))
            (if (not (string=? obj "")) (string->number obj)
              0))))))))

The last line (which is spread over four physical lines) that performs the vector-set! operation is rather intense. One of three things is stored in the memory location. If obj is a label, memory is set with the numeric value of the opcode in the two high-order digits and the label location in the three low-order digits. If obj is a number, it is necessarily the object of a const opcode, and is written literally to memory; the numeric value of the const opcode is 0, so the two high-order digits are not special, and are just part of the constant. Otherwise, for those opcodes that have no object, the numeric value of the opcode is written to the two high-order digits and the three low-order digits are set to 0.

If our sample program is stored in file program.asm, it is assembled like this:

> (asm2 (asm1 "program.asm"))
#1000(3010 4011 1000 8007 5011 4011 9002 3011 2000 10000 0)

Note that Scheme elides the trailing zeros from the vector.

You can run the program at http://programmingpraxis.codepad.org/3KFpWaoQ. We’ll write a simulator for our hypothetical computer in the next exercise.

Posted by programmingpraxis

Filed in Exercises

4 Comments »

4 Responses to “Assembler, Part 1”

Adolfo said

April 18, 2014 at 8:41 PM

Solution in Racket (https://gist.github.com/adolfopa/11063222#file-asm-1-notests-rkt)

#lang racket/base

(require racket/dict
         racket/match
         racket/sequence
         racket/set
         racket/string)

(define instructions
  (hash "const" 00
        "get"   01
        "put"   02
        "ld"    03
        "st"    04
        "add"   05
        "sub"   06
        "jpos"  07
        "jz"    08
        "j"     09
        "halt"  10))

(define instr-regexp
  (string-join (map regexp-quote
                    (sort (hash-keys instructions) > #:key string-length))
               "|"
               #:before-first "("
               #:after-last ")"))

(define stmt-regexp
  (pregexp
   (string-append "[ \t]*"                 ; leading whitespace
                  "(?:"                    ; (1)
                  "(?:(\\w+)[ \t]+)??"     ; optional label
                  instr-regexp             ; instruction mnemonic
                  "(?:[ \t]+(\\d+|\\w+))?" ; optional argument
                  ")?"                     ; (1)
                  "(?:#.*)?")))            ; optional comment

(struct stmt (label op arg) #:transparent)

(define (read-stmt line)
  (match (regexp-match stmt-regexp line)
    [(list _ fst snd thd)
     (when (and snd (not (dict-ref instructions snd #f)))
       (error "syntax error: unknown mnemonic" line))
     (and (or fst snd thd)
          (stmt fst snd thd))]
    [#f
     (error "syntax error: ill formed statement" line)]))

(define (read-asm in)
  (for*/list ([line (in-lines in)]
              [st (in-value (read-stmt line))]
              #:when st)
    st))

(define (make-symbol-table xs)
  (for/hash ([x xs]
             [n (in-naturals)]
             #:when (stmt-label x))
    (values (stmt-label x) n)))

(define (resolve arg labels)
  (cond [(not arg) 0]
        [(regexp-match #px"^\\d+$" arg)
         (string->number arg)]
        [else
         (dict-ref labels arg)]))

(define (encode op arg labels)
  (+ (* (dict-ref instructions op) 1000)
     (if arg (resolve arg labels) 0)))

(define (assemble prog [mem (make-vector 1000 0)])
  (define stmts (read-asm prog))
  (define labels
    (make-symbol-table stmts))
  (for ([x stmts]
        [i (in-naturals)])
    (match x
      [(stmt _ "const" arg)
       (vector-set! mem i (resolve arg labels))]
      [(stmt _ op arg)
       (vector-set! mem i (encode op arg labels))]))
  mem)

Mike said

April 19, 2014 at 7:10 PM

Both one and two pass assemblers in Python.

import collections
import itertools as it
import re

mnemonic = "const get put ld st add sub jpos jz j halt".split()
opcode = {m:c for c,m in enumerate(mnemonic)}

regex = re.compile(r"""(?x)            # verbose mode regex
    \A                                 # beginning of line
    (?P<label>[a-zA-Z_]\w*)?           # optional label--no leading whitespace
    (?:\s+(?P<opcode>[a-zA-Z]+)        # optional opcode after 1 or more ws
          (?:\s+(?P<operand>\w+))?)?   #   followed by an optional operand
    \s*                                # trailing ws
    (?:\#.*)?                          # optional comment
    \Z                                 # end of line
    """)

EBadArg = "Error: '{label}' is referenced but not defined."
EBadLabel = "Error: Duplicate label '{label}' at line {line_no}."
EBadOp = "Error: Unrecognized instruction mnuemonic: '{op}'."
ETooLong = "Error: Program is too long."
WUnusedLabel = "Warning: Unused label: '{label}'."

def perror(fmt, *args, **kwds):
    print(fmt.format(*args, **kwds))
    
def one_pass(source, listing=False):
    class Symbol:
        def __init__(self):
            self.addr = None
            self.xref = []

    symtab = collections.defaultdict(Symbol)
    memory = [0] * 1000

    warnings = 0
    errors = 0
    
    pc = 0
    
    for line_no, line in enumerate(src):
        mo = regex.match(line)
              
        label, op, arg = mo.groups()
        
        if label:
            if symtab[label].addr is not None:
                perror(EBadLabel, label=label, line_no=line_no)
                errors += 1

            # save label address and fix earlier references
            symtab[label].addr = pc
            for addr in symtab[label].xref:
                memory[addr] += pc
                
        if op:
            instruction = opcode.get(op, None)
            
            if instruction is None:
                perror(EBadOp, op=op)
                errors += 1

            if arg:
                if arg.isdigit():
                    val = int(arg)

                else:
                    symtab[arg].xref.append(pc)
                    val = symtab[arg].addr or 0

            else:
                val = 0
                    
            if pc > 999 and not toolong:
                perror(ETooLong)
                errors += 1
                toolong = True

            if not errors:
                memory[pc] = instruction * 1000 + val

            pc += 1

    for label, info in symtab.items():
        if info.addr is None:
            perror(EBadArg, label=label)
            errors += 1
            
        if info.xref == []:
            perror(WUnusedLabel, label=label)
            warnings += 1

    if errors or warnings:
        print("There were {} errors and {} warnings.".format(errors, warnings))
        return None
    
    else:
        return memory


def two_pass(source, listing=False):
    parsed  = []
    address = {}
    memory  = [0] * 1000
    
    # first pass
    pc = 0
    for line_no, line in enumerate(source):
        label, op, arg = regex.match(line).groups()
        parsed.append((label, op, arg, line_no, line))
        
        if label and label not in address:
            address[label] = pc
                
        if op:
            pc += 1

    # second pass
    pc = 0
    toolong = False
    errors = []
    nerrors = 0

    if listing:
        print("pc   code   line   source\n---  -----  -----  ----------------")
    
    for label, op, arg, line_no, line in parsed:
        if label and address[label] != pc:
            errors.append((EBadLabel, {'label':label, 'line_no':line_no}))
            
        if op:
            instruction = opcode.get(op, None)
            
            if arg:
                val = int(arg) if arg.isdigit() else address.get(arg, None)
            else:
                val = 0
                    
            if instruction is None:
                errors.append((EBadOp, {'op':op}))

            if val is None:
                errors.append((EBadArg, {'label':arg}))
                
            if pc > 999 and not toolong:
                errors.append((ETooLong, {}))
                toolong = True

            if not errors:
                memory[pc] = instruction * 1000 + val

            if listing:
                fmt = "{:03}  {:05}  {:05}  {}"
                print(fmt.format(pc, memory[pc], line_no, line))

            if errors:
                nerrors += len(errors)
                
                if not listing:
                    fmt = "{:05}  {}"
                    print(fmt.format(line_no, line))
                    
                for error, kws in errors:
                    perror(error, **kws)
                errors = []
                print()
                
            pc += 1
            
        else:
            if listing:
                fmt = "            {:05}  {}"
                print(fmt.format(line_no, line))
                    
    return None if nerrors > 0 else memory

Сhristian Мüller (@gravitydenier) said
April 27, 2014 at 1:14 PM
(def C
{:M (make-array String 1000)
:acc 0
:opc (apply hash-map
(mapcat list
[“const” “get” “put” “ld” “st” “add” “sub” “jpos” “jz” “j” “halt”]
(map #(format “%02d” %) (range))))})

(defn reader
(let [re #”^(\w\S*)?\s+(\w+)\s*(\w*)”
whitespace #”^(#.*)?$”
code-lines (remove #(re-matches whitespace %)
(clojure.string/split code #”\n”))]
(map #(apply vector (rest (re-find re %))) code-lines)))

(defn label-def? [line] (first line))

(defn get-labels
(reduce
(fn [r [[label opc arg] pos]]
(assoc r label pos))
{“” 0}
(filter #(label-def? (first %))
(map list code (range)))))

(defn translate [opcodes source-code]
(let

(reduce
(fn [r [_ opc obj]]
(conj r
(if (= opc “const”) obj
(str (opcodes opc)
(format “%03d” (labels obj))))))
[]
code)))

(defn load-asm [computer code]
(doseq [[pos word] (map list (range) (translate (:opc C) code))]
(aset (:M computer) pos word))
computer)

; invoke with (load-asm C )

Сhristian Мüller (@gravitydenier) said

April 27, 2014 at 1:23 PM

(def C
  {:M (make-array String 1000)
   :acc 0
   :opc (apply hash-map 
               (mapcat list 
                       ["const" "get" "put" "ld" "st" "add" "sub" "jpos" "jz" "j" "halt"]
                       (map #(format "%02d" %) (range))))})
(defn reader [code]
  (let [re #"^(\w\S*)?\s+(\w+)\s*(\w*)"
        whitespace #"^(#.*)?$"
        code-lines (remove #(re-matches whitespace %)
                           (clojure.string/split code #"\n"))]
    (map #(apply vector (rest (re-find re %))) code-lines)))

(defn label-def? [line] (first line))

(defn get-labels [code]
  (reduce
    (fn [r [[label opc arg] pos]]
      (assoc r label pos))
    {"" 0}
    (filter #(label-def? (first %))
            (map list code (range)))))

(defn translate [opcodes source-code]
  (let
    [code (reader source-code)
     labels (get-labels code)]
    (reduce
      (fn [r [_ opc obj]]
        (conj r 
              (if (= opc "const") obj
                (str (opcodes opc) 
                     (format "%03d" (labels obj))))))
      []
      code)))

(defn load-asm [computer code]
  (doseq [[pos word] (map list (range) (translate (:opc C) code))]
    (aset (:M computer) pos word))
  computer)

; invoke with (load-asm C <code>)

PS: sorry for the previous comment

Programming Praxis

Assembler, Part 1

April 15, 2014

4 Responses to “Assembler, Part 1”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Assembler, Part 1

April 15, 2014

Share this:

Related

4 Responses to “Assembler, Part 1”

Leave a comment

Categories

Archives

Archives