Xref

April 22, 2011

In the old days, when programs were written on punch cards or paper tape and editors didn’t provide regular-expression searches, most compilers provided a cross-reference utility that showed each use of each identifier in the program. You don’t often see cross-referencers any more, and that’s too bad, because a cross-referencer can provide good clues about the structure of a program; looking at a cross-reference listing can be a good way to start examining an unfamiliar program. Here’s some sample output from a Scheme cross-referencer, with the identifier as the first word on each line followed by a list of line numbers where the identifier appears:

0 28 1 18 < 27 = 30 ?!.+-*/<=>:$%^&_~@ 5 a 24 25 26 27 add1 20 and 26 30 b 24 25 26 27 c 3 4 5 8 9 10 13 caar 30 32 36 37 car 25 26 cdar 30 33 34 36 37 cdr 27 31 34 37 char-alphabetic? 4 char-in-ident? 3 10 char-numeric? 4 char=? 13 cond 9 19 29 cons 11 21 cs 8 9 11 12 14 define 3 7 16 23 24 display 33 36 else 14 21 35 eof-object? 9 19 file 16 17 get-ident 7 18 20 21 identifiers 3 if 9 lambda 17 let 8 11 18 28 line 18 20 21 list->string 9 12 loop 8 11 14 18 20 21 28 31 34 37 lt? 24 28 member 5 newline 13 29 35 not 35 null? 9 29 or 4 25 pair? 12 peek-char 8 11 14 prev-line 28 30 31 prev-word 28 30 31 32 34 35 read-char 11 13 14 reverse 9 12 scheme 3 sort 28 string->list 5 string<? 25 string=? 20 26 30 32 35 w 18 19 20 21 when 35 with-input-from-file 17 ws 18 19 20 21 23 28 29 30 31 32 33 34 36 37 x 11 xref 1 16 xref-out 19 23

A quick look suggests that this program reads a text file (with-input-from-file, read-char, eof-object?) and does some kind of processing on its characters (char-alphabetic?, char-numeric?). There is some list-handling (car, cdr, string->list), loop is used conventionally, and ws seems to be some kind of key to understanding what the program does. And there is an unholy mess on line 5.

Your task is to write a cross-referencer for your chosen language. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

10 Comments »

10 Responses to “Xref”

arturasl said
April 22, 2011 at 12:40 PM
Solution in C++: github
Outputs any alphanumeric word which does not start with digit and which is neither part of comment nor literal string
Veer said
April 22, 2011 at 12:40 PM
You can see my code at http://codepad.org/XXLZhua1

Remco Niemeijer said

April 22, 2011 at 2:55 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2011/04/22/programming-praxis-xref/ for a version with comments):

import Data.List
import qualified Data.List.Key as K
import Language.Haskell.Lexer

main :: IO ()
main = do file <- readFile "test.hs"
          mapM_ putStrLn . map ((\((n:_), ls) -> unwords $ n : nub ls) .
              unzip) . K.group fst $ K.sort fst
              [(s, show $ line p) | (tok, (p,s)) <- lexerPass0 file, 
               elem tok [Varid, Conid, Varsym, Consym]]

Graham said

April 22, 2011 at 3:49 PM

For lack of a good knowledge of Python lexical parsers:

#!/usr/bin/env python

from collections import defaultdict
from re import compile

def xref(file_name):
    exclude = compile(r"[\\\"'\(\)\[\]:?.,]")       # unwanted characters
    comment = compile(r"\#.*")                      # comments
    with open(file_name) as f:
        lines = f.readlines()
        d = defaultdict(list)
        for line_num in xrange(len(lines)):
            line = exclude.sub(' ', lines[line_num])
            line = comment.sub('', line).strip()
            for word in line.split():
                d[word].append(line_num + 1)
    return d

def output(x_d):
    for entry in sorted(str(k) + "\t" + ", ".join(str(i) for i in v) for (k, v)
            in x_d.iteritems()):
        print entry
    return None

if __name__ == "__main__":
    import sys
    output(xref(sys.argv[1] if sys.argv[1:] else "xref.py"))

Veer said
April 22, 2011 at 4:04 PM
@pp : I don’t think 0 can be considered identifier in scheme.
Your first line output 0 as identifier.
programmingpraxis said
April 22, 2011 at 5:27 PM
Veer: I don’t know if 0 is officially a Scheme identifier, or not, but it doesn’t matter. The cross-referencer is more useful if it includes 0 in the list, and that’s good enough for me.
slabounty said
April 23, 2011 at 2:38 AM
Here’s a quick one in ruby …
```
xref = Hash.new
linecount = 0
IO.foreach(ARGV[0]) do |line|
    linecount += 1
    line.sub!(/#.*/, "")
    tokens = line.split(/[^A-Za-z0-9]/)
    tokens.each do |token|
        (xref[token] ||= []) << linecount if token != ""
    end
end

xref.each_pair do |t, l|
    puts "#{t}: #{l}"
end
```
Someone mentioned taking out the literal strings, but my feeling was that they should be left in as part of the cross reference. For splitting, I just split on anything that wasn’t a letter, number, or underscore. I think that should get most everything correct. There are probably some edge cases though (floating point numbers come to mind).

If anyone’s wondering, you can’t do Hash.new([]) to get an empty array as the default value of the new hash element. That will create a single array that’s shared by each hash element. Hence the funny looking line 8 which a) creates a new empty array if the hash element is empty and then b) adds the token to it (or the existing array).
David said
April 25, 2011 at 6:50 PM
Here is one in Factor. A bit more complicated since any non-space printing character can be used in identifiers. Therefore it is necessary to filter out anything that is not an identifier and whatever is left must be an identifier. Lots of edge cases here, but I chose to filter out string constants, integers, end of line comments, stack comments and words ending with “:” (e.g. CONSTANT:, :, USE:, etc.), and sequence constructors ({ }, H{ }, etc.)
```
! very simple xref

USING: kernel math math.order regexp ascii io.encodings.ascii formatting sorting
       io io.files assocs sequences strings ;
IN: xref

CONSTANT: xref-dict H{ }

CONSTANT: content-filter  R/ \s!.*$|^!.*$||\b[0-9]+\b|\b"[^"]*"\b|\b[HV]?{\b|[][};]|\b.*:\b|$.*--.*$\b/

: filter-content ( str -- str )
   content-filter "" re-replace ;

: addref  ( symbol line -- )
    over xref-dict at
      [ rot drop ]
      [ V{ } clone [ rot xref-dict set-at ] keep ] if*
    swap suffix! drop ;

: gobble-input  ( -- )
    1
    [
      filter-content  R/ \s+/ re-split
      [ >string dup length 0 >
        [ over addref ] [ drop ] if ] each
      1 +
    ] each-line drop ;
 
: report ( -- )
    xref-dict keys [ <=> ] sort
    [ dup xref-dict at 
      swap "%s " printf [ "%d " printf ] each nl ] each ;

: xref ( file -- )
    xref-dict clear-assoc
    ascii <file-reader> [ gobble-input ] with-input-stream
    report ;
```
( scratchpad ) “xref.factor” xref
+ 26
30
36
> 24
>string 24
R/ 23
\b|\\b/ 9
\s+/ 23
addref 14 25
ascii 3 36
assocs 4
at 15 31
clear-assoc 35
clone 17
content-filter 12
drop 16 18 25 27
dup 24 31
each 25 32 32
each-line 27
filter-content 11 23
formatting 3
gobble-input 20 36
if 25
if* 17
io 4
io.encodings.ascii 3
io.files 4
keep 17
kernel 3
keys 30
length 24
math 3
math.order 3
nl 32
over 15 25
printf 32 32
re-replace 12
re-split 23
regexp 3
report 29 37
rot 16 17
sequences 4
set-at 17
sort 30
sorting 3
strings 4
suffix! 18
swap 18 32
with-input-stream 36
xref 5 34
xref-dict 7 15 17 30 31 35
David said
April 26, 2011 at 3:59 AM
Ooh,

The output above looks wrong, but is correct, problem is that first three lines was interpreted as HTML. Should be:

+ 26
<=> 30
<file-reader> 36
…

Rainer said

April 26, 2011 at 6:55 AM

My try in REXX, this script analyzes itself

sym = ''
asym. = ''
nbr_zeilen = sourceline()
znr = 0
do znr = 1 to nbr_zeilen
    zeile = sourceline(znr)
    zeile = translate(zeile,'  ','()')
    do while length(zeile) > 0
        parse value zeile with first zeile
        if pos(first, sym) == 0 then sym = sym first
        asym.first = asym.first znr 
    end
end

sym = bubble(sym)
do while length(sym) > 0
    parse value sym with first sym
    say first asym.first
end
exit

bubble: procedure
    parse arg text
    t. = ''
    i = 0
    do while length(text) > 0
        parse value text with first text
        i = i + 1
        t.i = first
    end
    chg = 1
    do while chg == 1
        chg = 0
        do j = 1 to i-1
            do k = j+1 to i
                if t.j > t.k then do
                    tmp = t.j
                    t.j = t.k
                    t.k = tmp
                    chg = 1
                end
            end
        end
    end
    text = ''
    do j = 1 to i
        text = text t.j
    end
    return strip(text)

Programming Praxis

Xref

April 22, 2011

10 Responses to “Xref”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Xref

April 22, 2011

Share this:

Related

10 Responses to “Xref”

Leave a comment

Categories

Archives

Archives