## Permuted Index

### December 22, 2009

`Rot-file` performs the first step, collecting the rotations of each input line:

```(define (rot-file filename)   (define (split s) (string-split #\space s))   (with-input-from-file filename     (lambda ()       (let loop ((line (read-line)) (rs '()))         (if (eof-object? line) rs           (loop (read-line)                 (append (rot-line (split line)) rs)))))))```

`Rot-line` produces the rotations of each line, one rotation for each word not on the stop list, stored as a list of pairs with the left half of the sentence (which may be null) in the `car` and the right half of the sentence (which may not be null) in the `cdr`. A word is defined as anything between spaces, including letters, digits and punctuation. `Rot-line` works back-to-front through the list of words:

```(define (rot-line words)   (define (join xs) (string-join #\space xs))   (let loop ((front (reverse words)) (back '()) (rots '()))     (if (null? front) rots       (let ((f (cdr front)) (b (cons (car front) back)))         (loop f b (if (and (pair? f) (member (car b) stop-list)) rots                     (cons (cons (join (reverse f)) (join b)) rots)))))))```

Our stop list is brief; you may want to extend yours:

```(define stop-list   '("a" "an" "and" "by" "for" "if" "in" "is" "of" "on" "the" "to"))```

The third step of Parnas’ algorithm runs through the sorted list of front/back pairs, neatly printing each:

```(define (print rots)   (define (print-line x)     (display (rjust 32 (car x)))     (display " ")     (display (ljust 32 (cdr x)))     (newline))   (for-each print-line rots))```

`Print` calls `rjust` and `ljust` to align the output:

```(define (rjust n str)   (let ((len (string-length str)))     (if (< n len)         (substring str (- len n) len)         (string-append (make-string (- n len) #\space) str))))```

```(define (ljust n str)   (let ((len (string-length str)))     (if (< n len)         (substring str 0 n)         (string-append str (make-string (- n len) #\space)))))```

The `ptx` function puts all the pieces together:

```(define (ptx filename)   (define (order-by a b) (string-ci<? (cdr a) (cdr b)))   (print (sort order-by (rot-file filename))))```

Assuming the three sentences of the example are in a file `test.ptx`, calling `(ptx "test.ptx")` produces the permuted index shown on the prior page. The code, including `read-line`, `string-split` and `string-join` from the Standard Prelude, is collected at

Pages: 1 2

### 6 Responses to “Permuted Index”

1. […] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. […]

2. Remco Niemeijer said

```import Data.Char
import Data.List
import qualified Data.List.Key as K
import Text.Printf

stopList :: [String]
stopList = words "a an and by for if in is of on the to"

rot :: [String] -> [(String, String)]
rot xs = [(unwords a, unwords b) | (a, b) <- init \$
zip (inits xs) (tails xs), notElem (head b) stopList]

prettyPrint :: [(String, String)] -> IO ()
prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
where l1 = maximum \$ map (length . fst) xs
l2 = maximum \$ map (length . snd) xs

permuteIndex :: String -> IO ()
permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
concatMap (rot . words) . lines
```
3. Geo Marchin said

My python solution.

def rotate(phrase):
“””
Takes a given phrase and performs a keyword rotation on it, returning a
list of (part 1, part 2) tuples.
“””
# I don’t want to figure out the sorting of capital letters.
phrase = phrase.lower()
# split the phrase into words.
split_phrase = phrase.split(‘ ‘)
rotation_list = []
# Perform the actual rotation and return the splits.
for i in range(len(split_phrase)):
non_words =\
[“a”,”an”,”and”,”by”,”for”,”if”,”in”,”is”,”of”,”on”,”the”,”to”]
# If the split is before a non word skip the split, unless it’s the first
# word in the phrase.
if (split_phrase[i] in non_words) and (i is not 0):
continue
# Create the left and right portions of the split, and insert into the
# return tuple list.
left = “”
right = “”
for n in range(len(split_phrase[:i])):
if n == 0:
left = split_phrase[n]
else:
left = left + ‘ ‘ + split_phrase[n]
for n in range(len(split_phrase[i:])):
if n == 0:
right = split_phrase[i+n]
else:
right = right + ‘ ‘ + split_phrase[i+n]
rotation_list.append((left, right))
return rotation_list

def sort(rotation_list):
“””
Given a rotation list, bubble sort by the right hand side because I’m lazy.
“””
for i in range(len(rotation_list)):
for n in range(len(rotation_list)-1):
if (rotation_list[n] > rotation_list[n+1]):
rotation_list[n], rotation_list[n+1] = \
rotation_list[n+1], rotation_list[n]

def unrotate(rotation_list):
“””
Neatly output the given rotation list to the user.
“””
# Find the length of the longest string in the rotation list.
max_size = 0
for i in range(len(rotation_list)):
for j in range(2):
if (len(rotation_list[i][j]) > max_size):
max_size = len(rotation_list[i][j])

# Output the properly formatted index table.
for i in range(len(rotation_list)):
print ‘ ‘ * (max_size – len(rotation_list[i])) \
+ rotation_list[i] \
+ ” ” \
+ rotation_list[i]

4. Geo Marchin said

Apologies for failing at code posting.
Here you go:

5. I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.

6. richard mullins said

Parnas’s paper on kwic index looked like he had spent a week on very intensive design. In my opinion, parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method – or at least compatible with it.
I think Parnas was on the right track with his ideas.

I was astounded a couple of years ago to see that someone had written a kwic index progam in 3 lines of Python. I did not keep a note of the Author. I still think however, that Parnas’s design method is very good.