Permuted Index

December 22, 2009

Rot-file performs the first step, collecting the rotations of each input line:

(define (rot-file filename) (define (split s) (string-split #\space s)) (with-input-from-file filename (lambda () (let loop ((line (read-line)) (rs '())) (if (eof-object? line) rs (loop (read-line) (append (rot-line (split line)) rs)))))))

Rot-line produces the rotations of each line, one rotation for each word not on the stop list, stored as a list of pairs with the left half of the sentence (which may be null) in the car and the right half of the sentence (which may not be null) in the cdr. A word is defined as anything between spaces, including letters, digits and punctuation. Rot-line works back-to-front through the list of words:

(define (rot-line words) (define (join xs) (string-join #\space xs)) (let loop ((front (reverse words)) (back '()) (rots '())) (if (null? front) rots (let ((f (cdr front)) (b (cons (car front) back))) (loop f b (if (and (pair? f) (member (car b) stop-list)) rots (cons (cons (join (reverse f)) (join b)) rots)))))))

Our stop list is brief; you may want to extend yours:

(define stop-list '("a" "an" "and" "by" "for" "if" "in" "is" "of" "on" "the" "to"))

The third step of Parnas’ algorithm runs through the sorted list of front/back pairs, neatly printing each:

(define (print rots) (define (print-line x) (display (rjust 32 (car x))) (display " ") (display (ljust 32 (cdr x))) (newline)) (for-each print-line rots))

Print calls rjust and ljust to align the output:

(define (rjust n str) (let ((len (string-length str))) (if (< n len) (substring str (- len n) len) (string-append (make-string (- n len) #\space) str))))

(define (ljust n str) (let ((len (string-length str))) (if (< n len) (substring str 0 n) (string-append str (make-string (- n len) #\space)))))

The ptx function puts all the pieces together:

(define (ptx filename) (define (order-by a b) (string-ci<? (cdr a) (cdr b))) (print (sort order-by (rot-file filename))))

Assuming the three sentences of the example are in a file test.ptx, calling (ptx "test.ptx") produces the permuted index shown on the prior page. The code, including read-line, string-split and string-join from the Standard Prelude, is collected at

Posted by programmingpraxis

Filed in Exercises

6 Comments »

6 Responses to “Permuted Index”

Programming Praxis – Permuted Index « Bonsai Code said
December 22, 2009 at 12:30 PM
[…] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. […]

Remco Niemeijer said

December 22, 2009 at 12:31 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2009/12/22/programming-praxis-permuted-index/ for a version with comments):

import Data.Char
import Data.List
import qualified Data.List.Key as K
import Text.Printf

stopList :: [String]
stopList = words "a an and by for if in is of on the to"

rot :: [String] -> [(String, String)]
rot xs = [(unwords a, unwords b) | (a, b) <- init $
          zip (inits xs) (tails xs), notElem (head b) stopList]

prettyPrint :: [(String, String)] -> IO ()
prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
    where l1 = maximum $ map (length . fst) xs
          l2 = maximum $ map (length . snd) xs

permuteIndex :: String -> IO ()
permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
               concatMap (rot . words) . lines

Geo Marchin said
December 23, 2009 at 10:11 PM
My python solution.

def rotate(phrase):
“””
Takes a given phrase and performs a keyword rotation on it, returning a
list of (part 1, part 2) tuples.
“””
# I don’t want to figure out the sorting of capital letters.
phrase = phrase.lower()
# split the phrase into words.
split_phrase = phrase.split(‘ ‘)
rotation_list = []
# Perform the actual rotation and return the splits.
for i in range(len(split_phrase)):
non_words =\
[“a”,”an”,”and”,”by”,”for”,”if”,”in”,”is”,”of”,”on”,”the”,”to”]
# If the split is before a non word skip the split, unless it’s the first
# word in the phrase.
if (split_phrase[i] in non_words) and (i is not 0):
continue
# Create the left and right portions of the split, and insert into the
# return tuple list.
left = “”
right = “”
for n in range(len(split_phrase[:i])):
if n == 0:
left = split_phrase[n]
else:
left = left + ‘ ‘ + split_phrase[n]
for n in range(len(split_phrase[i:])):
if n == 0:
right = split_phrase[i+n]
else:
right = right + ‘ ‘ + split_phrase[i+n]
rotation_list.append((left, right))
return rotation_list

def sort(rotation_list):
“””
Given a rotation list, bubble sort by the right hand side because I’m lazy.
“””
for i in range(len(rotation_list)):
for n in range(len(rotation_list)-1):
if (rotation_list[n][1] > rotation_list[n+1][1]):
rotation_list[n], rotation_list[n+1] = \
rotation_list[n+1], rotation_list[n]

def unrotate(rotation_list):
“””
Neatly output the given rotation list to the user.
“””
# Find the length of the longest string in the rotation list.
max_size = 0
for i in range(len(rotation_list)):
for j in range(2):
if (len(rotation_list[i][j]) > max_size):
max_size = len(rotation_list[i][j])

# Output the properly formatted index table.
for i in range(len(rotation_list)):
print ‘ ‘ * (max_size – len(rotation_list[i][0])) \
+ rotation_list[i][0] \
+ ” ” \
+ rotation_list[i][1]
Geo Marchin said
December 23, 2009 at 10:40 PM
Apologies for failing at code posting.
Here you go:
Matías Giovannini said
December 23, 2009 at 11:38 PM
I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.
richard mullins said
April 24, 2012 at 10:56 PM
Parnas’s paper on kwic index looked like he had spent a week on very intensive design. In my opinion, parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method – or at least compatible with it.
I think Parnas was on the right track with his ideas.

I was astounded a couple of years ago to see that someone had written a kwic index progam in 3 lines of Python. I did not keep a note of the Author. I still think however, that Parnas’s design method is very good.

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Programming Praxis

Permuted Index

December 22, 2009

6 Responses to “Permuted Index”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Permuted Index

December 22, 2009

Share this:

Related

6 Responses to “Permuted Index”

Leave a comment

Categories

Archives

Archives