Permuted Index
December 22, 2009
Rot-file
performs the first step, collecting the rotations of each input line:
(define (rot-file filename)
(define (split s) (string-split #\space s))
(with-input-from-file filename
(lambda ()
(let loop ((line (read-line)) (rs '()))
(if (eof-object? line) rs
(loop (read-line)
(append (rot-line (split line)) rs)))))))
Rot-line
produces the rotations of each line, one rotation for each word not on the stop list, stored as a list of pairs with the left half of the sentence (which may be null) in the car
and the right half of the sentence (which may not be null) in the cdr
. A word is defined as anything between spaces, including letters, digits and punctuation. Rot-line
works back-to-front through the list of words:
(define (rot-line words)
(define (join xs) (string-join #\space xs))
(let loop ((front (reverse words)) (back '()) (rots '()))
(if (null? front) rots
(let ((f (cdr front)) (b (cons (car front) back)))
(loop f b (if (and (pair? f) (member (car b) stop-list)) rots
(cons (cons (join (reverse f)) (join b)) rots)))))))
Our stop list is brief; you may want to extend yours:
(define stop-list
'("a" "an" "and" "by" "for" "if" "in" "is" "of" "on" "the" "to"))
The third step of Parnas’ algorithm runs through the sorted list of front/back pairs, neatly printing each:
(define (print rots)
(define (print-line x)
(display (rjust 32 (car x)))
(display " ")
(display (ljust 32 (cdr x)))
(newline))
(for-each print-line rots))
Print
calls rjust
and ljust
to align the output:
(define (rjust n str)
(let ((len (string-length str)))
(if (< n len)
(substring str (- len n) len)
(string-append (make-string (- n len) #\space) str))))
(define (ljust n str)
(let ((len (string-length str)))
(if (< n len)
(substring str 0 n)
(string-append str (make-string (- n len) #\space)))))
The ptx
function puts all the pieces together:
(define (ptx filename)
(define (order-by a b) (string-ci<? (cdr a) (cdr b)))
(print (sort order-by (rot-file filename))))
Assuming the three sentences of the example are in a file test.ptx
, calling (ptx "test.ptx")
produces the permuted index shown on the prior page. The code, including read-line
, string-split
and string-join
from the Standard Prelude, is collected at
[…] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/12/22/programming-praxis-permuted-index/ for a version with comments):
My python solution.
def rotate(phrase):
“””
Takes a given phrase and performs a keyword rotation on it, returning a
list of (part 1, part 2) tuples.
“””
# I don’t want to figure out the sorting of capital letters.
phrase = phrase.lower()
# split the phrase into words.
split_phrase = phrase.split(‘ ‘)
rotation_list = []
# Perform the actual rotation and return the splits.
for i in range(len(split_phrase)):
non_words =\
[“a”,”an”,”and”,”by”,”for”,”if”,”in”,”is”,”of”,”on”,”the”,”to”]
# If the split is before a non word skip the split, unless it’s the first
# word in the phrase.
if (split_phrase[i] in non_words) and (i is not 0):
continue
# Create the left and right portions of the split, and insert into the
# return tuple list.
left = “”
right = “”
for n in range(len(split_phrase[:i])):
if n == 0:
left = split_phrase[n]
else:
left = left + ‘ ‘ + split_phrase[n]
for n in range(len(split_phrase[i:])):
if n == 0:
right = split_phrase[i+n]
else:
right = right + ‘ ‘ + split_phrase[i+n]
rotation_list.append((left, right))
return rotation_list
def sort(rotation_list):
“””
Given a rotation list, bubble sort by the right hand side because I’m lazy.
“””
for i in range(len(rotation_list)):
for n in range(len(rotation_list)-1):
if (rotation_list[n][1] > rotation_list[n+1][1]):
rotation_list[n], rotation_list[n+1] = \
rotation_list[n+1], rotation_list[n]
def unrotate(rotation_list):
“””
Neatly output the given rotation list to the user.
“””
# Find the length of the longest string in the rotation list.
max_size = 0
for i in range(len(rotation_list)):
for j in range(2):
if (len(rotation_list[i][j]) > max_size):
max_size = len(rotation_list[i][j])
# Output the properly formatted index table.
for i in range(len(rotation_list)):
print ‘ ‘ * (max_size – len(rotation_list[i][0])) \
+ rotation_list[i][0] \
+ ” ” \
+ rotation_list[i][1]
Apologies for failing at code posting.
Here you go:
I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.
Parnas’s paper on kwic index looked like he had spent a week on very intensive design. In my opinion, parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method – or at least compatible with it.
I think Parnas was on the right track with his ideas.
I was astounded a couple of years ago to see that someone had written a kwic index progam in 3 lines of Python. I did not keep a note of the Author. I still think however, that Parnas’s design method is very good.