Permuted Index

December 22, 2009

In 1972, David Parnas proposed the construction of a permuted index, also known as a keyword-in-context or KWIC index, as an exercise in program design. For example, the three sentences

All's well that ends well. Nature abhors a vacuum. Every man has a price.

produce the permuted index:

                          Nature     abhors a vacuum.                
                                     All's well that ends well.      
                 All's well that     ends well.                      
                                     Every man has a price.          
                       Every man     has a price.                    
                           Every     man has a price.                
                                     Nature abhors a vacuum.         
                 Every man has a     price.                          
                      All's well     that ends well.                 
                 Nature abhors a     vacuum.                         
                           All's     well that ends well.            
            All's well that ends     well.

Parnas proposed a three-step algorithm: rotate, sort, unrotate. The rotate step takes a sentence and produces all its rotations. The sort step sorts the rotations by their back half. The unrotate step produces neatly-formatted output. Somewhere, in one of the three steps, rotations that produce back halves starting with words on a stop list are discarded; note that there is no output for “a vacuum” or “a price.”
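As a rough illustration of the three steps, here is a minimal Python sketch. It is not Parnas' own design: the function names are made up for this example, the stop list is borrowed from the Haskell solution in the comments, and the sketch assumes that stop words are dropped during the rotate step and that the sort ignores case.

```python
# Hypothetical stop list (the exercise itself does not fix one).
STOP_WORDS = {"a", "an", "and", "by", "for", "if",
              "in", "is", "of", "on", "the", "to"}


def rotate(sentence):
    """Return one (front, back) pair per word, skipping stop-word backs."""
    words = sentence.split()
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(len(words))
        if words[i].lower() not in STOP_WORDS
    ]


def sort_rotations(rotations):
    """Order by back half, ignoring case; exact text breaks ties."""
    return sorted(rotations, key=lambda r: (r[1].lower(), r[1]))


def unrotate(rotations):
    """Right-align the front halves so the back halves line up."""
    if not rotations:
        return []
    width = max(len(front) for front, _ in rotations)
    return [front.rjust(width) + "   " + back for front, back in rotations]


def permuted_index(sentences):
    """Chain the three steps: rotate, sort, unrotate."""
    rotations = [r for s in sentences for r in rotate(s)]
    return unrotate(sort_rotations(rotations))


if __name__ == "__main__":
    for line in permuted_index(["All's well that ends well.",
                                "Nature abhors a vacuum.",
                                "Every man has a price."]):
        print(line)
```

Run on the three example sentences, this sketch yields twelve entries in the same order as the index above, with no entries for "a vacuum." or "a price."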

Your task is to implement a program that produces permuted indexes using Parnas’ three-step algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

6 Responses to “Permuted Index”

Programming Praxis – Permuted Index « Bonsai Code said
December 22, 2009 at 12:30 PM
[…] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. […]

Remco Niemeijer said

December 22, 2009 at 12:31 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2009/12/22/programming-praxis-permuted-index/ for a version with comments):

import Data.Char
import Data.List
import qualified Data.List.Key as K
import Text.Printf

stopList :: [String]
stopList = words "a an and by for if in is of on the to"

rot :: [String] -> [(String, String)]
rot xs = [(unwords a, unwords b) | (a, b) <- init $
          zip (inits xs) (tails xs), notElem (head b) stopList]

prettyPrint :: [(String, String)] -> IO ()
prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
    where l1 = maximum $ map (length . fst) xs
          l2 = maximum $ map (length . snd) xs

permuteIndex :: String -> IO ()
permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
               concatMap (rot . words) . lines

Geo Marchin said
December 23, 2009 at 10:11 PM
My python solution.

def rotate(phrase):
    """
    Takes a given phrase and performs a keyword rotation on it, returning a
    list of (part 1, part 2) tuples.
    """
    # I don't want to figure out the sorting of capital letters.
    phrase = phrase.lower()
    # split the phrase into words.
    split_phrase = phrase.split(' ')
    rotation_list = []
    # Perform the actual rotation and return the splits.
    for i in range(len(split_phrase)):
        non_words =\
            ["a","an","and","by","for","if","in","is","of","on","the","to"]
        # If the split is before a non word skip the split, unless it's the first
        # word in the phrase.
        if (split_phrase[i] in non_words) and (i != 0):
            continue
        # Create the left and right portions of the split, and insert into the
        # return tuple list.
        left = ""
        right = ""
        for n in range(len(split_phrase[:i])):
            if n == 0:
                left = split_phrase[n]
            else:
                left = left + ' ' + split_phrase[n]
        for n in range(len(split_phrase[i:])):
            if n == 0:
                right = split_phrase[i+n]
            else:
                right = right + ' ' + split_phrase[i+n]
        rotation_list.append((left, right))
    return rotation_list

def sort(rotation_list):
    """
    Given a rotation list, bubble sort by the right hand side because I'm lazy.
    """
    for i in range(len(rotation_list)):
        for n in range(len(rotation_list)-1):
            if (rotation_list[n][1] > rotation_list[n+1][1]):
                rotation_list[n], rotation_list[n+1] = \
                    rotation_list[n+1], rotation_list[n]

def unrotate(rotation_list):
    """
    Neatly output the given rotation list to the user.
    """
    # Find the length of the longest string in the rotation list.
    max_size = 0
    for i in range(len(rotation_list)):
        for j in range(2):
            if (len(rotation_list[i][j]) > max_size):
                max_size = len(rotation_list[i][j])

    # Output the properly formatted index table.
    for i in range(len(rotation_list)):
        print(' ' * (max_size - len(rotation_list[i][0]))
              + rotation_list[i][0]
              + " "
              + rotation_list[i][1])

Geo Marchin said
December 23, 2009 at 10:40 PM
Apologies for failing at code posting.
Here you go:

Matías Giovannini said
December 23, 2009 at 11:38 PM
I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.

richard mullins said
April 24, 2012 at 10:56 PM
Parnas’s paper on the KWIC index reads as though he had spent a week on very intensive design. In my opinion, Parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method, or at least compatible with it.
I think Parnas was on the right track with his ideas.

I was astounded a couple of years ago to see that someone had written a KWIC index program in 3 lines of Python. I did not keep a note of the author. I still think, however, that Parnas’s design method is very good.

