Permuted Index

December 22, 2009

In 1972, David Parnas proposed the construction of a permuted index, also known as a keyword-in-context or kwic index, as an exercise in program design. For example, the three sentences

    All's well that ends well.
    Nature abhors a vacuum.
    Every man has a price.

produce the permuted index:

                          Nature     abhors a vacuum.                
                                     All's well that ends well.      
                 All's well that     ends well.                      
                                     Every man has a price.          
                       Every man     has a price.                    
                           Every     man has a price.                
                                     Nature abhors a vacuum.         
                 Every man has a     price.                          
                      All's well     that ends well.                 
                 Nature abhors a     vacuum.                         
                           All's     well that ends well.            
            All's well that ends     well.

Parnas proposed a three-step algorithm: rotate, sort, unrotate. The rotate step takes a sentence and produces all its rotations. The sort step sorts the rotations by their back half. The unrotate step produces neatly-formatted output. Somewhere, in one of the three steps, rotations that produce back halves starting with words on a stop list are discarded; note that there is no output for “a vacuum” or “a price.”

Your task is to implement a program that produces permuted indexes using Parnas’ three-step algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

6 Responses to “Permuted Index”

  1. […] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. […]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/12/22/programming-praxis-permuted-index/ for a version with comments):

    import Data.Char
    import Data.List
    import qualified Data.List.Key as K
    import Text.Printf
    
    stopList :: [String]
    stopList = words "a an and by for if in is of on the to"
    
    rot :: [String] -> [(String, String)]
    rot xs = [(unwords a, unwords b) | (a, b) <- init $
              zip (inits xs) (tails xs), notElem (head b) stopList]
    
    prettyPrint :: [(String, String)] -> IO ()
    prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
        where l1 = maximum $ map (length . fst) xs
              l2 = maximum $ map (length . snd) xs
    
    permuteIndex :: String -> IO ()
    permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
                   concatMap (rot . words) . lines
    
  3. Geo Marchin said

    My python solution.

    def rotate(phrase):
    “””
    Takes a given phrase and performs a keyword rotation on it, returning a
    list of (part 1, part 2) tuples.
    “””
    # I don’t want to figure out the sorting of capital letters.
    phrase = phrase.lower()
    # split the phrase into words.
    split_phrase = phrase.split(‘ ‘)
    rotation_list = []
    # Perform the actual rotation and return the splits.
    for i in range(len(split_phrase)):
    non_words =\
    [“a”,”an”,”and”,”by”,”for”,”if”,”in”,”is”,”of”,”on”,”the”,”to”]
    # If the split is before a non word skip the split, unless it’s the first
    # word in the phrase.
    if (split_phrase[i] in non_words) and (i is not 0):
    continue
    # Create the left and right portions of the split, and insert into the
    # return tuple list.
    left = “”
    right = “”
    for n in range(len(split_phrase[:i])):
    if n == 0:
    left = split_phrase[n]
    else:
    left = left + ‘ ‘ + split_phrase[n]
    for n in range(len(split_phrase[i:])):
    if n == 0:
    right = split_phrase[i+n]
    else:
    right = right + ‘ ‘ + split_phrase[i+n]
    rotation_list.append((left, right))
    return rotation_list

    def sort(rotation_list):
    “””
    Given a rotation list, bubble sort by the right hand side because I’m lazy.
    “””
    for i in range(len(rotation_list)):
    for n in range(len(rotation_list)-1):
    if (rotation_list[n][1] > rotation_list[n+1][1]):
    rotation_list[n], rotation_list[n+1] = \
    rotation_list[n+1], rotation_list[n]

    def unrotate(rotation_list):
    “””
    Neatly output the given rotation list to the user.
    “””
    # Find the length of the longest string in the rotation list.
    max_size = 0
    for i in range(len(rotation_list)):
    for j in range(2):
    if (len(rotation_list[i][j]) > max_size):
    max_size = len(rotation_list[i][j])

    # Output the properly formatted index table.
    for i in range(len(rotation_list)):
    print ‘ ‘ * (max_size – len(rotation_list[i][0])) \
    + rotation_list[i][0] \
    + ” ” \
    + rotation_list[i][1]

  4. Geo Marchin said

    Apologies for failing at code posting.
    Here you go:

  5. I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.

  6. richard mullins said

    Parnas’s paper on kwic index looked like he had spent a week on very intensive design. In my opinion, parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method – or at least compatible with it.
    I think Parnas was on the right track with his ideas.

    I was astounded a couple of years ago to see that someone had written a kwic index progam in 3 lines of Python. I did not keep a note of the Author. I still think however, that Parnas’s design method is very good.

Leave a comment