Permuted Index

December 22, 2009

Rot-file performs the first step, collecting the rotations of each input line:

(define (rot-file filename)
  (define (split s) (string-split #\space s))
  (with-input-from-file filename
    (lambda ()
      (let loop ((line (read-line)) (rs '()))
        (if (eof-object? line) rs
          (loop (read-line)
                (append (rot-line (split line)) rs)))))))

Rot-line produces the rotations of each line, one rotation for each word not on the stop list, stored as a list of pairs with the left half of the sentence (which may be null) in the car and the right half of the sentence (which may not be null) in the cdr. A word is defined as anything between spaces, including letters, digits and punctuation. Rot-line works back-to-front through the list of words:

(define (rot-line words)
  (define (join xs) (string-join #\space xs))
  (let loop ((front (reverse words)) (back '()) (rots '()))
    (if (null? front) rots
      (let ((f (cdr front)) (b (cons (car front) back)))
        (loop f b (if (and (pair? f) (member (car b) stop-list)) rots
                    (cons (cons (join (reverse f)) (join b)) rots)))))))

Our stop list is brief; you may want to extend yours:

(define stop-list
  '("a" "an" "and" "by" "for" "if" "in" "is" "of" "on" "the" "to"))

The third step of Parnas’ algorithm runs through the sorted list of front/back pairs, neatly printing each:

(define (print rots)
  (define (print-line x)
    (display (rjust 32 (car x)))
    (display " ")
    (display (ljust 32 (cdr x)))
    (newline))
  (for-each print-line rots))

Print calls rjust and ljust to align the output:

(define (rjust n str)
  (let ((len (string-length str)))
    (if (< n len)
        (substring str (- len n) len)
        (string-append (make-string (- n len) #\space) str))))

(define (ljust n str)
  (let ((len (string-length str)))
    (if (< n len)
        (substring str 0 n)
        (string-append str (make-string (- n len) #\space)))))

The ptx function puts all the pieces together:

(define (ptx filename)
  (define (order-by a b) (string-ci<? (cdr a) (cdr b)))
  (print (sort order-by (rot-file filename))))

Assuming the three sentences of the example are in a file test.ptx, calling (ptx "test.ptx") produces the permuted index shown on the prior page. The code, including read-line, string-split and string-join from the Standard Prelude, is collected at http://programmingpraxis.codepad.org/ay27GbL9.

About these ads

Pages: 1 2

6 Responses to “Permuted Index”

  1. [...] Praxis – Permuted Index By Remco Niemeijer In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. [...]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/12/22/programming-praxis-permuted-index/ for a version with comments):

    import Data.Char
    import Data.List
    import qualified Data.List.Key as K
    import Text.Printf
    
    stopList :: [String]
    stopList = words "a an and by for if in is of on the to"
    
    rot :: [String] -> [(String, String)]
    rot xs = [(unwords a, unwords b) | (a, b) <- init $
              zip (inits xs) (tails xs), notElem (head b) stopList]
    
    prettyPrint :: [(String, String)] -> IO ()
    prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
        where l1 = maximum $ map (length . fst) xs
              l2 = maximum $ map (length . snd) xs
    
    permuteIndex :: String -> IO ()
    permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
                   concatMap (rot . words) . lines
    
  3. Geo Marchin said

    My python solution.

    def rotate(phrase):
    “””
    Takes a given phrase and performs a keyword rotation on it, returning a
    list of (part 1, part 2) tuples.
    “””
    # I don’t want to figure out the sorting of capital letters.
    phrase = phrase.lower()
    # split the phrase into words.
    split_phrase = phrase.split(‘ ‘)
    rotation_list = []
    # Perform the actual rotation and return the splits.
    for i in range(len(split_phrase)):
    non_words =\
    ["a","an","and","by","for","if","in","is","of","on","the","to"]
    # If the split is before a non word skip the split, unless it’s the first
    # word in the phrase.
    if (split_phrase[i] in non_words) and (i is not 0):
    continue
    # Create the left and right portions of the split, and insert into the
    # return tuple list.
    left = “”
    right = “”
    for n in range(len(split_phrase[:i])):
    if n == 0:
    left = split_phrase[n]
    else:
    left = left + ‘ ‘ + split_phrase[n]
    for n in range(len(split_phrase[i:])):
    if n == 0:
    right = split_phrase[i+n]
    else:
    right = right + ‘ ‘ + split_phrase[i+n]
    rotation_list.append((left, right))
    return rotation_list

    def sort(rotation_list):
    “””
    Given a rotation list, bubble sort by the right hand side because I’m lazy.
    “””
    for i in range(len(rotation_list)):
    for n in range(len(rotation_list)-1):
    if (rotation_list[n][1] > rotation_list[n+1][1]):
    rotation_list[n], rotation_list[n+1] = \
    rotation_list[n+1], rotation_list[n]

    def unrotate(rotation_list):
    “””
    Neatly output the given rotation list to the user.
    “””
    # Find the length of the longest string in the rotation list.
    max_size = 0
    for i in range(len(rotation_list)):
    for j in range(2):
    if (len(rotation_list[i][j]) > max_size):
    max_size = len(rotation_list[i][j])

    # Output the properly formatted index table.
    for i in range(len(rotation_list)):
    print ‘ ‘ * (max_size – len(rotation_list[i][0])) \
    + rotation_list[i][0] \
    + ” ” \
    + rotation_list[i][1]

  4. Geo Marchin said

    Apologies for failing at code posting.
    Here you go:

  5. I’ve written my OCaml solution (heavily “inspired” by Remco’s) here.

  6. richard mullins said

    Parnas’s paper on kwic index looked like he had spent a week on very intensive design. In my opinion, parnas’s ideas are roughly the same as Niklaus Wirth’s “refinement” method – or at least compatible with it.
    I think Parnas was on the right track with his ideas.

    I was astounded a couple of years ago to see that someone had written a kwic index progam in 3 lines of Python. I did not keep a note of the Author. I still think however, that Parnas’s design method is very good.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 576 other followers

%d bloggers like this: