Programming Praxis


Home | Pages | Archives


Run Length Encoding

February 26, 2010 9:00 AM

The compress function is tricky only because the output is delayed at least one character from the input. That means we need an initialization pass through the main loop to “prime the pump” and start the recursion. Put-run is a separate function because it is called in two places:

(define (compress in-port out-port)
  (define (n->char n) (integer->char (+ 64 n)))
  (define (show-run prev n)
    (display #\~ out-port)
    (display (n->char n) out-port)
    (display prev out-port))
  (define (put-run prev n)
    (cond ((char=? prev #\~) (show-run #\~ n))
          ((< n 4) (display (make-string n prev) out-port))
          ((< n 27) (show-run prev n))
          (else (show-run prev 26) (put-run prev (- n 26)))))
  (let loop ((c (read-char in-port)) (prev #f) (n 0))
    (cond ((eof-object? c) (if prev (put-run prev n)))
          ((and prev (char=? c prev))
            (loop (read-char in-port) prev (+ n 1)))
          (prev (put-run prev n) (loop (read-char in-port) c 1))
          (else (loop (read-char in-port) c 1)))))

Expansion is simpler than compression; just read the input, handle tildes specially, and output everything else:

(define (expand in-port out-port)
  (define (char->n c) (- (char->integer c) 64))
  (let loop ((c (read-char in-port)))
    (unless (eof-object? c)
      (if (char=? c #\~)
          (let* ((n (char->n (read-char in-port)))
                 (c (read-char in-port)))
            (display (make-string n c) out-port))
          (display c out-port))
      (loop (read-char in-port)))))

Here is an example:

> (with-input-from-string "ABBB~CDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
    (lambda () (compress (current-input-port) (current-output-port))))
ABBB~A~C~ED~ZE~DE
> (with-input-from-string "ABBB~A~C~ED~ZE~DE"
    (lambda () (expand (current-input-port) (current-output-port))))
ABBB~CDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

You can see the program at http://programmingpraxis.codepad.org/ZQoT0BnW.

Posted by programmingpraxis

Categories: Exercises

Tags:

4 Responses to “Run Length Encoding”

  1. […] Praxis – Run Length Encoding By Remco Niemeijer In today’s Programming Praxis exercise we have to implement a run length encoding algorithm. The provided […]

    By Programming Praxis – Run Length Encoding « Bonsai Code on February 26, 2010 at 9:43 AM

  2. My Haskell solution (see http://bonsaicode.wordpress.com/2010/02/26/programming-praxis-run-length-encoding/ for a version with comments):

    import Data.List
    import Data.List.Split
    
    compress :: String -> String
    compress s = f =<< chunk 26 =<< group s where
        f xs = if length xs < 4 && take 1 xs /= "~" then xs
               else '~' : toEnum (length xs + 64) : take 1 xs
    
    expand :: String -> String
    expand []           = []
    expand ('~':r:c:xs) = replicate (fromEnum r - 64) c ++ expand xs
    expand (c:xs)       = c : expand xs
    

    By Remco Niemeijer on February 26, 2010 at 9:44 AM

  3. My take on compress. I interpreted “any literal appearance of the tilde in the input is encoded as a run of length 1” as being a special case, so the output is different from Niemeijer’s version when there are several tildes in a row.

    import Control.Arrow
    import Data.List
    
    compress :: String -> String
    compress xs = group xs >>= (head &&& length >>> encode)
        where encode (c, n) | c == '~'  = concat . replicate n $ "~A~"
                            | n < 4     = replicate n c
                            | n > 26    = encode (c, 26) ++ encode (c, n-26)
                            | otherwise = '~' : toEnum (fromEnum 'A' + n - 1 ) : [c]
    

    Full code with a quickcheck test: http://codepad.org/VakQUriW

    By domor on February 26, 2010 at 9:43 PM

  4. Python, done using regular expressions.

    Like domor, I separately encoded each ~ in the input.

    from string import ascii_uppercase
    import re
    
    def compress( s ):
        def sub( mo ):
            run = mo.group()
            return "~" + ascii_uppercase[len(run)-1] + run[0]
        
        return re.sub( r"(.)\1{3,25}|~", sub, s )
    
    def expand( s ):
        def sub( mo ):
            code = mo.group()
            return code[2]*(ascii_uppercase.index(code[1]) +1)
        
        return re.sub(r"(~..)", sub, s)
    
    
    def test():
        s = 'ABBB~CDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE'
        t = 'ABBB~A~C~ED~ZE~DE'
        assert compress( s ) == t
        assert expand( t ) == s
    
    
    

    By Mike on April 21, 2010 at 11:00 PM

Leave a Reply



Mobile Site | Full Site


Get a free blog at WordPress.com Theme: WordPress Mobile Edition by Alex King.