Just Showing Off

May 19, 2017

Here’s the email that convinced my correspondent that there might be something interesting in Scheme and Scheme macros:

Let’s consider a simple program to calculate the nth Fibonacci number. Mathematically, the nth Fibonacci number is defined as the sum of the (n-1)th and (n-2)th Fibonacci numbers, with the first two Fibonacci numbers both being 1. That translates directly to Scheme like this:

(define (fib n)
  (if (< n 2) 1
    (+ (fib (- n 1)) (fib (- n 2)))))

Timing the calculation shows that it quickly becomes slow as n grows, which makes sense: the algorithm takes exponential time because it repeatedly recalculates all the smaller Fibonacci numbers:

> (time (fib 40))
(time (fib 40))
    no collections
    15069 ms elapsed cpu time
    15064 ms elapsed real time
    0 bytes allocated
165580141

Fifteen seconds is a long time to calculate a small number.

It is easy to write a Scheme macro that memoizes, or caches, the results of the subproblems inherent in the Fibonacci calculation. Here’s the macro:

(define-syntax define-memoized
  (syntax-rules ()
    ((define-memoized (f arg ...) body ...)
      (define f
        (let ((cache (list)))
          (lambda (arg ...)
            (cond ((assoc `(,arg ...) cache) => cdr)
                  (else (let ((val (begin body ...)))
                          (set! cache (cons (cons `(,arg ...) val) cache))
                          val)))))))))

We’ll explain that in a moment. But first let’s look at how to write the fibonacci function using that macro:

(define-memoized (fib n)
  (if (< n 2) 1
    (+ (fib (- n 1)) (fib (- n 2)))))

That is, of course, exactly the same as the earlier Fibonacci function, except that we used define-memoized instead of plain define to write it. But look at what a difference the memoization makes:

> (time (fib 40))
(time (fib 40))
    no collections
    0 ms elapsed cpu time
    0 ms elapsed real time
    5456 bytes allocated
165580141

We’ve gone from fifteen seconds to zero without doing any work, which is astonishing! Even calculating a number like (fib 4000) doesn’t cause any trauma:

> (time (fib 4000))
(time (fib 4000))
    no collections
    141 ms elapsed cpu time
    144 ms elapsed real time
    1364296 bytes allocated
64574884490948173531376949015369595644413900640151342708407577598177210359034088
91444947780728724174376074152378381889749922700974218315248201906276355079874370
42751068564702163075936230573885067767672020696704775060888952943005092911660239
47866841763853953813982281703936665369922709095308006821399524780721049955829191
40702994362208777929645917401261014865952038117045259114133194933608057714170864
57836066360819419152173551158109939739457834939838445927496726613615480616157565
95818944317619922097369917676974058206341892088144549337974422952140132621568340
70101627342272782776272615306630309305298205175744474242803310752241946621965578
04131017595052316172225782924860810023912187851892996757577669202694023487336446
62725774717740924068828300186439425921761082545463164628807702653752619616157324
434040342057336683279284098590801501

How does it work? The high-level explanation is that the macro modifies fib to store internally, in cache, the results of previous calls with the same arguments, and to return them directly instead of recalculating them. Thus, when (fib 40) requires the results of (fib 39) and (fib 38), they are already available and don’t need to be recomputed. The cache data structure is known in Scheme parlance as an a-list (an association list): a linked list of key/value pairs in which the key is the argument list (here, a one-element list containing n) and the value is (fib n). The function assoc looks up the key in the cache. `(,arg ...) is a quasi-quotation that builds the list of arguments passed to the function (fib takes only one argument, but the macro admits functions that take more than one). The => symbol is cond syntax that passes the result of a true predicate to its consequent, and cdr extracts the value from the key/value pair. The else clause of the cond calculates a never-before-seen value in the let expression, updates the cache with set!, and returns the newly computed value.
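
To make that concrete, here is roughly what the define-memoized version of fib expands to, written out by hand (hygienic renaming aside); the quasi-quoted key `(,n) is written as the equivalent (list n):

(define fib
  (let ((cache (list)))                       ; a-list of ((n) . (fib n)) pairs
    (lambda (n)
      (cond ((assoc (list n) cache) => cdr)   ; cache hit: return the stored value
            (else (let ((val (if (< n 2) 1
                                 (+ (fib (- n 1)) (fib (- n 2))))))
                    (set! cache (cons (cons (list n) val) cache))
                    val))))))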

You can run the program at http://ideone.com/MGxOUa.

I’ve been using define-memoized for years. I suppose now that R6RS has standardized hash tables I ought to rewrite it, because hash tables ought to be faster than a-lists, but I’ve never bothered; it’s so easy to write just the way it is, and every time I use the new hash table functions I have to look up how to use them again. Besides, why fix something that isn’t broken? I love Scheme!
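
If I ever do get around to it, the rewrite would be small. Here is an untested sketch of what the R6RS hash-table version might look like, using the (rnrs hashtables) procedures; note that it assumes no memoized result is ever #f:

(define-syntax define-memoized
  (syntax-rules ()
    ((define-memoized (f arg ...) body ...)
      (define f
        (let ((cache (make-hashtable equal-hash equal?)))
          (lambda (arg ...)
            (let ((key (list arg ...)))
              (or (hashtable-ref cache key #f)     ; cache hit, assuming #f is never cached
                  (let ((val (begin body ...)))
                    (hashtable-set! cache key val)
                    val)))))))))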


10 Responses to “Just Showing Off”

  1. Steve said

    Can this also be done without a macro?

  2. programmingpraxis said

    @Steve: You could write a memoizing fib function without using a macro, by writing the memoization yourself inside the function, but you could not write define-memoized without using a macro.
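
    One way to do it without a macro, as a quick sketch: write memoize as an ordinary higher-order function and bind fib to the memoized version of the recursive lambda:

    (define (memoize f)
      (let ((cache (list)))                 ; a-list mapping argument lists to results
        (lambda args
          (cond ((assoc args cache) => cdr)
                (else (let ((val (apply f args)))
                        (set! cache (cons (cons args val) cache))
                        val))))))

    (define fib
      (memoize
        (lambda (n)
          (if (< n 2) 1
            (+ (fib (- n 1)) (fib (- n 2)))))))

    The recursive calls go through the top-level fib, so the sub-problems are cached too; what a plain function cannot give you is the define-memoized syntax itself.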

  3. Thanks.

    MUMPS

    When I first started programming, dealing with files could be a bit complicated. Then I started programming in MUMPS, and it became a non-issue, because in MUMPS everything is stored in a sparse, multi-dimensional array.

    
    S ^PATIENTS(12,1)="SMITH, HARRY^M^12/22/1957^123-45-6789"   ; Store Demographics (12 is the internal id)
    S ^PATIENTS(12,2)="123 45th Street^New York^New York^01234" ; Store Address
    
    S ^PINDEX("SMITH, HARRY",12)=""                             ; Name index
    
    S INDEX=$O(^PINDEX("SMITH, HARRY","")) Q:'INDEX                     ; Quit if no index for "SMITH, HARRY"
    
    S TXT=^PATIENTS(INDEX,1)
    W !,"NAME: ",$P(TXT,"^",1)
    S SEX=$P(TXT,"^",2)
    W !,"SEX:  ",$S(SEX="M":"Male",SEX="F":"Female",1:"Unknown")
    W !,"DOB:  ",$P(TXT,"^",3)
    W !,"SSN:  ",$P(TXT,"^",4)
    ;
    S TXT=^PATIENTS(INDEX,2)
    W !,"ADDR: ",$P(TXT,"^",1)
    W !,"CITY: ",$P(TXT,"^",2)
    W !,"ST:   ",$P(TXT,"^",3)
    W !,"ZIP:  ",$P(TXT,"^",4)
     
    

    While I liked the way they handled storage, another feature was the ability to handle multiple tasks and multiple users on small systems. I heard of a fellow who had a side business he ran on a PC-AT (6 or 8 MHz 80286, < 3 MB RAM). It was for accounting and supported multiple simultaneous users. The only other language I have heard of with such a capability is Forth: I have actually heard of one Forth system that handled 50 tasks on an 8-bit Z80 CPU.

  4. Tamayo said

    Ah, hi.

    The D programming language is, more or less, a simplified and regularized dialect of C++, and it may have had (through Andrei Alexandrescu, a D enthusiast and C++ committee member) significant influence on more modern versions of C++. In particular, the modern C++ “constexpr” keyword and compile-time execution of functions seem directly influenced by D.

    To Schemers, metacircularity and hygienic macros are old hat. To those of us brought up in the C tradition, they are new and exciting. D provides the first by including a D interpreter in the D compiler, so that deterministic D code whose input is provably immutable can be executed at compile time and its result used as data in the compilation of the rest of the program. Just as a Scheme macro is only a Scheme function whose input is a list representing the abstract syntax tree of a program, so too are ordinary D functions executable by the compiler during compilation. The following small program, when compiled with all optimizations disabled,

    import std.stdio;

    private uint fib(in uint n) pure nothrow @nogc
    {
        if (n == 0) {
            return 1;
        }

        return n * fib(n - 1);
    }

    private immutable uint f5 = fib(5); // 120, to be precise

    int main()
    {
        stdout.writeln(f5);
        return 0;
    }

    reduces to the equivalent program

    import std.stdio;

    int main()
    {
        stdout.writeln(120);
        return 0;
    }

    Hygienic macros are not as easy to give small examples for, but a beautiful large example is found in D’s own standard library: in particular, the std.regex module, wherein typical messy regular expressions are compiled to D code and inlined thereafter using the facility above.

    Yes, again, Scheme has done this for decades. It’s new and exciting for us plebeians though.

  5. Tamayo said

    (Argh! I called the function above “fib” instead of “fac”! Please forgive me!)

  6. In Python 3.2 they added the @lru_cache decorator (least recently used cache), which does function memoization (which until today I always read as memoRization) and allows the user to set a maximum cache size for long-running processes. The docs use the Fibonacci sequence as their example. The ease of writing your own decorators is an example I often use as a non-standard reason Python is great.

  7. Paul said

    What I like about Python is the large number of libraries. An example (after Raymond Hettinger): a script that walks down a directory tree and lists all duplicate files (with the same content). I think Python is great, but there are, of course, many other great languages.

    import os, hashlib, pprint, codecs, collections

    # map each content hash to the list of files having that content
    hashmap = collections.defaultdict(list)

    for path, dirs, files in os.walk('.'):
        for filename in files:
            fullname = os.path.join(path, filename)
            # read tolerantly, then hash the re-encoded bytes
            with open(fullname, errors='ignore') as f:
                d = codecs.encode(f.read(), 'utf-8', errors='ignore')
            h = hashlib.md5(d).hexdigest()
            hashmap[h].append(fullname)

    # report only the hashes shared by more than one file
    pprint.pprint([v for k, v in hashmap.items() if len(v) > 1])
    
  8. Rutger said

    Personally I just love the concept of list comprehensions. They provide a concise way of filtering and transforming list members. Quite a few programming languages support them. I mostly use them in Python now, and had some fun with them in XQuery a few years back as well.

    string = "Here is 1 example 2 demonstrate list comprehensions 4 you in 3 examples."
    numbers = [int(s) for s in string if s.isdigit()]
    even = sorted([n for n in numbers if not n % 2 ])
    exponentiation = [(n, [n**i for i in range(5)]) for n in even]
    
    for number, powers in exponentiation:
    	print(number, ':', powers)
    

    Specific to Python: if you change every list defined with [] to () in the above example, the comprehensions become generator expressions and are computed lazily, i.e. the powers of 2 are computed before the next number is even extracted from the input string.

  9. programmingpraxis said

    @Rutger: I like list comprehensions, too. That’s why I provide them in the Standard Prelude. Likewise pattern matching.

  10. john said

    Using the Wolfram Language (also known as “Mathematica”), one can easily memoize functions. It’s even easier than the method shown above!

    Let’s use the Fibonacci function as an example. A naive implementation, fib, would look like:


    fib[0] := 1;
    fib[1] := 1;
    fib[n_] := fib[n - 1] + fib[n - 2];

    Calculating the 30th Fibonacci number takes about 3 seconds on my machine. To memoize the function, we simply replace fib[n_] := ... with fib[n_] := fib[n] = .... This gives us


    fib[0] := 1;
    fib[1] := 1;
    fib[n_] := fib[n] = fib[n - 1] + fib[n - 2];

    Now computing the 30th Fibonacci number takes about 0.1 seconds on my machine (30x speedup). I like this idiom because of how easy it is to incorporate memoization into your code.
