Just Showing Off
May 19, 2017
Here’s the email that convinced my correspondent that there might be something interesting in Scheme and Scheme macros:
Let’s consider a simple program to calculate the nth fibonacci number. Mathematically, the definition is that the nth fibonacci number is the sum of the n-1th fibonacci number and the n-2th fibonacci number, with the first two fibonacci numbers being 1 and 1. That translates directly to Scheme like this:
(define (fib n)
  (if (< n 2) 1
      (+ (fib (- n 1)) (fib (- n 2)))))

Timing the calculation shows that it quickly grows slow as n grows large, which makes sense because the algorithm takes exponential time to repeatedly recalculate all the smaller fibonacci numbers:
> (time (fib 40))
(time (fib 40))
    no collections
    15069 ms elapsed cpu time
    15064 ms elapsed real time
    0 bytes allocated
165580141

Fifteen seconds is a long time to calculate a small number.
It is easy to write a Scheme macro that memoizes, or caches, the results of the subproblems inherent in the fibonacci calculation. Here’s the macro:
(define-syntax define-memoized
  (syntax-rules ()
    ((define-memoized (f arg ...) body ...)
      (define f
        (let ((cache (list)))
          (lambda (arg ...)
            (cond ((assoc `(,arg ...) cache) => cdr)
                  (else (let ((val (begin body ...)))
                          (set! cache (cons (cons `(,arg ...) val) cache))
                          val)))))))))

We’ll explain that in a moment. But first let’s look at how to write the fibonacci function using that macro:
(define-memoized (fib n)
  (if (< n 2) 1
      (+ (fib (- n 1)) (fib (- n 2)))))

That is, of course, exactly the same as the earlier fibonacci function, except that we used define-memoized instead of simple define to write the function. But look at what a difference the memoization makes:

> (time (fib 40))
(time (fib 40))
    no collections
    0 ms elapsed cpu time
    0 ms elapsed real time
    5456 bytes allocated
165580141

We’ve gone from fifteen seconds to zero without doing any work, which is astonishing! Even calculating a number like (fib 4000) doesn’t cause any trauma:

> (time (fib 4000))
(time (fib 4000))
    no collections
    141 ms elapsed cpu time
    144 ms elapsed real time
    1364296 bytes allocated
64574884490948173531376949015369595644413900640151342708407577598177210359034088
91444947780728724174376074152378381889749922700974218315248201906276355079874370
42751068564702163075936230573885067767672020696704775060888952943005092911660239
47866841763853953813982281703936665369922709095308006821399524780721049955829191
40702994362208777929645917401261014865952038117045259114133194933608057714170864
57836066360819419152173551158109939739457834939838445927496726613615480616157565
95818944317619922097369917676974058206341892088144549337974422952140132621568340
70101627342272782776272615306630309305298205175744474242803310752241946621965578
04131017595052316172225782924860810023912187851892996757577669202694023487336446
62725774717740924068828300186439425921761082545463164628807702653752619616157324
434040342057336683279284098590801501

How does it work? The high-level explanation is that the macro modifies fib to store internally, in cache, the results of previous function calls with the same parameters, and return them directly instead of recalculating them. Thus, when (fib 40) requires the results of (fib 39) and (fib 38), they are already available and don’t need to be recomputed. The cache data structure is known in Scheme parlance as an a-list (an association list), meaning it is a linked list of key/value pairs where the key is n and the value is (fib n). Function assoc looks up the key in the cache; `(,arg ...) is a quasi-quotation that expands the arguments to the function (fib takes only one argument, but the macro admits functions that take more than one). The => symbol is syntax that passes the result of a cond predicate to its consequent, and cdr is a function that extracts the value from the key/value pair. The else clause of the cond calculates a never-before-seen value in the let expression, then updates the cache with set! and returns the newly-computed value.
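For readers more at home in Python, the same caching idea can be sketched as a decorator (a loose analogue of the macro, not a translation of it) that keeps a dictionary keyed by the argument tuple, playing the role of the a-list:

```python
import functools

def memoized(f):
    """Cache f's results keyed by its arguments, like the cache a-list above."""
    cache = {}
    @functools.wraps(f)
    def wrapper(*args):
        if args not in cache:          # the 'else' branch: compute and store
            cache[args] = f(*args)
        return cache[args]             # the 'assoc ... => cdr' branch: reuse
    return wrapper

@memoized
def fib(n):
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 165580141, instantly
```

As in the Scheme version, the only change to the function itself is the way it is defined; the recursion inside is untouched.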
You can run the program at http://ideone.com/MGxOUa.
I’ve been using define-memoized for years. I suppose now that R6RS has standardized hash tables I ought to rewrite it, since hash tables ought to be faster than a-lists, but I’ve never bothered: it’s so easy to write just the way it is, every time I use the new hash table functions I have to look up how to use them again, and why fix something that isn’t broken? I love Scheme!
Can this also be done without a macro?
@Steve: You could write a memoizing fib function without using a macro, by writing the memoization yourself inside the function, but you could not write define-memoized without using a macro.

Thanks.
MUMPS
When I first started programming, dealing with files could be a bit complicated. Then I started programming in MUMPS, and it became a non-issue, because in MUMPS everything is stored in a sparse, multi-dimensional array.
S ^PATIENTS(12,1)="SMITH, HARRY^M^12/22/1957^123-45-6789" ; Store Demographics (12 is the internal id)
S ^PATIENTS(12,2)="123 45th Street^New York^New York^01234" ; Store Address
S ^PINDEX("SMITH, HARRY",12)="" ; Name index

S INDEX=$O(^PINDEX("SMITH, HARRY","")) Q:'INDEX ; Quit if no index for "SMITH, HARRY"
S TXT=^PATIENTS(INDEX,1)
W !,"NAME: ",$P(TXT,"^",1)
S SEX=$P(TXT,"^",2)
W !,"SEX: ",$S(SEX="M":"Male",SEX="F":"Female",1:"Unknown")
W !,"DOB: ",$P(TXT,"^",3)
W !,"SSN: ",$P(TXT,"^",4)
;
S TXT=^PATIENTS(INDEX,2)
W !,"ADDR: ",$P(TXT,"^",1)
W !,"CITY: ",$P(TXT,"^",2)
W !,"ST: ",$P(TXT,"^",3)
W !,"ZIP: ",$P(TXT,"^",4)

While I liked the way they handled storage, another feature was the ability to handle multiple tasks and multiple users on small systems. I heard of a fellow who ran a side business on a PC-AT (6 or 8 MHz 80286, < 3 MB RAM). It was for accounting and supported multiple simultaneous users. The only other language I have heard of with such a capability is Forth: I have actually heard of one Forth system that handled 50 tasks on an 8-bit Z80 CPU.
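A rough Python analogue of that sparse multi-dimensional storage (hypothetical, just to illustrate the idea for readers who don’t know MUMPS): a dictionary keyed by subscript tuples, with $P-style field extraction by splitting each node on "^":

```python
# A MUMPS-style global: a sparse array addressed by subscript tuples,
# with "^"-delimited fields inside each node (a loose Python analogue).
patients = {
    (12, 1): "SMITH, HARRY^M^12/22/1957^123-45-6789",   # demographics
    (12, 2): "123 45th Street^New York^New York^01234", # address
}
pindex = {("SMITH, HARRY", 12): ""}  # name index

def piece(text, n):
    """Like MUMPS $P(text,"^",n): the nth "^"-delimited field, 1-based."""
    return text.split("^")[n - 1]

# Look up the internal id for a name, then print the demographics.
ids = [i for (name, i) in pindex if name == "SMITH, HARRY"]
if ids:
    txt = patients[(ids[0], 1)]
    print("NAME:", piece(txt, 1))
    print("DOB: ", piece(txt, 3))
```

In real MUMPS the globals are persistent on disk, which is what makes file handling a non-issue; the dictionary above only mimics the addressing scheme in memory.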
Ah, hi.
The D programming language is, more or less, a simplified and regularized dialect of C++, and it may have had (through Andrei Alexandrescu, a D enthusiast and C++ committee member) significant influence on more modern versions of C++. In particular, the modern C++ “constexpr” keyword and compile-time execution of functions seem directly influenced by D.
To Schemers, metacircularity and hygienic macros are old hat. To those of us brought up in the C tradition, they are new and exciting. D provides the first by including a D interpreter in the D compiler, so that deterministic D code whose input is provably immutable can be executed and its result may be used as data in the compilation of the rest of the program. Just as a Scheme macro is only a Scheme function whose input is a list representing the abstract syntax tree of a program, so indeed are D functions executable by the compiler during the compilation. The following small program, when compiled with all optimizations disabled …
import std.stdio;

private uint fib(in uint n) pure nothrow @nogc
{
if (n == 0) {
return 1;
}
return n * fib(n - 1);
}
private immutable uint f5 = fib(5); // 120, to be precise
int main()
{
stdout.writeln(f5);
return 0;
}
reduces to the equivalent program
import std.stdio;
int main()
{
stdout.writeln(120);
return 0;
}
Hygienic macros are not as easy to give small examples for, but a beautiful large example is found in D’s own standard library: in particular, the std.regex module, wherein typical messy regular expressions are compiled to D code and inlined thereafter using the facility above.
Yes, again, Scheme has done this for decades. It’s new and exciting for us plebeians though.
(Argh! I called the function above “fib” instead of “fac”! Please forgive me!)
In Python 3.2 they added the @lru_cache decorator (least recently used cache), which does function memoization (which until today I always read as “memoRization”) and lets the user set a maximum size for the cache for long-running processes. The docs use the Fibonacci sequence as their example. The ease of writing your own decorators is an example I often use as a non-standard reason Python is great.
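For instance, a minimal sketch of that decorator, using the same fib convention as the Scheme version above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; maxsize=128 would evict LRU entries
def fib(n):
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 165580141
```

One line of decoration replaces the hand-written cache entirely.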
What I like about Python is the large number of libraries. An example (after Raymond Hettinger): a script that walks down a directory tree and lists all duplicate files (with the same content). I think Python is great, but there are, of course, many other great languages.
import os, hashlib, pprint, codecs, collections

hashmap = collections.defaultdict(list)
for path, dirs, files in os.walk('.'):
    for filename in files:
        fullname = os.path.join(path, filename)
        with open(fullname, errors='ignore') as f:
            d = codecs.encode(f.read(), 'utf-8', errors='ignore')
        h = hashlib.md5(d).hexdigest()
        hashmap[h].append(fullname)
pprint.pprint([v for k, v in hashmap.items() if len(v) > 1])

Personally, I just love the concept of list comprehensions. They facilitate a concise way of filtering and executing operations on list members. Quite a few programming languages support them. I mostly use them in Python now; I had some fun with them in XQuery a few years back as well.
Specific to Python: if you change every list defined with [] to () in an example like the above, everything becomes a generator and gets computed lazily, i.e. each value is computed only when the next element is actually demanded.
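A tiny illustration of that [] to () switch (a hypothetical example, not the commenter’s original code):

```python
nums = [1, 2, 3, 4]

squares_list = [n * n for n in nums]  # list comprehension: computed eagerly
squares_gen  = (n * n for n in nums)  # generator expression: computed lazily

print(squares_list)       # [1, 4, 9, 16]
print(next(squares_gen))  # 1 -- only the first value has been computed so far
```

The generator version never materializes the whole sequence, which matters when the input is large or infinite.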
@Rutger: I like list comprehensions, too. That’s why I provide them in the Standard Prelude. Likewise pattern matching.
Using the Wolfram Language (also known as “Mathematica”), one can easily memoize functions. It’s even easier than the method shown above!
Let’s use the Fibonacci function as an example. A naive implementation, fib, would look like:

fib[0] := 1;
fib[1] := 1;
fib[n_] := fib[n - 1] + fib[n - 2];
Calculating the 30th Fibonacci number takes about 3 seconds on my machine. To memoize the function, we simply replace fib[n_] := ... with fib[n_] := fib[n] = .... This gives us

fib[0] := 1;
fib[1] := 1;
fib[n_] := fib[n] = fib[n - 1] + fib[n - 2];
Now computing the 30th Fibonacci number takes about 0.1 seconds on my machine (30x speedup). I like this idiom because of how easy it is to incorporate memoization into your code.