## Sorting By Frequency

### January 5, 2018

Our first solution makes use of as much of the Standard Prelude as we can:

(define (sort-by-freq lt? xs) (mappend (lambda (x) (make-list (cdr x) (car x))) (sort (freq-lt? lt?) (uniq-c (eql? lt?) (sort lt? xs)))))

If you’re not familiar with it, `mappend`

is like `map`

except it uses `append`

rather than `cons`

to build the output. The auxiliary functions `eql?`

and `freq-lt?`

both take a `lt?`

function and return functions that perform a comparison:

(define (eql? lt?) (lambda (a b) (not (or (lt? a b) (lt? b a)))))

(define (freq-lt? lt?) (lambda (a b) (or (< (cdr b) (cdr a)) (and (not (< (cdr a) (cdr b))) (lt? (car a) (car b))))))

And here’s the result:

> (sort-by-freq < '(2 3 5 3 7 9 5 3 7)) (3 3 3 5 5 7 7 2 9)

This is sufficient for small inputs, but the initial sort takes time O(*n* log *n*) and the append takes quadratic time, so we need a better algorithm if we want to handle large inputs. We count the frequencies with a hash table in time O(*n*), then sort the counts in time O(*m* log m), where *m* is the number of unique items in the input, and finally write the output in time O(*n*); presumably *m* is small relative to *n* and the sort contributes little to the overall runtime, so the solution is nearly linear:

(define (sort-by-freq lt? xs) (let ((ht (make-eq-hashtable))) (do ((xs xs (cdr xs))) ((null? xs)) (hashtable-update! ht (car xs) add1 0)) (call-with-values (lambda () (hashtable-entries ht)) (lambda (keys values) (let ((fs (reverse (sort (freq-lt? lt?) (map cons (vector->list keys) (vector->list values)))))) (let loop ((x #f) (n 0) (fs fs) (zs (list))) (cond ((and (null? fs) (zero? n)) zs) ((zero? n) (loop (caar fs) (cdar fs) (cdr fs) zs)) (else (loop x (- n 1) fs (cons x zs))))))))))

Note that we re-used the auxiliary function `freq-lt?`

from the first version of `sort-by-freq`

. We build up the output back to front, using `cons`

rather than `append`

. Here’s the result:

> (sort-by-freq < '(2 3 5 3 7 9 5 3 7)) (3 3 3 5 5 7 7 2 9)

To my eye, the second solution is rather more complicated than the first solution, and I imagine that even though it is asymptotically superior, the crossover point where the second solution is faster than the first is fairly high (because both `sort`

and `append`

are built-ins and run quickly), though I didn’t measure it.

You can run the program at https://ideone.com/nHhPoO.

Here is an O(n log n) solution (with the usual hash-table caveat)

using R7RS Scheme and a couple of popular SRFIs (for hash-tables and

sorting).

I don’t see that append or mappend should be quadratic. Classic implementation: scan forwards down list, but build up result in reverse.

Here’s a Haskell version. First we create a mapping from each list element to a count of how many time it occurs. We convert the map to its corresponding list of key/value pairs, which is sorted using a comparison function that ensures the order we want. Finally, each pair (x, n) in the sorted list is replaced by a list of n copies of x, then the result is concatenated.

Here’s a solution in Python.

Output:

Another solution in Python. This one sorts the items of the Counter, not all items from the original array.

Here’s another Haskell solution then. Sort the list, group into sublists of identical element. Concatenate while wrapping each element in a tuple with the count of identical elements (this is negated so we can use the standard tuple ordering for sorting). Sort and drop the count. Seems to run in about the same time as Globules’ solution above, depending on the amount of duplication.

Here are two Python approaches with and without Counter respectively. Although less elegant, the latter is more fun trying to figure out.

Sol. 1 (with Counter)

Sol.2 (without Counter)