Sorting By Frequency

January 5, 2018

Our first solution makes use of as much of the Standard Prelude as we can:

(define (sort-by-freq lt? xs)
  (mappend (lambda (x) (make-list (cdr x) (car x)))
    (sort (freq-lt? lt?)
      (uniq-c (eql? lt?)
        (sort lt? xs)))))

If you’re not familiar with it, mappend is like map except it uses append rather than cons to build the output. The auxiliary functions eql? and freq-lt? both take a lt? function and return functions that perform a comparison:

(define (eql? lt?)
  (lambda (a b)
    (not (or (lt? a b) (lt? b a)))))

(define (freq-lt? lt?)
  (lambda (a b)
    (or (< (cdr b) (cdr a))
        (and (not (< (cdr a) (cdr b)))
             (lt? (car a) (car b))))))

And here’s the result:

> (sort-by-freq < '(2 3 5 3 7 9 5 3 7))
(3 3 3 5 5 7 7 2 9)

This is sufficient for small inputs, but the initial sort takes time O(n log n) and the append takes quadratic time, so we need a better algorithm if we want to handle large inputs. We count the frequencies with a hash table in time O(n), then sort the counts in time O(m log m), where m is the number of unique items in the input, and finally write the output in time O(n); presumably m is small relative to n and the sort contributes little to the overall runtime, so the solution is nearly linear:

(define (sort-by-freq lt? xs)
  (let ((ht (make-eq-hashtable)))
    (do ((xs xs (cdr xs))) ((null? xs))
      (hashtable-update! ht (car xs) add1 0))
    (call-with-values
      (lambda () (hashtable-entries ht))
      (lambda (keys values)
        (let ((fs (reverse (sort (freq-lt? lt?)
                    (map cons (vector->list keys)
                              (vector->list values))))))
          (let loop ((x #f) (n 0) (fs fs) (zs (list)))
            (cond ((and (null? fs) (zero? n)) zs)
                  ((zero? n)
                    (loop (caar fs) (cdar fs) (cdr fs) zs))
                  (else (loop x (- n 1) fs (cons x zs))))))))))

Note that we re-used the auxiliary function freq-lt? from the first version of sort-by-freq. We build up the output back to front, using cons rather than append. Here’s the result:

> (sort-by-freq < '(2 3 5 3 7 9 5 3 7))
(3 3 3 5 5 7 7 2 9)

To my eye, the second solution is rather more complicated than the first solution, and I imagine that even though it is asymptotically superior, the crossover point where the second solution is faster than the first is fairly high (because both sort and append are built-ins and run quickly), though I didn’t measure it.

You can run the program at https://ideone.com/nHhPoO.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Sorting By Frequency”

chaw said

January 5, 2018 at 11:17 AM

Here is an O(n log n) solution (with the usual hash-table caveat)
using R7RS Scheme and a couple of popular SRFIs (for hash-tables and
sorting).


(import (scheme base)
        (scheme write)
        (srfi 69)
        (srfi 95))

(define input-101 '(2 3 5 3 7 9 5 3 7))
(define output-101 '(3 3 3 5 5 7 7 2 9))

(define (counts-ht nums)
  (let ((ht (make-hash-table)))
    (for-each (lambda (n)
                (hash-table-update!/default ht n (lambda (v) (+ v 1)) 0))
              nums)
    ht))

(define (make-lt? nums)
  (let ((ht (counts-ht nums)))
    (lambda (a b)
      (let ((ca (hash-table-ref ht a))
            (cb (hash-table-ref ht b)))
        (or (> ca cb)
            (and (= ca cb)
                 (< a b)))))))

(display (equal? (sort input-101 (make-lt? input-101))
                 output-101))
(newline)

Pascal Bourguignon said

January 5, 2018 at 11:55 AM


;; The example in the specification calls for a stable sort, therefore
;; we will have to keep the index of the first representant and use it
;; as secondary key in the sort operation.


(defun stable-sort-by-frequency (sequence lessp &key (key (function identity)))
  "
Sorts the elements in the SEQUENCE based on the frequency of their KEY
and the position of their first representant in the SEQUENCE.
LESSP shall be an order function on REAL.
"
  (flet ((compute-frequencies (sequence key)
           (let ((table (make-hash-table))
                 (index -1))
             (map nil (lambda (item)
                        (incf index)
                        (let ((item-key (funcal key item)))
                          (unless (gethash item-key table)
                            (setf (gethash item-key table) (list index)))
                          (push item (gethash item-key table))))
               sequence)
             table))
         (wrap (table)
           (let ((frequencies (make-array (hash-table-count table)))
                 (i -1))
             (maphash (lambda (k v)
                        (declare (ignore k))
                        (let ((r (nreverse v)))
                          (setf (aref frequencies (incf i))
                                (cons (length (cdr r)) r))))
                      table)
             frequencies))
         (sort-frequencies (frequencies lessp)
           (sort frequencies
                 (lambda (a b)
                   (cond
                     ((funcall lessp (first a) (first b)) t)
                     ((funcall lessp (first b) (first a)) nil)
                     (t                                   (< (second a) (second b)))))))
         (unwrap (result frequencies)
           (if (listp result)
               (loop
                 :with current := result
                 :for (count nil . elements) :across frequencies
                 :do (replace current elements)
                     (setf current (nthcdr count current)))
               (loop
                 :with start := 0
                 :for (count nil . elements) :across frequencies
                 :do (replace result elements :start1 start)
                     (incf start count)))
           result))
    (unwrap sequence
            (sort-frequencies (wrap (compute-frequencies sequence key)) lessp))))

(assert (equalp (stable-sort-by-frequency (list 1 2 3 4 5 6 7)       '< :key (function oddp)) '(2 4 6 1 3 5 7)))
(assert (equalp (stable-sort-by-frequency (list 0 1 2 3 4 5 6 7 8)   '< :key (function oddp)) '(1 3 5 7 0 2 4 6 8)))
(assert (equalp (stable-sort-by-frequency (vector 2 3 5 3 7 9 5 3 7) '>) #(3 3 3 5 5 7 7 2 9)))

matthew said
January 5, 2018 at 3:39 PM
I don’t see that append or mappend should be quadratic. Classic implementation: scan forwards down list, but build up result in reverse.

Globules said

January 6, 2018 at 6:07 AM

Here’s a Haskell version. First we create a mapping from each list element to a count of how many time it occurs. We convert the map to its corresponding list of key/value pairs, which is sorted using a comparison function that ensures the order we want. Finally, each pair (x, n) in the sorted list is replaced by a list of n copies of x, then the result is concatenated.

import Data.Monoid ((<>))
import Data.Map.Strict (assocs, fromListWith)
import Data.List (sortBy)

-- Sort a list first in order of decreasing frequency of the elements, with
-- equal elements ordered by decreasing value.
freqSort :: Ord a => [a] -> [a]
freqSort = concatMap (\(x, n) -> replicate n x) .
           sortBy (\(a, b) (c, d) -> compare d b <> compare a c) .
           assocs .
           fromListWith (+) .
           flip zip (repeat 1)

main :: IO ()
main = print $ freqSort  [2, 3, 5, 3, 7, 9, 5, 3, 7]

$ ./freqsort 
[3,3,3,5,5,7,7,2,9]

Daniel said

January 6, 2018 at 6:47 AM

Here’s a solution in Python.

from collections import Counter

def sort_by_frequency(l):
    counts = Counter(l)
    key = lambda x: (-counts.get(x), x)
    return sorted(l, key=key)

l = [2, 3, 5, 3, 7, 9, 5, 3, 7]
print sort_by_frequency(l)

Output:

[3, 3, 3, 5, 5, 7, 7, 2, 9]

Paul said

January 7, 2018 at 12:42 PM

Another solution in Python. This one sorts the items of the Counter, not all items from the original array.

def sort_by_frequency(arr):
    c = sorted(Counter(arr).items(), key=lambda vn: (-vn[1], vn[0]))
    result = []
    for v, n in c:
        result += [v] * n
    return result

matthew said
January 7, 2018 at 8:27 PM
Here’s another Haskell solution then. Sort the list, group into sublists of identical element. Concatenate while wrapping each element in a tuple with the count of identical elements (this is negated so we can use the standard tuple ordering for sorting). Sort and drop the count. Seems to run in about the same time as Globules’ solution above, depending on the amount of duplication.
```
{-# LANGUAGE TupleSections #-}
import Data.List (sort,group)
f :: Ord a => [a] -> [a]
f = map snd . sort . concatMap h . group . sort where h s = map (-length s,) s
main = print $ f [2,3,5,3,7,9,5,3,7]
```

Eric said

January 26, 2018 at 1:44 PM

Here are two Python approaches with and without Counter respectively. Although less elegant, the latter is more fun trying to figure out.

Sol. 1 (with Counter)

from collections import Counter

def sort_by_freq(list_in):
    list = []
    for x,y in Counter(sorted(list_in)).most_common():
        list += [x]*y

    return list

l = [2, 3, 5, 3, 7, 9, 5, 3, 7]
print(sort_by_freq(l))

Sol.2 (without Counter)

def sort_by_freq(list_in):
    ll = len(list_in)
    freq = [list_in.count(i) for i in range(10)]
    numbf = [[i for i,x in enumerate(freq) if x == ll-j] for j in range(ll)]
    list = []
    for i in range(ll):
        list += sorted(numbf[-(ll-i)]*(ll-i))

    return list

l = [2, 3, 5, 3, 7, 9, 5, 3, 7]
print(sort_by_freq(l))

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Programming Praxis