Sorting Without Duplicates
February 7, 2017
My first thought was to build up a frequency table of the items in the array, perhaps in an AVL tree, then scan the tree to produce the desired output. But that doesn’t meet the requirement to perform the work in-place, and it won’t have O(n) time complexity.
Thus, my suggestion is to sort the array by any desired method, using some kind of counting sort or radix sort to get O(n) time complexity, or any O(n log n) sorting algorithm if you don’t care about that, then scan the array, moving duplicates to the end. When the first portion of the array is sorted without duplicates, reverse the remainder of the array and process that part of the array recursively, stopping when there is no sub-array left to process. That’s a sort followed by a scan, so it’s linear if the sort is linear. The trick is the “moving duplicates to the end” part, which can easily become non-linear. It works better with lists, which even if they’re not in-place at least don’t require more storage than the original. Function
partition scans a sorted input to find the first partition, returning it along with the sorted remainder; function
sort-dups handles one partition after another, appending them at the end:
(define (partition lt? xs) ; assumes sorted xs (let loop ((xs xs) (prev #f) (ys (list)) (zs (list))) (cond ((null? xs) (values (reverse ys) (reverse zs))) ((equal? (car xs) prev) (loop (cdr xs) prev ys (cons (car xs) zs))) (else (loop (cdr xs) (car xs) (cons (car xs) ys) zs)))))
(define (sort-dups lt? xs) (let loop ((xs xs) (zs (list))) (if (null? xs) (apply append (reverse zs)) (call-with-values (lambda () (partition lt? xs)) (lambda (first rest) (loop rest (cons first zs)))))))
And here is the function at work on some sample data:
> (sort-dups < (sort < '(2 9 1 5 1 4 9 7 2 1 4))) (1 2 4 5 7 9 1 2 4 9 1) > (sort-dups < (sort < '(1 1 1 1 1 1 1 1 1))) (1 1 1 1 1 1 1 1 1) > (sort-dups < (sort < '())) () > (sort-dups < (sort < '(7 3 1 6 2 5 4))) (1 2 3 4 5 6 7)
Time complexity of
sort-dups is O(n2) in the worst case, where all the integers are identical, but O(n log n) in the average case. I don’t see any way to reduce that, but maybe one of my smart readers has a better idea. I thought of using a hash table to collect the frequencies, but you still have to sort the keys, which makes the problem O(n log n), and you still have to deal with the case of identical integers.
You can run the program at http://ideone.com/j0tZR8.
Pages: 1 2