Sorting Without Duplicates
February 7, 2017
I think this is somebody’s homework problem:
Given an array of n integers, partition the array into sub-arrays, each in ascending order, and each without duplicates. For instance, given the array [2, 9, 1, 5, 1, 4, 9, 7, 2, 1, 4], the desired output is the array [1, 2, 4, 5, 7, 9, 1, 2, 4, 9, 1], where the intent is to have as many sub-arrays as the maximum frequency of any element, each sub-array as long as possible before starting the next sub-array of duplicates. If possible, perform the work in-place, in time either O(n) or O(n log n).
Your task is to solve the sorting problem described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
In Python. This solution is not in-place.
from collections import Counter def sortwdup(seq): c = Counter(seq) result = [[] for _ in range(max(c.values()) if c else 0)] for e in sorted(c.keys()): for L in result[:c[e]]: L.append(e) return [r for res in result for r in res]In Python. I believe it is O(n) (as Python counter dictionary takes n inserts) and the second for loop has exactly n elements.
from collections import Counter from collections import defaultdict example = [2, 9, 1, 5, 1, 4, 9, 7, 2, 1, 4] ctr = Counter(example) result = defaultdict(list) for v in ctr: for i in range(ctr[v]): result[i] += [v] print(result) # output #defaultdict(<class 'list'>, {0: [1, 2, 4, 5, 7, 9], 1: [1, 2, 4, 9], 2: [1]})@Rutger: You have to sort ctr in line 9, as the order is otherwise not defined. Try for example to multiply all entries of example with 101 and see what happens. (I get then [707, 404, 101, 505, 202, 909, 404, 101, 202, 909, 101])
A sort (could be in-place) followed by a reordering that is much like mergesort (but requires not only ordered input but a disjoint split, because I was not able to deal with ties correctly when combining the parts). It might be O(n log n), even in the worst case, and it might not be impossible to do in-place or with less space than here. The combination step attempts to pick a large element whenever possible; it seemed to work for many examples after I disallowed ties. This also is in Python.
from itertools import count def sortificate(xs): '''Return the elements of xs in a chain-list of maximally long strictly ascending chains.''' return chainify(sorted(xs)) def chainify(xs): '''Return the elements of an ascending list of elements in a chain-list of maximally long strictly ascending groups. Hopefully comparable to mergesort in computational complexity.''' if xs == [] or xs[0] == xs[-1]: return xs lo, hi = separate(xs) return combine(chainify(lo), chainify(hi)) def separate(xs): '''Split an ascending list of elements near the middle so every small element is stricly smaller than any large element -- not able to combine chain-lists correctly when there were ties, so ruled ties out. Precondition: a split point must exist.''' m = len(xs) // 2 for k in count(): lo, hi = m - k, m + k if xs[lo - 1] < xs[lo]: return xs[:lo], xs[lo:] elif xs[hi - 1] < xs[hi]: return xs[:hi], xs[hi:] def combine(xs, ys): '''Combine chain-lists xs, ys where every x is smaller than any y -- was not able to combine all chain-lists correctly when there were ties, so ruled ties out.''' m, n = len(xs), len(ys) rs, j, k = list(range(m + n)), 0, 0 for t in rs: if (k < n and (j == m or (0 < t and xs[j] <= rs[t - 1] < ys[k]))): rs[t] = ys[k] k += 1 else: rs[t] = xs[j] j += 1 return rs def test(xs): '''Check elements are there, print for ocular inspection''' rs = sortificate(xs) if sorted(xs) != sorted(rs): print("FAIL", xs, " ==> ", rs) else: print(xs, " ==> ", rs) test([2, 9, 1, 5, 1, 4, 9, 7, 2, 1, 4])I don’t think an O(n) solution is possible. If it was, you could use it to construct an O(n) sort of any array without ties, and apply it to arrays with ties by, for example, using the element’s original index to break them.
Anyway, because of the way its “transpose” function behaves, the Haskell solution is almost embarrassing: we can sort the array, group the elements, use transpose to turn it into a list of the correct subarrays, and concatenate the whole thing together:
Inspired by@Kevin another Python solution.
from collections import Counter from itertools import zip_longest def sortwdup(seq): items = [[k]*v for k, v in sorted(Counter(seq).items())] result = list(zip_longest(*items)) return [r for res in result for r in res if r is not None]Here’s a Haskell version in the REPL. It’s not in-place.
[…] the comments to the previous exercise, reader Kevin gave a beautiful solution in Haskell; it sorts the input items, groups them into […]
Uses ‘key’ function of the built in ‘list.sort()’ method to sort the list into the desired order, in-place.
The trick or insight is to generate a key for each item in the list that reflects the number of times that item has been seen as well as the item itself. For example, in the test sequence below the three 1’s would get keys: (0,1), (1,1), (2,1).
The dd is a ‘defaultdict’ that holds an ‘itertools.count()’ counter for each unique value in the list. The key function takes a list item and returns a tuple (next value of the counter associated with that item, item).
Complexity is O(n log n). (The key function is called once for each element in the list, O(n). Sorting is O(n log n)).
from collections import defaultdict from itertools import count def sortwithoutdups(lst): dd = defaultdict(count) lst.sort(key=lambda el:(next(dd[el]), el)) # test d = [2, 9, 1, 5, 1, 4, 9, 7, 2, 1, 4] print(d) sortwithoutdups(d) print(d) #output [2, 9, 1, 5, 1, 4, 9, 7, 2, 1, 4] [1, 2, 4, 5, 7, 9, 1, 2, 4, 9, 1]