Inversions
October 31, 2017
In an array A, an inversion is a pair of items that are out of order: each pair of indices i < j for which A[j] < A[i] counts as a single inversion. For instance, the array [1 4 3 2 5] has three inversions: (4,3), (4,2) and (3,2). Inversions are a measure of the disorder of an array: an ordered array has no inversions and a reversed array has the maximum possible n(n−1)/2 inversions.
Your task is to write programs that count the number of inversions in an array and make an enumeration of them. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
In Python using a modified mergesort.
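A minimal sketch of such a merge sort modified to count inversions (illustrative code, not the exact solution posted; function names are my own). Following the discussion further down, the merge uses pop(0):

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) < 2:
        return a, 0
    mid = len(a) // 2
    left, linv = count_inversions(a[:mid])
    right, rinv = count_inversions(a[mid:])
    merged, inv = [], linv + rinv
    while left and right:
        if left[0] <= right[0]:
            merged.append(left.pop(0))
        else:
            # right[0] is smaller than every remaining left element,
            # so it forms one inversion with each of them
            inv += len(left)
            merged.append(right.pop(0))
    merged.extend(left or right)
    return merged, inv
```

For the example array above, count_inversions([1, 4, 3, 2, 5]) returns ([1, 2, 3, 4, 5], 3).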
I do not believe an O(n log n) solution is possible for the task of enumerating the inversions, since an array can contain up to n(n−1)/2 of them and merely listing them is then Ω(n^2) in the worst case. For that task I’ve implemented an O(n^2) solution in Python.
I was originally using itertools.combinations, but I wasn’t sure if that function is guaranteed to preserve the ordering of the original array (although it appears to).
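A minimal sketch of such an O(n^2) enumeration (illustrative, not the posted solution), using explicit index loops so the pair order does not depend on itertools at all:

```python
def enumerate_inversions(a):
    """Return every inverted pair (a[i], a[j]) with i < j and a[j] < a[i]."""
    return [(a[i], a[j])
            for i in range(len(a))
            for j in range(i + 1, len(a))
            if a[j] < a[i]]
```

On the example array [1 4 3 2 5] this yields [(4, 3), (4, 2), (3, 2)]. (For what it's worth, the itertools documentation does guarantee that combinations emits tuples in the order of the input iterable.)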
For the task of counting the inversions, I’ve implemented an O(n log n) solution based on merge sort in C99.
@Paul, I believe that a core property of merge sort is its O(n log n) computational complexity. However, I think that by using pop(0) in your loop, you’re losing that property, since pop(0) is an O(n) operation, which would seemingly make your implementation O(n^2 log n).
@Daniel: I think my original solution is still O(n log n), but you are right: it is better to use pop() instead of pop(0). See the code below. It is very fast compared to the O(n^2) solution.
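A sketch of that faster variant, reconstructed from the description above (not Paul's exact code): reverse each half once, so that pop(), an O(1) operation, always removes the logically smallest remaining element.

```python
def count_inversions2(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) < 2:
        return a, 0
    mid = len(a) // 2
    left, linv = count_inversions2(a[:mid])
    right, rinv = count_inversions2(a[mid:])
    left.reverse()           # smallest element of each half is now at the end,
    right.reverse()          # so pop() removes the logical front in O(1)
    merged, inv = [], linv + rinv
    while left and right:
        if left[-1] <= right[-1]:
            merged.append(left.pop())
        else:
            inv += len(left)             # right's front jumps over all of left
            merged.append(right.pop())
    merged.extend(reversed(left or right))
    return merged, inv
```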
If we want to enumerate the inversions, then adapting insertion sort is nice: each time we swap adjacent elements, this removes exactly one inversion:
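A minimal sketch of that adaptation (illustrative code): insertion-sort a copy of the array and record one pair per adjacent swap.

```python
def inversions_by_insertion(a):
    """Enumerate all inversions of a by recording each adjacent swap."""
    a = list(a)              # work on a copy
    found = []
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            found.append((a[j - 1], a[j]))    # one inversion per swap
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return found
```

On [1 4 3 2 5] this reports (4,3), (4,2) and (3,2), matching the example in the exercise. The number of swaps insertion sort performs is exactly the inversion count, which is why this enumeration is complete.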
Hi @Paul, regarding your comment quoted above, I think that solution was O(n^2 log n). The complexity from the recursive calls is O(log n) from the divide-and-conquer approach. Each recursive call has a while loop, with a nested O(n) operation. If the while loop looped over all elements, but had a constant operation for its body, it would be O(n). The nested O(n) operation in the while loop seemingly makes the loop O(n^2). Multiplying this by the O(log n) from the recursive calls results in O(n^2 log n).
I’m not particularly confident in my analysis of algorithms. If possible, can you please elaborate on why you think that approach is O(n log n)?
@Daniel: The recurrence here is T(1) = 1, T(n) = 2T(n/2) + n^2 and https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms) (see “Case 3 example”) suggests this is actually just n^2 (with an exact solution T(1) = 1, T(n) = n^2 – n otherwise).
Sorry, what I wrote there is wrong: the exact solution at https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms)#Case_3_example is T(1) = 1, T(n) = 2n^2 − n for n > 1 (so T(2) = 2·2^2 − 2 = 6 = 2T(2/2) + 2^2 = 2 + 4).
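A quick numerical check of that closed form against the recurrence (powers of two only, as in the master-theorem setting; the function name T is just the symbol from the recurrence):

```python
# Check T(1) = 1, T(n) = 2*T(n/2) + n^2 against the closed form 2*n^2 - n
def T(n):
    return 1 if n == 1 else 2 * T(n // 2) + n * n

for k in range(7):
    n = 2 ** k
    assert T(n) == 2 * n * n - n   # holds for n = 1, 2, 4, ..., 64
```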
Hi @Daniel. I did some timings (again) and see now that my 1st method gets really slow for n=10^6. It looks like O(n^2) from timings.
The timings also confirm that the 2nd method is O(n log n).
When I posted my 1st method I was aware of the fact that pop(0) is not optimal, but I did not think it was that bad. Immediately after my post I created the 2nd method, but I did not post it, as it is almost the same and I did not think that the time difference would be that big.
Hi @Daniel. I wrote down why I think my 1st method is O(n^2) and the second O(n log n). You can find it here.
@Paul and @matthew: Thanks for your replies! I’ll incorporate this approach into my future analyses of recursive algorithms (that way I hopefully won’t arrive at the wrong answer, as I did above).