Inversions

October 31, 2017

We use nested loops to create a higher-order procedure that processes inversions:

```
(define (inversion lt? base proc xs)
  (do ((xs xs (cdr xs))) ((null? (cdr xs)) base)
    (do ((ys (cdr xs) (cdr ys))) ((null? ys))
      (when (lt? (car ys) (car xs))
        (set! base (proc (car xs) (car ys) base))))))
```

Here `lt?` is a predicate that compares two items, `base` accumulates the result, `proc` adds the next inversion to the accumulator, and `xs` is the input array, represented as a list. The two needed functions are then:

```
(define (inversion-count xs)
  (inversion < 0 (lambda (x y base) (+ base 1)) xs))

(define (inversion-enumerate xs)
  (inversion < (list) (lambda (x y base) (cons (list x y) base)) xs))
```

Here are two examples:

```
> (define xs '(1 4 3 2 5))
> (inversion-count xs)
3
> (inversion-enumerate xs)
((3 2) (4 2) (4 3))
```
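For readers more comfortable with Python, here is a sketch of the same higher-order pattern (an editorial translation, not part of the original Scheme program); note that appending rather than consing yields the enumeration in the opposite order:

```python
import operator

# A Python translation of the higher-order inversion procedure: fold proc
# over every pair (xs[i], xs[j]) with i < j and xs[j] < xs[i].
def inversion(lt, base, proc, xs):
    for i in range(len(xs) - 1):
        for j in range(i + 1, len(xs)):
            if lt(xs[j], xs[i]):
                base = proc(xs[i], xs[j], base)
    return base

def inversion_count(xs):
    return inversion(operator.lt, 0, lambda x, y, base: base + 1, xs)

def inversion_enumerate(xs):
    # Appending lists the inversions in discovery order, which is the
    # reverse of the Scheme version's consed output.
    return inversion(operator.lt, [], lambda x, y, base: base + [(x, y)], xs)
```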

Our program takes O(n^2) time and O(1) space. There is an O(n log n) algorithm based on merge sort, but I’ve never gotten it to work.

You can run the program at https://ideone.com/AyJvvf.


12 Responses to “Inversions”

1. Paul said

In Python using a modified mergesort.

```
def mergesort(arr):
    inversions = 0
    n = len(arr)
    if n <= 1:
        return arr, inversions
    left, right = arr[:n//2], arr[n//2:]
    sleft, invleft = mergesort(left)
    sright, invright = mergesort(right)
    merged = []
    while sright and sleft:
        if sleft[0] <= sright[0]:
            merged.append(sleft.pop(0))
        else:
            merged.append(sright.pop(0))
            inversions += len(sleft)
    return merged + (sleft or sright), inversions + invleft + invright
```
2. Daniel said

I do not believe an O(n log n) solution is possible for the task of enumerating the inversions, since a reverse-sorted array has n(n-1)/2 inversions and merely writing them out takes Ω(n^2) time. For that task I’ve implemented an O(n^2) solution in Python.

```
# inversions.py

def pairs(array):
    return ((array[i], array[j])
            for i in range(len(array))
            for j in range(i + 1, len(array)))

def enumerate_inversions(array):
    return [(x, y) for x, y in pairs(array) if x > y]

array = [1, 4, 3, 2, 5]
inversions = enumerate_inversions(array)
print(inversions)
```
```
[(4, 3), (4, 2), (3, 2)]
```

I was originally using itertools.combinations, but I wasn’t sure if that function is guaranteed to preserve the ordering of the original array (although it appears to).
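For what it’s worth, the itertools documentation does guarantee this: combinations are emitted in lexicographic order according to the positions of the elements in the input, so each pair preserves the input’s relative order. A quick check (an editorial addition, not part of the comment):

```python
from itertools import combinations

array = [1, 4, 3, 2, 5]

# Each tuple from combinations(array, 2) keeps the input's relative order,
# matching the explicit index-based pairing.
by_index = [(array[i], array[j])
            for i in range(len(array))
            for j in range(i + 1, len(array))]
assert list(combinations(array, 2)) == by_index
```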

```
# inversions.py

from itertools import combinations

def enumerate_inversions(array):
    return [(x, y) for x, y in combinations(array, 2) if x > y]

array = [1, 4, 3, 2, 5]
inversions = enumerate_inversions(array)
print(inversions)
```

For the task of counting the inversions, I’ve implemented an O(n log n) solution based on merge sort in C99.

```
#include <stdio.h>

static int _inversions(int array[], int aux[], size_t n) {
    if (n <= 1) return 0;
    size_t mid = n / 2;
    int left_count = _inversions(array, aux, mid);
    int right_count = _inversions(array + mid, aux, n - mid);
    for (size_t i = 0; i < n; ++i) {
        aux[i] = array[i];
    }
    int count = left_count + right_count;
    size_t i = 0;   // index for output
    size_t j = 0;   // index for left array
    size_t k = mid; // index for right array
    while (j < mid && k < n) {
        if (aux[j] <= aux[k]) {
            array[i++] = aux[j++];
        } else {
            array[i++] = aux[k++];
            count += mid - j;
        }
    }
    while (j < mid) {
        array[i++] = aux[j++];
    }
    while (k < n) {
        array[i++] = aux[k++];
    }
    return count;
}

int inversions(int array[], size_t n) {
    int aux[n];
    return _inversions(array, aux, n);
}

int main(int argc, char* argv[]) {
    int array[] = {1, 4, 3, 2, 5};
    size_t n = sizeof(array) / sizeof(int);
    int count = inversions(array, n);
    printf("%d\n", count);
}
```

Output:

```
3
```
3. Daniel said

@Paul, I believe that a core property of merge sort is its O(n log n) computational complexity. However, I think that by using pop(0) in your loop, you’re losing that property, since pop(0) is an O(n) operation, which would seemingly make your implementation O(n^2 log n).
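One way to keep the merge linear without reversing the lists is `collections.deque`, whose `popleft` is O(1). A rough sketch along the lines of Paul’s original code (an editorial illustration, not Paul’s actual fix):

```python
from collections import deque

def mergesort(arr):
    # Same structure as the pop(0) version above, but with deques so that
    # popping from the front is O(1) (list.pop(0) is O(n)).
    inversions = 0
    n = len(arr)
    if n <= 1:
        return list(arr), inversions
    sleft, invleft = mergesort(arr[:n//2])
    sright, invright = mergesort(arr[n//2:])
    left, right = deque(sleft), deque(sright)
    merged = []
    while left and right:
        if left[0] <= right[0]:
            merged.append(left.popleft())
        else:
            merged.append(right.popleft())
            # every element remaining in left forms an inversion with this one
            inversions += len(left)
    merged.extend(left or right)
    return merged, inversions + invleft + invright
```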

4. Paul said

@Daniel: I think my original solution is still O(n log n), but you are right: it is better to use pop() instead of pop(0). See the code below. It is very fast compared to the O(n^2) solution.

```
def mergesort(arr):
    inversions = 0
    n = len(arr)
    if n <= 1:
        return arr, inversions
    lft, rgt = arr[:n//2], arr[n//2:]
    lft, invlft = mergesort(lft)
    rgt, invrgt = mergesort(rgt)
    merged = []
    app = merged.append
    lftpop = lft.pop
    rgtpop = rgt.pop
    lenrgt = len(rgt)
    while lft and lenrgt:
        if lft[-1] > rgt[-1]:
            app(lftpop())
            inversions += lenrgt
        else:
            app(rgtpop())
            lenrgt -= 1
    merged.reverse()
    return (lft or rgt) + merged, inversions + invlft + invrgt
```
5. matthew said

If we want to enumerate the inversions, then adapting insertion sort is nice: each time we swap adjacent elements, this removes exactly one inversion:

```
def inversions(a):
    for i in range(1, len(a)):
        for j in range(i, 0, -1):
            if a[j-1] <= a[j]: break
            yield (a[j-1], a[j])
            a[j-1], a[j] = a[j], a[j-1]

print(list(inversions([1, 4, 3, 2, 5])))
```
6. Daniel said

“I think my original solution is still O(n log n)”

Hi @Paul, regarding your comment quoted above, I think that solution was O(n^2 log n). The complexity from the recursive calls is O(log n) from the divide-and-conquer approach. Each recursive call has a while loop, with a nested O(n) operation. If the while loop looped over all elements, but had a constant operation for its body, it would be O(n). The nested O(n) operation in the while loop seemingly makes the loop O(n^2). Multiplying this by the O(log n) from the recursive calls results in O(n^2 log n).

I’m not particularly confident in my analysis of algorithms. If possible, can you please elaborate on why you think that approach is O(n log n)?

7. matthew said

@Daniel: The recurrence here is T(1) = 1, T(n) = 2T(n/2) + n^2 and https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms) (see “Case 3 example”) suggests this is actually just n^2 (with an exact solution T(1) = 1, T(n) = n^2 – n otherwise).

8. matthew said

Sorry, what I wrote there is wrong: the exact solution at https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms)#Case_3_example is T(1) = 1, T(n) = 2n^2 – n (so T(2) = 2·2^2 – 2 = 6 = 2T(2/2) + 2^2 = 2 + 4)
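A quick numeric sanity check (an editorial addition) that T(n) = 2n^2 – n does satisfy T(1) = 1, T(n) = 2T(n/2) + n^2 at powers of two:

```python
# Verify that T(n) = 2*n**2 - n satisfies the recurrence
# T(1) = 1, T(n) = 2*T(n/2) + n^2 for n a power of two.
def T(n):
    return 1 if n == 1 else 2 * T(n // 2) + n * n

for k in range(11):
    n = 2 ** k
    assert T(n) == 2 * n * n - n
```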

9. matthew said

Hmm, there were some asterisks for multiplication there that have disappeared and turned the text italic. Just read the Wikipedia entry and ignore me.

10. Paul said

Hi @Daniel. I did some timings (again) and see now that my 1st method gets really slow for n = 10^6; from the timings it looks like O(n^2).
The timings also confirm that the 2nd method is O(n log n).
When I posted my 1st method I was aware that pop(0) is not optimal, but I did not think it was that bad. Immediately after my post I created the 2nd method, but I did not post it, as it is almost the same and I did not think the time difference would be that big.

11. Paul said

Hi @Daniel. I wrote down why I think my 1st method is O(n^2) and the second O(n log n). You can find it here.

12. Daniel said

@Paul and @matthew. Thanks for your replies! I’ll have to switch to incorporating this approach into my future analysis of recursive algorithms (that way hopefully I won’t arrive at the wrong solution, like I did above).