Inversions

October 31, 2017

In an array A, an inversion is a pair of indices i < j for which A[j] < A[i]. For instance, the array [1 4 3 2 5] has three inversions: (4,3), (4,2) and (3,2). Inversions are a measure of the disorder of an array: a sorted array has no inversions and a reversed array has the maximum possible number, n(n−1)/2.
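A direct transcription of that definition gives a quadratic-time count; a small sketch, enough to check the example above:

```python
def count_inversions(a):
    """Count pairs i < j with a[j] < a[i] (the definition of an inversion)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[j] < a[i])

print(count_inversions([1, 4, 3, 2, 5]))  # 3
print(count_inversions([5, 4, 3, 2, 1]))  # 10, the maximum n(n-1)/2 for n = 5
```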

Your task is to write programs that count the inversions in an array and that enumerate them. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.



12 Responses to “Inversions”

  1. Paul said

    In Python using a modified mergesort.

    def mergesort(arr):
        inversions = 0
        n = len(arr)
        if n <= 1:
            return arr, inversions
        left, right = arr[:n//2], arr[n//2:]
        sleft, invleft = mergesort(left)
        sright, invright = mergesort(right)
        merged = []
        while sright and sleft:
            if sleft[0] <= sright[0]:
                merged.append(sleft.pop(0))
            else:
                merged.append(sright.pop(0))
                inversions += len(sleft)
        return merged + (sleft or sright), inversions + invleft + invright
    
  2. Daniel said

    I do not believe an O(n log n) solution is possible for the task of enumerating the inversions, since there can be as many as n(n−1)/2 of them, so merely listing the output can take quadratic time. For that task I’ve implemented an O(n^2) solution in Python.

    # inversions.py
    
    def pairs(array):
        # all pairs (array[i], array[j]) with i < j
        return ((array[i], array[j])
                for i in range(len(array))
                for j in range(i + 1, len(array)))
    
    def enumerate_inversions(array):
        return [(x, y) for x, y in pairs(array) if x > y]
    
    array = [1, 4, 3, 2, 5]
    inversions = enumerate_inversions(array)
    print(inversions)
    
    [(4, 3), (4, 2), (3, 2)]
    

    I was originally using itertools.combinations, but I wasn’t sure if that function is guaranteed to preserve the ordering of the original array (although it appears to).

    # inversions.py
    
    from itertools import combinations
    
    def enumerate_inversions(array):
        return [(x, y) for x, y in combinations(array, 2) if x > y]
    
    array = [1, 4, 3, 2, 5]
    inversions = enumerate_inversions(array)
    print(inversions)
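    For what it’s worth, the itertools documentation does guarantee the ordering: combination tuples are emitted in lexicographic order of the input indices, so each emitted pair keeps its elements in the original array order. A quick sanity check (a sketch):

```python
from itertools import combinations

array = [1, 4, 3, 2, 5]
# pairs come out in index order: (array[i], array[j]) for all i < j
all_pairs = list(combinations(array, 2))
print(all_pairs[:4])  # [(1, 4), (1, 3), (1, 2), (1, 5)]
print([(x, y) for x, y in all_pairs if x > y])  # [(4, 3), (4, 2), (3, 2)]
```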
    

    For the task of counting the inversions, I’ve implemented an O(n log n) solution based on merge sort in C99.

    #include <stdio.h>
    
    static int _inversions(int array[], int aux[], size_t n) {
        if (n <= 1) return 0;
        size_t mid = n / 2;
        int left_count = _inversions(array, aux, mid);
        int right_count = _inversions(array + mid, aux, n - mid);
        for (size_t i = 0; i < n; ++i) {
            aux[i] = array[i];
        }
        int count = left_count + right_count;
        size_t i = 0;   // index for output
        size_t j = 0;   // index for left array
        size_t k = mid; // index for right array
        while (j < mid && k < n) {
            if (aux[j] <= aux[k]) {
                array[i++] = aux[j++];
            } else {
                array[i++] = aux[k++];
                count += mid - j;
            }
        }
        while (j < mid) {
            array[i++] = aux[j++];
        }
        while (k < n) {
            array[i++] = aux[k++];
        }
        return count;
    }
    
    int inversions(int array[], size_t n) {
        int aux[n];
        return _inversions(array, aux, n);
    }
    
    
    int main(int argc, char* argv[]) {
        int array[] = {1, 4, 3, 2, 5};
        size_t n = sizeof(array) / sizeof(int);
        int count = inversions(array, n);
        printf("%d\n", count);
    }
    

    Output:

    3
    
  3. Daniel said

    @Paul, I believe that a core property of merge sort is its O(n log n) computational complexity. However, I think that by using pop(0) in your loop, you’re losing that property, since pop(0) is an O(n) operation, which would seemingly make your implementation O(n^2 log n).
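    One way to keep popping from the front while preserving the O(n log n) bound is collections.deque, whose popleft() is O(1). This is a hypothetical variant of the merge, not Paul’s posted code, just a sketch:

```python
from collections import deque

def count_inversions_deque(arr):
    """Merge-sort inversion count; deques make the front pops O(1)."""
    n = len(arr)
    if n <= 1:
        return list(arr), 0
    sleft, invleft = count_inversions_deque(arr[:n // 2])
    sright, invright = count_inversions_deque(arr[n // 2:])
    left, right = deque(sleft), deque(sright)
    merged, inversions = [], invleft + invright
    while left and right:
        if left[0] <= right[0]:
            merged.append(left.popleft())
        else:
            # right[0] jumps past every remaining left element: one
            # inversion for each of them
            merged.append(right.popleft())
            inversions += len(left)
    merged.extend(left or right)
    return merged, inversions

print(count_inversions_deque([1, 4, 3, 2, 5]))  # ([1, 2, 3, 4, 5], 3)
```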

  4. Paul said

    @Daniel: I think my original solution is still O(n log n), but you are right: it is better to use pop() instead of pop(0). See the code below. It is very fast compared to the O(n^2) solution.

    def mergesort(arr):
        inversions = 0
        n = len(arr)
        if n <= 1:
            return arr, inversions
        lft, rgt = arr[:n//2], arr[n//2:]
        lft, invlft = mergesort(lft)
        rgt, invrgt = mergesort(rgt)
        merged = []
        app = merged.append
        lftpop = lft.pop
        rgtpop = rgt.pop
        lenrgt = len(rgt)
        while lft and lenrgt:
            if lft[-1] > rgt[-1]:
                app(lftpop())
                inversions += lenrgt
            else:
                app(rgtpop())
                lenrgt -= 1
        merged.reverse()
        return (lft or rgt) + merged, inversions + invlft + invrgt
    
  5. matthew said

    If we want to enumerate the inversions, then adapting insertion sort is nice: each time we swap adjacent elements, this removes exactly one inversion:

    def inversions(a):
        for i in range(1,len(a)):
            for j in range(i,0,-1):
                if a[j-1] <= a[j]: break
                yield((a[j-1],a[j]))
                a[j-1],a[j] = a[j],a[j-1]
    
    print(list(inversions([1,4,3,2,5])))
    
  6. Daniel said

    “I think my original solution is still O(n log n)”

    Hi @Paul, regarding your comment quoted above, I think that solution was O(n^2 log n). The complexity from the recursive calls is O(log n) from the divide-and-conquer approach. Each recursive call has a while loop, with a nested O(n) operation. If the while loop looped over all elements, but had a constant operation for its body, it would be O(n). The nested O(n) operation in the while loop seemingly makes the loop O(n^2). Multiplying this by the O(log n) from the recursive calls results in O(n^2 log n).

    I’m not particularly confident in my analysis of algorithms. If possible, can you please elaborate on why you think that approach is O(n log n)?

  7. matthew said

    @Daniel: The recurrence here is T(1) = 1, T(n) = 2T(n/2) + n^2 and https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms) (see “Case 3 example”) suggests this is actually just n^2 (with an exact solution T(1) = 1, T(n) = n^2 – n otherwise).

  8. matthew said

    Sorry, what I wrote there is wrong: the exact solution at https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms)#Case_3_example is T(1) = 1, T(n) = 2*n^2 - n (so T(2) = 2*2^2 - 2 = 6 = 2*T(2/2) + 2^2 = 2 + 4)
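    The closed form from the comment above is easy to spot-check numerically; a throwaway sketch:

```python
def T(n):
    # closed form under discussion: T(1) = 1, T(n) = 2n^2 - n otherwise
    return 1 if n == 1 else 2 * n * n - n

# verify the recurrence T(n) = 2*T(n/2) + n^2 on powers of two
for k in range(1, 21):
    n = 2 ** k
    assert T(n) == 2 * T(n // 2) + n * n
print(T(2))  # 6 = 2*T(1) + 2^2
```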

  9. matthew said

    Hmm, there were some asterisks for multiplication there that have disappeared and turned the text italic. Just read the Wikipedia entry and ignore me.

  10. Paul said

    Hi @Daniel. I did some timings (again) and see now that my 1st method gets really slow for n = 10^6; from the timings it looks like O(n^2).
    The timings also confirm that the 2nd method is O(n log n).
    When I posted my 1st method I was aware that pop(0) is not optimal, but I did not think it was that bad. Immediately after my post I created the 2nd method, but I did not post it, as it is almost the same and I did not expect the time difference to be that big.

  11. Paul said

    Hi @Daniel. I wrote down why I think my 1st method is O(n^2) and the second O(n log n). You can find it here.

  12. Daniel said

    @Paul and @matthew. Thanks for your replies! I’ll have to switch to incorporating this approach into my future analysis of recursive algorithms (that way hopefully I won’t arrive at the wrong solution, like I did above).
