RIP Leibniz
November 18, 2016
We write the program first, then discuss the calculus. Here is a function that returns the number of comparisons required to determine that a sequence is not sorted; if the sequence is sorted, it will return the length of the sequence:
(define (sorted lt? xs) (let loop ((xs xs) (count 1)) (cond ((null? (cdr xs)) count) ((lt? (car xs) (cadr xs)) (loop (cdr xs) (+ count 1))) (else count))))
Then we apply sorted
to a million sequences and determine the average number of comparisons:
(define (comps n) (let loop ((i n) (cnt 0)) (if (zero? i) cnt (loop (- i 1) (+ cnt (sorted < (shuffle (range n)))))))) We usedrange
andshuffle
from theStandard Prelude
. Here's the answer:
> (comps 10000) 17186
Or about 1.72 comparisons per sequence. Collin Stedman shows the calculation of the average as 1/2 + 2/3 + 3/8 + 4/30 + 5/144 + 6/840 + … ≈ 1.71828 = e − 1, where the numerators are the positive integers and the denominators are n! + (n-1)!. You can look at Collin’s blog for the math, which is actually quite pretty.
You can run the program at http://ideone.com/vmiDIf.
In python:
# outputs: 1.717841
Parameter list_size seems less relevant for increasing value, as the chance that the entire list would be sorted approaches 0.
Instead, sampling the number of random tests to determine that the characters in a given text (a command-line argument to the script) are not in order. Also in Python.
Randomness in Haskell is not my forte, so I went with a brute force approach; given a list,
calculate the number of comparisons needed for each permutation of that list and take the
average. I used the foldl package, but everything else is in the standard library.
We can save ourselves a lot of work by only randomizing the array as far as the first inversion, here we use a standard Fisher-Yates shuffle, but stop exchanges once an inversion is discovered. As Rutger observes, we don’t need a very large array for the probability of getting to the end to become negligible (we print out the distribution here as well as the result). Javascript since I’m trying to get into ES6 at the moment:
And in a similar vein, like Graham’s solution (shame not to get monads in there, as Leibniz seems to have invented the term), we can generate permutations and count the number of comparisons – but we only need to find all possible initial segments of permutations with a single inversion at the end of the segment, then if the initial segment has length n and there are N elements altogether, this actually counts as (N-n)! different permutations – for N = 10, we actually look at 4097 initial segments, rather than 10! = 3628800 full permutations.
For N = 4 we get:
and for N = 19, we get a final result of 1.7182818284590453, about as good as it gets with 64-bit floats.
In Python. Sequences longer than 10 are not really useful, as the inversion only very rarely happens with an index higher than 9.
@Paul: nice, my figures ought to agree with your figures, but they don’t – the problem is in my code, I don’t take account of the single sorted permutation, which we can say requires N comparisons (even if it really only needs N-1). We now converge a bit faster, but still need to go up to N=17 to get full precision:
@matthew. I agree. I worked out too that you would need to go to 17 for full precision. My Python code is too slow to go to 17 though.
@Paul: We can’t “skip” permutations with the Python itertools function, but we can with a C++-style next_permutation function – having found an initial segment ending in an inversion, we can change the permutation to be the lexically last permutation with that prefix – since we start with the lexically first, we can just reverse the rest of the permutation, as in this C++:
Even for N=20 we only need to look at 9437186 permutations rather than the full 20! = 2432902008176640000, which would take a while:
Here is my take on this challenge in Julia. No assumptions are made about the length of the random sequence, or the number of runs for the simulation. However, it appears that the average number of comparison is independent of the sequence length and probably converges to a particular value.
function leibniz(X::Array{Int64, 1})
# X is a random sequence of integers that we wish to determine if it is sorted
c = 0
for i = 2:length(X)
c += 1
if X[i-1] > X[i]
break
end
end
return c
end
function main(n::Int64, N::Int64 = 1000)
# wrapper for leibniz(), for a simulation of N runs
z = 0
for i = 1:N
x = rand(Int64, n)
z += leibniz(x)
end
return z / N
end
julia>main(10, 1000) # try out 1000 random sequences, each one of length 10, and find the average number of comparisons required
1.718
BTW, Happy Thanksgiving everyone!
@Zack, that’s interesting. Others here are interpreting “random sequence” as a random permutation, like a shuffle of a range. You just generate integers and seem to get the same results. But you generate from a large set. The important thing seems to be the low probability of ties. So low that it doesn’t really matter. This is rather under-stated in the problem, as far as I can see, but apparently intended, so that the result is related to a specific constant.
Here, I modified your code to take float arrays (really any numeric arrays), and to use (in-place) shuffles of a given randomly-generated array:
And I compute the results with samples from different distributions, first with practically no ties, then rounded to integers so that there will be many ties, and now the ties are seen to affect the result:
@matthew: nice and fast trick. I build a version in Python, that works a lot faster, than my earlier version. t took me a while to get it right.
@Paul: thanks, I tried converting my solution above into Python – I seemed to have missed a check that i > 0, does no harm in the Javascript, but not good in Python where we end up indexing from the end of the array. We can also get more significant figures by using big integers:
Looking at the maths: there are n! sequences of n items and there are n!/(n-k)! different initial sequences of k items of which n!/(k!(n-k)!) are ordered (ie. n choose k); each initial sequence appears (n-k)! times in the list of all sequences. A sequence with k ordered items initially needs at least k comparisons, if the first k items are not ordered, then at most k-1 comparisons are needed. Putting all this together: the proportion of all sequences that require k or more comparisons is (n!/(k!(n-k)!))/(n!/(n-k)!, which is just 1/k! and the proportion that require exactly k comparisons is then just 1/k! – 1/(k+1)! = k/(k+1)! and expected number of comparisons is sum of k²/(k+1)! Still not quite sure why the sum is e-1, but working on it.
@matthew. PP metions the blog of Colin Stedman. If you take formula 5 (where btw all the n’s should be i’s), you can easily calculate the result. A good trick is to write i^2 as i*(i+1) – (i+1) + 1. This leads to the sum of 3 terms: t1-t2+t3 = e – (e-1) + (e-2) = e-1 if you let go n to infinity.
@Paul: Aha, that identity for i^2 is just what I want, thanks. I’d looked at Collin’s page, but not entirely followed the reasoning (hence my reworking up above).
Outputs:
4: 1.666927
5: 1.706343
6: 1.714978
7: 1.719484
8: 1.717933
9: 1.715861
10: 1.717179
11: 1.718295
12: 1.717068
13: 1.718003
14: 1.718197
15: 1.718366
16: 1.718356
17: 1.717484
18: 1.719986
19: 1.718612
20: 1.718551
Forgot to mention that the above solution is in Ruby.
(English isn’t my first language, so please excuse any mistakes. I have used Unicode character U+222B for the integrals, I hope the comment system will not garble it.)
A mathematical solution that actually uses integrals and that is (more or less) rigorous (whereas Collin Stedman, in his nice article, assumes without proof that a certain sequence matches a certain formula):
Let S be the length of the nondecreasing initial subsequence of a sequence of random numbers taken from the interval [0, 1]. Thus, if the sequence is
0.2 0.1 0.3 …
then S is 1, and if the sequence is
0.2 0.25 0.7 0.5 …
then S is 3. S equals the number of comparisons that it takes to determine that the sequence is not sorted. The problem is to compute E(S), the expected value of S.
First, condition E(S) on the value of the first element of the sequence. Imagine that the interval [0, 1] is divided into a very large number of very small subintervals of length dx. Since the value of each element is assumed to be uniformly distributed on [0, 1], the probability that the first element will lie in some subinterval [x, x + dx] is just dx. Hence,
E(S) = ∫(x from 0 to 1) E(S | first element is x) Prob(first element lies in [x, x + dx])
= ∫(x from 0 to 1) E(S | first element is x) dx. [*]
Let f(x) be a shorthand for E(S | first element is x). The reason I introduce this function is that, using recursion as an inspiration, a relatively simple equation can be written down for it. The key idea is to look at the next (that is, second) element of the sequence and condition f(x) on its value as compared to the value of the first element, that is, x. Thus,
f(x) = ∫(y from 0 to 1) f(x | second element is y) Prob(second element lies in [y, y + dy])
= ∫(y from 0 to 1) f(x | second element is y) dy.
Now there are two possibilities. Either x <= y, and then f(x | second element is y) equals one plus the expected value of S for the _rest_ of the sequence, given that _its_ first value is y. But this last expected value is, by definition, f(y). So this possibility contributes to f(x) an amount of
∫(y from x to 1) (1 + f(y)) dy
= 1 – x + ∫(y from x to 1) f(y) dy.
(Note the integration limits: x y, then the expected value of S is, of course, 1, so this contributes to f(x) an amount of
∫(y from 0 to x) 1 dy
= x.
(Note again the integration limits.) Hence, we have
f(x) = 1 – x + ∫(y from x to 1) f(y) dy + x
= 1 + ∫(y from x to 1) f(y) dy.
You could say that this is the analytical form of the function
(let loop ((xs xs) (count 1))
(cond ((null? (cdr xs)) count)
((lt? (car xs) (cadr xs))
(loop (cdr xs) (+ count 1)))
(else count))))
from the webmaster’s suggested solution, and especially of the line
(loop (cdr xs) (+ count 1))).
The equation is a so-called linear Volterra integral equation. Now I don’t know much about integral equations, but I do know one solution technique and it happens to work in this case: take the Laplace transform of both sides, solve for the transform of the unknown function, and then take the inverse Laplace transform. To do this here, it is necessary first to rewrite the integral so that the lower limit of integration becomes zero:
f(x) = 1 + ∫(y from 0 to 1) f(y) dy – ∫(y from 0 to x) f(y) dy.
That second term seems pesky, but if you look back at equation [*], it turns that it equals E(S) itself. So the integral equation becomes
f(x) = 1 + E(S) – ∫(y from 0 to x) f(y) dy.
Now take the Laplace transform and call the transform of f(x), F(p). If, in the transform of the last term, you change the order of integration, it simplifies nicely to F(p) / p. So we get
F(p) = (1 + E(S)) / p + F(p) / p.
Solving for F(p) gives
F(p) = (1 + E(S)) / (p + 1).
Taking the inverse transform gives
f(y) = (1 + E(S)) exp(-y).
(Aha, there’s our friend exp!) Now, returning once more to equation [*], we get
E(S) = ∫(x from 0 to 1) f(x) dx
= ∫(x from 0 to 1) (1 + E(S)) exp(-x) dx
= (1 + E(S)) (1 – 1/e).
Finally, we solve for E(S) and get
E(S) = e – 1.
In my previous post, the less-than-or-equal-sign was apparently interpreted as a tag and this caused a part of the text to be removed. Second attempt:
(English isn’t my first language, so please excuse any mistakes. I have used Unicode character U+222B for the integrals, I hope the comment system will not garble it.)
A mathematical solution that actually uses integrals and that is (more or less) rigorous (whereas Collin Stedman, in his nice article, assumes without proof that a certain sequence matches a certain formula):
Let S be the length of the nondecreasing initial subsequence of a sequence of random numbers taken from the interval [0, 1]. Thus, if the sequence is
0.2 0.1 0.3 …
then S is 1, and if the sequence is
0.2 0.25 0.7 0.5 …
then S is 3. S equals the number of comparisons that it takes to determine that the sequence is not sorted. The problem is to compute E(S), the expected value of S.
First, condition E(S) on the value of the first element of the sequence. Imagine that the interval [0, 1] is divided into a very large number of very small subintervals of length dx. Since the value of each element is assumed to be uniformly distributed on [0, 1], the probability that the first element will lie in some subinterval [x, x + dx] is just dx. Hence,
E(S) = ∫(x from 0 to 1) E(S | first element is x) Prob(first element lies in [x, x + dx])
= ∫(x from 0 to 1) E(S | first element is x) dx. [*]
Let f(x) be a shorthand for E(S | first element is x). The reason I introduce this function is that, using recursion as an inspiration, a relatively simple equation can be written down for it. The key idea is to look at the next (that is, second) element of the sequence and condition f(x) on its value as compared to the value of the first element, that is, x. Thus,
f(x) = ∫(y from 0 to 1) f(x | second element is y) Prob(second element lies in [y, y + dy])
= ∫(y from 0 to 1) f(x | second element is y) dy.
Now there are two possibilities. Either x is less than or equal to y, and then f(x | second element is y) equals one plus the expected value of S for the _rest_ of the sequence, given that _its_ first value is y. But this last expected value is, by definition, f(y). So this possibility contributes to f(x) an amount of
∫(y from x to 1) (1 + f(y)) dy
= 1 – x + ∫(y from x to 1) f(y) dy.
(Note the integration limits: ‘x less than or equal to y’ means that y runs from x to the maximum value of 1.) If, on the other hand, x > y, then the expected value of S is, of course, 1, so this contributes to f(x) an amount of
∫(y from 0 to x) 1 dy
= x.
(Note again the integration limits.) Hence, we have
f(x) = 1 – x + ∫(y from x to 1) f(y) dy + x
= 1 + ∫(y from x to 1) f(y) dy.
You could say that this is the analytical form of the function
(let loop ((xs xs) (count 1))
(cond ((null? (cdr xs)) count)
((lt? (car xs) (cadr xs))
(loop (cdr xs) (+ count 1)))
(else count))))
from the webmaster’s suggested solution, and especially of the line
(loop (cdr xs) (+ count 1))).
The equation is a so-called linear Volterra integral equation. Now I don’t know much about integral equations, but I do know one solution technique and it happens to work in this case: take the Laplace transform of both sides, solve for the transform of the unknown function, and then take the inverse Laplace transform. To do this here, it is necessary first to rewrite the integral so that the lower limit of integration becomes zero:
f(x) = 1 + ∫(y from 0 to 1) f(y) dy – ∫(y from 0 to x) f(y) dy.
That second term seems pesky, but if you look back at equation [*], it turns that it equals E(S) itself. So the integral equation becomes
f(x) = 1 + E(S) – ∫(y from 0 to x) f(y) dy.
Now take the Laplace transform and call the transform of f(x), F(p). If, in the transform of the last term, you change the order of integration, it simplifies nicely to F(p) / p. So we get
F(p) = (1 + E(S)) / p + F(p) / p.
Solving for F(p) gives
F(p) = (1 + E(S)) / (p + 1).
Taking the inverse transform gives
f(y) = (1 + E(S)) exp(-y).
(Aha, there’s our friend exp!) Now, returning once more to equation [*], we get
E(S) = ∫(x from 0 to 1) f(x) dx
= ∫(x from 0 to 1) (1 + E(S)) exp(-x) dx
= (1 + E(S)) (1 – 1/e).
Finally, we solve for E(S) and get
E(S) = e – 1.
@Johan: nice analysis. My calculus is very rusty (and wasn’t that great to start with), but can we do some trick like differentiating both sides of:
(if my embedded latex comes out).
https://en.support.wordpress.com/latex/ by the way.
@matthew: thank you, and you are perfectly right about the differentiation trick. It leads to a simple differential equation for
and from there to the solution
, using the initial value
that follows from the integral equation. It also obviates the need to rewrite the integral equation and to recognize that ‘pesky term’ as $E(S)$. I have a habit of making such calculations more complex than necessary, although in this case it might have been my fondness for the Laplace transform that made me overlook the simpler solution. :-)
@Johan: Well, going the more complicated route is often more educational.