Maximum Sum Subsequence

December 3, 2010

This exercise has been a favorite of programmers since Jon Bentley studied it in Chapter 8 of his book Programming Pearls. We’ll follow his pseudo-code as exactly as we can.

The cubic solution iterates over all pairs of integers i and j, with ij, over the length of the input sequence X, calculating the sum X[i .. j] and comparing it to the maximum sum so far:

(define (max-sum-subseq-1 xv)
  (let ((n (vector-length xv)) (max-so-far 0))
    (do ((i 0 (+ i 1))) ((= i n) max-so-far)
      (do ((j i (+ j 1))) ((= j n))
        (let ((sum 0))
          (do ((k i (+ k 1))) ((< j k))
            (set! sum (+ sum (vector-ref xv k))))
          (set! max-so-far (max max-so-far sum)))))))

There are two quadratic solutions. The first works by noticing that the sum of X[i .. j] is really easy to compute if you’ve just computed the sum of X[i .. j−1] in a loop:

(define (max-sum-subseq-2a xv)
  (let ((n (vector-length xv)) (max-so-far 0))
    (do ((i 0 (+ i 1))) ((= i n) max-so-far)
      (let ((sum 0))
        (do ((j i (+ j 1))) ((= j n))
          (set! sum (+ sum (vector-ref xv j)))
          (set! max-so-far (max max-so-far sum)))))))

The other quadratic algorithm pre-computes the cumulative sums of X[0 .. j] and calculates the sum of X[i .. j] by subtracting the sum of X[0 .. i] from the sum of X[0 .. j]:

(define (max-sum-subseq-2b xv)
  (define (cumarr x) (+ x 1))
  (let* ((n (vector-length xv))
         (real-array (make-vector (+ n 1) 0)))
    (vector-set! real-array (cumarr -1) 0)
    (do ((i 0 (+ i 1))) ((= i n))
      (vector-set! real-array (cumarr i)
        (+ (vector-ref real-array (cumarr (- i 1)))
           (vector-ref xv i))))
    (let ((max-so-far 0))
      (do ((i 0 (+ i 1))) ((= i n) max-so-far)
        (do ((j i (+ j 1))) ((= j n))
          (let ((sum (- (vector-ref real-array (cumarr j))
                        (vector-ref real-array (cumarr i)))))
            (set! max-so-far (max max-so-far sum))))))))

Bentley uses cumarr[-1] in his solution. But since no array has an index of -1, he proposes (in the solution of Problem 5) a realarray which is indexed as an offset of 1 from cumarr. We use the function cumarr to convert the index.

The O(n log n) solution is tricky. Split the input sequence into two equal halves. The maximum sum subsequence of the whole is either the maximum sum subsequence of the left half, or the maximum sum subsequence of the right half, or the maximum sum subsequence that crosses the two halves. The maximum sum subsequences of the two halves can be computed recursively, and the maximum sum subsequence that crosses the midpoint can be found by summing the largest subsequence that starts at the midpoint and reaches left with the largest subsequence that starts at the midpoint and reaches right. The bases of the recursion are 0 for a zero-element subsequence and the maximum of a one-element subsequence is either the value of the element itself, if the element is positive, or 0 if the element is negative:

(define (max-sum-subseq-3 xv)
  (let-syntax ((x (syntax-rules ()
                    ((x i) (vector-ref xv i)))))
    (let maxsum3 ((l 0) (u (- (vector-length xv) 1)))
      (cond ((> l u) 0) ((= l u) (max 0 (x l)))
      (else (let ((m (quotient (+ l u) 2)))
              (let ((lmax 0) (sum 0))
                (do ((i m (- i 1))) ((< i l))
                  (set! sum (+ sum (x i)))
                  (set! lmax (max lmax sum)))
                (let ((rmax 0) (sum 0))
                  (do ((i (+ m 1) (+ i 1))) ((< u i))
                    (set! sum (+ sum (x i)))
                    (set! rmax (max rmax sum)))
                  (max (+ lmax rmax) (maxsum3 l m)
                       (maxsum3 (+ m 1) u))))))))))

If your knowledge of Scheme is shallow rather than deep, that function looks a little bit mysterious. Let-syntax introduces a macro x that is defined so that (x i), for any i, is replaced by (vector-ref xv i); x is an abbreviation, and the replacement text is instituted at compile-time rather than run-time. Maxsum3 is a function defined by a named let; the function takes two arguments, l and u, and is initially called as (maxsum 0 (- (vector-length xv) 1)). Thus, maxsum3 is exactly the same as Bentley’s function. We normally use loop in the position occupied by maxsum, but loop is not a keyword, merely a convention. The rest of max-sum-subseq-3 should be familiar if you have been following our exercises.

The linear algorithm scans the sequence from left to right, never backtracking; the code is subtle, short, and very fast. At each element, the variable max-ending-here is recalculated by adding to it the value of the current element; if it ever becomes negative, it is reset to its initial value of 0:

(define (max-sum-subseq-4 xv)
  (let ((n (vector-length xv))
        (max-so-far 0)
        (max-ending-here 0))
    (do ((i 0 (+ i 1))) ((= i n) max-so-far)
      (set! max-ending-here (max (+ max-ending-here (vector-ref xv i)) 0))
      (set! max-so-far (max max-so-far max-ending-here)))))

Here is the linear algorithm again, using a list instead of a vector, the way it would normally be written in Scheme; it’s much prettier than the other way:

(define (max-sum-subsequence xs)
  (let loop ((xs xs) (max-ending-here 0) (max-so-far 0))
    (if (null? xs) max-so-far
      (let ((max-ending-here (max (+ max-ending-here (car xs)) 0)))
        (loop (cdr xs) max-ending-here (max max-ending-here max-so-far))))))

All versions of the program correctly compute the maximum sum subsequence of the sample data:

> (define xs '(31 -41 59 26 -53 58 97 -93 -23 84))
> (define xv #(31 -41 59 26 -53 58 97 -93 -23 84))
> (max-sum-subseq-1 xv)
187
> (max-sum-subseq-2a xv)
187
> (max-sum-subseq-2b xv)
187
> (max-sum-subseq-3 xv)
187
> (max-sum-subseq-4 xv)
187
> (max-sum-subsequence xs)
187

As an interview question, I would expect the candidate to correctly write one of the solutions, any of them, and to at least know about the linear-time solution, even if he couldn’t correctly write it on the whiteboard. Bentley is popular, and any programmer who read his book will remember that there is a linear solution, even if he can’t remember the solution during the interview. On the other hand, any programmer who doesn’t know about the solution will likely fail the interview, on the theory that if he isn’t at least interested enough in his profession to read Bentley, he isn’t interested enough in his profession to work for me.

You can run the program at http://programmingpraxis.codepad.org/eM3jPDSO.

About these ads

Pages: 1 2

15 Responses to “Maximum Sum Subsequence”

  1. [...] maximale — tel qu’exposé dans Programming Pearls de Jon Bentley et dans un exercice de Programming Praxis — consiste à trouver, parmi les sommes de tous les segments (sous-séquences continues) [...]

  2. [...] Praxis – Maximum Sum Subsequence By Remco Niemeijer In today’s Programming Praxis exercise, we have to implement four different ways of solving the problem of [...]

  3. My Haskell solution (see http://bonsaicode.wordpress.com/2010/12/03/programming-praxis-maximum-sum-subsequence/ for a version with comments):

    import Data.List
    
    maxSum1 :: (Ord a, Num a) => [a] -> a
    maxSum1 xs = maximum . map sum $ inits =<< tails xs
    
    maxSum2 :: (Ord a, Num a) => [a] -> a
    maxSum2 xs = maximum $ scanl (+) 0 =<< tails xs
    
    maxSum3 :: (Ord a, Num a) => [a] -> a
    maxSum3 = fst . head . until (null . tail) f . map (\x -> (x, [x])) where
        f ((lm, l):(rm, r):xs) = (maximum [maximum (scanl (+) 0 r) +
            maximum (scanr (+) 0 l), lm, rm], l ++ r) : f xs
        f xs = xs
    
    maxSum4 :: (Ord a, Num a) => [a] -> a
    maxSum4 = snd . foldl (\(here, sofar) x -> let m = max 0 $ here + x 
                                               in (m, max m sofar)) (0, 0)
    
  4. www said

    “if he isn’t at least interested enough in his profession to read Bentley, he isn’t interested enough in his profession to work for me”

    Why not just make that the opening criteria, then? Ask if they’ve read the book, ask how many chapters are there, ask when it was published. If the candidate doesn’t know these, stop the interview immediately.

    Are there any books (or combination of books) that are acceptable substitutes? What if the candidate read Code Complete three times? Does reading Norvig through override the necessity to have read Bentley?

  5. programmingpraxis said

    I seem to have struck a nerve; Remco had much the same comment on his blog.

    I just did a Google search for maximum sum subsequence bentley; there are 4500 results. There are several references to Bentley’s original book, of course. There are articles on Springer, ACM, IEEE, and Elsevier referring to Bentley’s book. There are course materials at multiple colleges (I stopped counting after half a dozen) that refer to Bentley’s book. TopCoder and UVA give the exercise. Several blogs mention Bentley’s book, including LtU. I conclude that the O(n) algorithm is part of the computing folklore.

    It is not unreasonable to expect a candidate to at least know of the algorithm, even if he can’t recall it exactly. A single such failing wouldn’t knock out a candidate, but it would make me ask another question from the computing folklore. Multiple such failures likely would knock out a candidate.

    Actually, I am more interested to find out what projects a candidate has worked on, what role he played in those projects, and how does that previous work match up to the job for which I am hiring. These “interview questions” play a much smaller role. But I still think a candidate not versed in the programming folklore will be less productive than one who is, and thus less suitable for hiring.

  6. Aaron said

    It was pretty obvious that there was a linear solution, so why would I waste the time on the other solutions, except for intellectual (academic) exercise? Why would I waste my time memorizing obscure algorithms when I know I can figure out the solution on my own or in a reference? Doesn’t seem like a good way to interview, unless you’re looking for Ph.d candidates. As Einstein said, “Never memorize what you can look up in books.”

  7. Carl said

    Oh, dear, Shaun G. That’s a bit clumsy and slow. Try this instead:


    $seq = array( 31, -41, 59, 26, -53, 58, 97, -93, -23, 84 );

    foreach( $seq as $num )
    {
    $current += $num;
    $highest = ( $highest < $current ? $current : $highest );
    $current = ( $current < 0 ? 0 : $current );
    }

  8. Mayson Lancaster said

    Your statement of the problem varies subtly, and somewhat ambiguously, from Bentley’s: his problem states that one of the valid subsequences is the empty sequence, which cleanly disposes of the edge case where all entries are negative.

  9. programmingpraxis said

    My comment about hiring caused quite a stir at Hacker News. As the good folks there pointed out, I may have spoken un-artfully, but there is an important point here that is being missed. I bet that every civil engineer, if asked at an interview, could explain the basic design of Galloping Gertie and the flaws that caused it to fail. I bet that every architect, if asked at an interview, could explain the basic design of the Twin Towers, and discuss why it was less resilient to heavy lateral impacts than other skyscrapers. I bet that every journalist, if asked at an interview, could quote some of the advice of Strunk and White.

    As a broad statement, certainly not always true, computer programmers seem to be less knowledgeable of their professional culture than other professions. There is no equivalent to Architectural Digest or Journal of Accountancy or Morbidity and Mortality for programmers, so programmers have to work harder to keep up with their profession. If you can demonstrate that at an interview, you have a leg up on the candidate who can’t.

    Do you have to know the linear solution to the maximum sum subsequence problem to get hired? Of course not. But if you know it, and the other guy doesn’t, then all other things being equal, you get hired, not him.

    I should also have mentioned that there must be some context to the programming questions being asked at an interview. There are probably better questions to ask a PHP web developer.

  10. Kevin said

    I don’t know if this is right but for some reason I cannot think of any other way to do this except linearly.
    I figured that I could generate all subsets and check their sums via a recursive function but I couldn’t figure out how to do that haha.
    As for the O(n*log n) algorithm…I couldn’t even think of a way to do that.

    using namespace std;
    
    int linear(vector<int> seq)
    {
    	int current = seq[0],
    		prev = seq[0],
    		max = seq[0];
    
    	for(int i=1; i<seq.size(); i++)
    	{
    		prev = current;
    		current += seq[i];
    		if(current < 0)
    			current = 0;
    		if(prev > max)
    			max = prev;
    		else if(current > max)
    			max = current;
    	}
    	return max;
    }
    
  11. F. Carr said

    AI have enjoyed reading the various problems discussed here at programmingpraxis. But a blogger who cannot see any way “forward” — except to re-entrench their ingrained biases and errors — after being caught out making such a ridiculous overstatement? There are many blogs where readers can fruitfully invest their time: multiple such failures on the part of one particular blogger, unable to learn a valuable lesson, are likely to knock them out of the marketplace of ideas.

  12. While I do have Bentley’s book on my shelf, it’s been long enough that I didn’t remember this specific example (perhaps that is a good indication that I should reread it again). I’d certainly expect that any reasonable programmer could come up with an implementation that worked: my own version was the obvious cubic time algorithm. A moment’s further thought reduced that a quadratic algorithm by changing storing the array of sums, allowing you to calculate the total sums between any two points as the difference between two values from the sum table. I’d expect any good programmer to be able to come up with that. Bentley’s linear time algorithm is actually way subtle, and it isn’t clear to me that it actually answers the question you asked: you asked to identify the sequence with maximal sum. All of your code seems to return the maximal sum. Furthermore, you forgot the stipulation that Bentley makes: that when all the entries are negative, you should return the maximal sum of 0. The maximal sum is well defined even if all the entries are negative: the obvious quadratic algorithms will return the proper answer, Bentley’s linear code does not, and gains some traction in simplicity from this stipulation.

  13. Jebb said

    Slightly disappointed nobody posted a Python solution, I was hoping I could find a more elegant example than what I came up with (verbose as I’ve made it, I might as well have written it in C).

    I couldn’t come up with the O(n log n) solution, even though I correctly figured out it had to be some sort of recursive, divide-and-conquer approach…

    def cubic(L):
    	'''Cubic solution: two loops, and the summing of the list
    	on top'''
    	if L == None:
    		return None
    	max_subsum = L[0]
    	max_subset = L[:1]
    	for j in range(1, len(L)):
    		for i in range(j):
    			s = sum(L[i:j])
    			if s > max_subsum:
    				max_subsum = s
    				max_subset = L[i:j]
    	return (max_subsum, max_subset)
    
    def quadratic(L):
    	'''Avoid using the O(n) 'sum' function within the double loop,
    	use a single addition instead'''
    	if L == None:
    		return None
    	max_subsum = L[0]
    	max_subset = L[:1]
    	for i in range(len(L)):
    		subsum = 0
    		for j in range(i,len(L)):
    			subsum += L[j]
    			if subsum > max_subsum:
    				max_subsum = subsum
    				max_subset = L[i:j+1]
    	return (max_subsum, max_subset)
    
    def linear(L):
    	'''The problem can also be seen as looking for a minimum and a maximum
    	of the function f(n)=sum(L[0:n]), with the minimum occuring at a lower
    	index than the maximum; we want to maximise f(n+m) - f(n)'''
    	if L == None:
    		return None
    	list_sum = 0
    	relevant_min, relevant_max = 0, 0
    	relevant_min_index, relevant_max_index = 0, 0
    	current_min, current_max = relevant_min, relevant_max
    	current_min_index, current_max_index = relevant_min_index, relevant_max_index
    	for i in range(len(L)):
    		list_sum += L[i]
    		if list_sum > current_max:
    			current_max = list_sum
    			current_max_index = i
    		if list_sum < current_min:
    			if relevant_max - relevant_min < current_max - current_min:
    				# Found a new best subset
    				relevant_max, relevant_min = current_max, current_min
    				relevant_max_index, relevant_min_index = current_max_index, current_min_index
    			# Now start looking for the next subset
    			current_min, current_max = list_sum, list_sum
    			current_max_index, current_min_index = i, i
    	# Now in case we finished the loop while the last subset was also the relevant one
    	if relevant_max - relevant_min < current_max - current_min:
    		relevant_max, relevant_min = current_max, current_min
    		relevant_max_index, relevant_min_index = current_max_index, current_min_index
    	return (relevant_max - relevant_min, L[relevant_min_index + 1: relevant_max_index + 1])
    
  14. [...] Here is my attempt at Programming Praxis’ Maximum Sum Subsequence problem [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 576 other followers

%d bloggers like this: