Second Largest Item

June 2, 2017

Any solution based on sorting will require O(n log n) comparisons, which is too many. Any solution based on heaps will require O(n log n) comparisons to build the heap plus O(2 log n) comparisons to extract the second largest, which is better but still not good enough. Two passes of selection sort use 2n – 3 comparisons regardless of the initial order of the data. Our solution uses 2n – 3 comparisons in the worst case, just like selection sort, but only n + O(log n) in the expected case; I don’t know of a better algorithm:

(define (second-max lt? xs)
  (if (or (null? xs) (null? (cdr xs)))
      (error 'second-max "not enough data")
      (begin
        (when (lt? (car xs) (cadr xs))
          (set! xs (cons (cadr xs) (cons (car xs) (cddr xs)))))
        (let loop ((first (car xs)) (second (cadr xs)) (xs (cddr xs)))
          (cond ((null? xs) second)
                ((lt? second (car xs))
                  (if (lt? first (car xs))
                      (loop (car xs) first (cdr xs))
                      (loop first (car xs) (cdr xs))))
                (else (loop first second (cdr xs))))))))

The outer loop makes n – 1 comparisons. The trick is to make the primary comparison on the second largest item found so far, so only when you know that you have a new second largest item do you have to compare to the largest item, which happens O(log n) times in the expected case. Here’s an example:

> (second-max < '(4 1 9 2 3 8 5 7 6))
8
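
To make the comparison counts concrete, here is a rough Python transliteration of the Scheme function above that also counts calls to the comparison. The name `second_max`, the `lt` parameter, and the counter are illustrative assumptions, not part of the original program; treat it as a sketch rather than a reference implementation.

```python
def second_max(xs, lt=lambda a, b: a < b):
    """Return (second largest item, number of comparisons made)."""
    if len(xs) < 2:
        raise ValueError("not enough data")
    count = 0

    def less(a, b):
        # Wrap the comparison so every call is counted.
        nonlocal count
        count += 1
        return lt(a, b)

    first, second = xs[0], xs[1]
    if less(first, second):
        first, second = second, first
    for x in xs[2:]:
        if less(second, x):        # primary test: compare with the runner-up
            if less(first, x):     # only now compare with the leader
                first, second = x, first
            else:
                second = x
    return second, count


print(second_max([4, 1, 9, 2, 3, 8, 5, 7, 6]))   # (8, 10)
print(second_max(list(range(1, 10))))            # ascending input: (8, 15)
```

With n = 9, the sample input costs 10 comparisons, while ascending input hits the 2n – 3 = 15 worst case because every item displaces the current leader.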

You can run the program at http://ideone.com/2bNnhJ.

Posted by programmingpraxis

Filed in Exercises

20 Responses to “Second Largest Item”

Milbrae said

June 2, 2017 at 11:10 AM

Using the QuickSelect algorithm, here’s some code adapted from the RosettaCode example in D:

import std.stdio, std.algorithm;

void main()
{
    uint[] a = [13, 11, 9, 7, 5, 3, 1, 0, 2, 4, 6, 8, 10, 12];
    a.topN(a.length - 2);
    a[a.length-2].writeln;
}

Output: 12

Ernie said
June 2, 2017 at 1:49 PM
Can you check the result by changing the last number in the array to 13? Do you get 13 or 11 as the result?
Milbrae said
June 2, 2017 at 2:58 PM
You’re right, Ernie. My code works only for arrays with distinct items. My bad.
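
The question in this exchange is whether "second largest" means the second item in sorted order or the second largest distinct value. As a hedged illustration of the distinct-value reading, here is a one-pass Python sketch; the function name `second_largest_distinct` and the choice to return `None` when no such value exists are assumptions made for this example.

```python
def second_largest_distinct(xs):
    """Second largest *distinct* value in xs, or None if there is none."""
    first = second = None
    for x in xs:
        if first is None or x > first:
            first, second = x, first      # new maximum; old one becomes runner-up
        elif x != first and (second is None or x > second):
            second = x                    # new runner-up, ignoring copies of the maximum
    return second


a = [13, 11, 9, 7, 5, 3, 1, 0, 2, 4, 6, 8, 10, 12]
print(second_largest_distinct(a))               # 12
print(second_largest_distinct(a[:-1] + [13]))   # 11
```

Under this reading, Ernie's modified array yields 11; under the sorted-order reading it yields 13.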

Jussi Piitulainen said

June 2, 2017 at 4:08 PM

Same algorithm, different expression.

(define (second < us)
  (define (fold m n us) ; m ≤ n are least so far
    (if (pair? us)
        (call-with-values
            (lambda () (step m n (car us)))
          (lambda (m n) (fold m n (cdr us))))
        n))
  (define (step m n u) ; m ≤ n are least so far, consider u
    (if (< u n)
        (if (< u m)
            (values u m)
            (values m u))
        (values m n)))
  (if (and (pair? us) (pair? (cdr us)))
      (if (< (cadr us) (car us))
          (fold (cadr us) (car us) (cddr us))
          (fold (car us) (cadr us) (cddr us)))
      (error "list must contain at least two elements")))

(write (second < '(3 1 4 1 5 9 2 6)))
(write (second > '(3 1 4 1 5 9 2 6)))
(newline) ; writes: 16

Jussi Piitulainen said
June 2, 2017 at 4:21 PM
By the way, isn’t there something like a heap or a priority queue with a specified capacity, which simply drops values that don’t fit there any more? Then this algorithm essentially implements that data structure, with room for just two elements, in the two variables where it keeps the two best elements.
programmingpraxis said
June 2, 2017 at 4:40 PM
@Jussi: Yes, there is a size-limited priority queue; we studied it in a previous exercise.

Steve said

June 2, 2017 at 8:06 PM

SECLAR    ; Second largest number
         N ARR,I
         F I=1:1:10 S ARR($R(100))=""
         W !!,"Numbers in order:"
         S I="" F  S I=$O(ARR(I)) Q:'I  W !?5,I
         W !!,"Second largest number: ",$O(ARR($O(ARR(""),-1)),-1)
         Q

MCL> D ^SECLAR

Numbers in order:
11
21
29
42
56
58
67
69
73
93

Second largest number: 73

bookofstevegraham said
June 2, 2017 at 8:13 PM
The above language is MUMPS

bookofstevegraham said

June 2, 2017 at 8:15 PM

Klong


        L::[1 9 2 7]
[1 9 2 7]
        L@((>L)@1)
7

kernelbob said

June 3, 2017 at 3:11 AM

Second largest is the second smallest if you order elements backwards. The C++ standard library can do that.

#include <algorithm>
#include <functional>
#include <iterator>   // std::iterator_traits
#include <array>
#include <iostream>
 
template <class RandomIterator>
RandomIterator second_largest(RandomIterator first, RandomIterator last)
{
    typedef
        typename std::iterator_traits<RandomIterator>::value_type
        value_type;
    std::nth_element(first, first + 1, last, std::greater<value_type>());
    return first + 1;
}

int main()
{
    std::array<int, 10> s = {5, 7, 4, 2, 8, 6, 1, 9, 0, 3};
 
    std::cout << *second_largest(s.begin(), s.end()) << std::endl;
    return 0;
}

Globules said

June 3, 2017 at 4:26 AM

Here’s a Haskell version. It uses radix sort, so it requires no comparisons. (Does this make me a bad person?)

import Control.Monad.ST (runST)
import Data.Vector.Algorithms.Radix (Radix, sort)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

penultimate :: (M.Unbox a, Radix a) => U.Vector a -> Maybe a
penultimate xs | U.length xs < 2 = Nothing
               | otherwise = Just $ runST $ do v <- U.thaw xs
                                               sort v
                                               M.read v (M.length v - 2)

main :: IO ()
main = do
  -- There is no second largest element in a one element vector.
  print $ penultimate $ U.fromList [1 :: Int]
  
  let xs = U.fromList [1, 2, 2, 3, 4, 4, 5 :: Int]
  print $ penultimate xs
  print $ penultimate $ U.reverse xs

$ ./sndlrg 
Nothing
Just 4
Just 4

Milbrae said
June 3, 2017 at 7:50 AM
@kernelbob: Try this….

#include <algorithm>
#include <functional>
#include <array>
#include <iostream>

template <class RandomIterator>
RandomIterator second_largest(RandomIterator first, RandomIterator last)
{
    typedef
        typename std::iterator_traits<RandomIterator>::value_type
        value_type;
    std::nth_element(first, first + 1, last, std::greater<value_type>());
    return first + 1;
}

int main()
{
    std::array<int, 10> s = {5, 9, 4, 2, 8, 6, 1, 9, 0, 3};

    std::cout << *second_largest(s.begin(), s.end()) << std::endl;
    return 0;
}

As you can see, I've changed one item (from 7 to 9), and the output will be 9 instead of 8. Just like my solution above, this one works for arrays iff the largest item appears only once.

Can't tell about the solutions of bookofstevegraham and Steve, though.

Paul said

June 3, 2017 at 10:27 AM

Using a heap and heapreplace in Python, the heap never grows larger than 2 elements.

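from heapq import heapify, heapreplace

def second_largest(seq):
    heap = list(seq[:2])    # min-heap holding the two largest items seen so far
    heapify(heap)
    for i in seq[2:]:
        if i > heap[0]:     # beats the smaller of the two kept items
            heapreplace(heap, i)
    return heap[0]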

kernelbob said
June 3, 2017 at 1:44 PM
@Milbrae, thanks.

I misread the problem statement.
bookofstevegraham said
June 3, 2017 at 4:32 PM
@Milbrae:

In MUMPS, I put the numbers as the subscripts and so they can only appear once. In KLONG I believe my solution will only work if the numbers appear only once.
ardnew said
June 5, 2017 at 4:21 PM
I have a slight problem with the discussion preceding your solution example. You state sorting will be O(n log n), using a heap will be O(n log n) + O(2 log n), and that the latter is more efficient, because O(n log n) > O(2 log n), I assume.

However, this is an inaccurate comparison because sorting will require O(n log n) to build the structure, yet you get constant time for retrieving the element (or O(n) maybe). Whereas the heap has O(n log n) for building the structure and O(2 log n) for retrieval. The inequality then should really be O(n log n) + C C < O(2 log n).

So for the one-time task of finding the second largest item, the sorting approach has constant time and would always be more efficient. Do you agree?
ardnew said
June 5, 2017 at 4:27 PM
Oops, something weird happened with the formatting. The end of the second paragraph should read: O(n log n) + C < O(n log n) + O(2 log n), and simplified: C < O(2 log n)

Steve said

June 5, 2017 at 5:46 PM

This Klong solution permits multiple numbers of the same value

        l::[9 12 19 18 19]  # list of numbers assigned to l
[9 12 19 18 19]
        =l                         # get locations of each unique number
[[0] [1] [2 4] [3]]
        {*x}'=l                  # get 1st location of each unique number
[0 1 2 3]
        l2::{l@x}'{*x}'=l    # get value for each 1st location
[9 12 19 18]
        >l2                      # get locations of numbers sorted from greatest to least
[2 3 1 0]
        (>l2)@1              # get location of 2nd (largest) number
3
        l2@(>l2)@1       # get value of 2nd (largest) number
18

        l::[9 12 19 18 19]; l2::{l@x}'{*x}'=l;l2@(>l2)@1
18

programmingpraxis said
June 5, 2017 at 7:11 PM
@Ardnew: On reflection, the heap methods are O(n log k), which makes them competitive (within a constant factor) with the O(n) method of this exercise. That’s only true because k = 2.
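
As a rough illustration of that O(n log k) bound, here is a Python sketch of a size-limited min-heap that keeps only the k largest items seen so far. The function name `kth_largest` and its error handling are assumptions for this example; duplicates are counted as separate items.

```python
import heapq


def kth_largest(xs, k):
    """k-th largest item of xs, using a min-heap capped at k items."""
    heap = []
    for x in xs:
        if len(heap) < k:
            heapq.heappush(heap, x)       # still filling the heap: O(log k)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)    # drop the smallest kept item: O(log k)
    if len(heap) < k:
        raise ValueError("not enough data")
    return heap[0]                        # smallest of the k largest


print(kth_largest([4, 1, 9, 2, 3, 8, 5, 7, 6], 2))   # 8
```

Each item costs at most one heap operation on a heap of size k, so with k fixed at 2 the work per item is bounded by a constant.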
Sochima Biereagu, KodeJuice said
January 15, 2018 at 5:58 AM
We can maintain the two largest values while looping. If the current value is larger than the first maximum, the old first maximum becomes the second maximum and the current value becomes the first; otherwise, if it is larger than the second maximum, it replaces the second.
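```
# O(n)
def second_largest(arr):
  max1, max2 = None, None    # largest and second largest seen so far
  for v in arr:
    # Test against None explicitly so that a value of 0 is handled correctly.
    if max1 is None or v > max1:
      max1, max2 = v, max1   # new largest; the old largest becomes second
    elif max2 is None or v > max2:
      max2 = v               # new second largest

  return max2                # None if arr has fewer than two items
```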

Programming Praxis