K’th Largest Item

August 1, 2014

We give two solutions, taking our input from a random number generator instead of reading from a file:

(define rand
  (let ((a 69069) (c 1234567) (m (expt 2 32))
        (seed 20140801))
    (lambda ()
      (set! seed (modulo (+ (* a seed) c) m))
      (display seed) (newline) ; make visible for debugging
      seed)))

Our first solution indexes through the n items, keeping a sorted list of the k largest items seen so far. We initialize the list by insertion-sorting the first k items in the input:

(define (insert-in-order x xs)
  (let loop ((xs xs) (zs (list)))
    (if (null? xs) (reverse (cons x zs))
      (if (< x (car xs))
          (append (reverse zs) (list x) xs)
          (loop (cdr xs) (cons (car xs) zs))))))

After the first k items are in the list, in order, the remaining nk items are inserted into the cdr of the list if they are greater than the current car of the list:

(define (kth-largest-list n k)
  (let loop ((xs (list)) (i k))
    (if (positive? i)
        (loop (insert-in-order (rand) xs) (- i 1))
        (let loop ((xs xs) (n (- n k)))
          (if (zero? n) (car xs)
            (let ((x (rand)))
              (if (< x (car xs)) (loop xs (- n 1))
                (loop (insert-in-order x (cdr xs)) (- n 1)))))))))

That works in time O(n k). A better solution uses a heap (priority queue) instead of a list to hold the k largest integers seen so far, so it works in time O(n log k):

(define (kth-largest-heap n k)
  (let loop ((pq pq-empty) (i k))
    (if (positive? i)
        (loop (pq-insert < (rand) pq) (- i 1))
        (let loop ((pq pq) (n (- n k)))
          (if (zero? n) (pq-first pq)
            (let ((x (rand)))
              (if (< x (pq-first pq)) (loop pq (- n 1))
                (loop (pq-insert < x (pq-rest < pq)) (- n 1)))))))))

We used pairing heaps, but any of the heap data structures that we have used in prior exercises will work. You can run the program at http://programmingpraxis.codepad.org/qmGTnhrL.

Advertisement

Pages: 1 2

8 Responses to “K’th Largest Item”

  1. #!/usr/local/bin/perl
    
    use strict;
    use warnings;
    
    my $k = shift @ARGV;
    
    my @nos = ();
    foreach my $i (1..$k) {
      my $l = <>;
      $l*=1;
      push @nos, $l;
    }
    @nos = sort { $b<=>$a } @nos;
    
    while(<>) {
      $_*=1;
      next if $_ < $nos[$k-1];
      foreach my $i (0..($k-1)) {
        next unless $_ > $nos[$i];
        splice @nos,$i,0,$_;
        pop @nos;
        last;
      }
    }
    
    print "$nos[$k-1]\n";
    
  2. Josef said

    Too lazy to do the file processing. The program below finds the kth largest in a list. In Haskell.

    code{white-space: pre;}

    table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
    margin: 0; padding: 0; vertical-align: baseline; border: none; }
    table.sourceCode { width: 100%; line-height: 100%; }
    td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
    td.sourceCode { padding-left: 5px; }
    code > span.kw { color: #007020; font-weight: bold; }
    code > span.dt { color: #902000; }
    code > span.dv { color: #40a070; }
    code > span.bn { color: #40a070; }
    code > span.fl { color: #40a070; }
    code > span.ch { color: #4070a0; }
    code > span.st { color: #4070a0; }
    code > span.co { color: #60a0b0; font-style: italic; }
    code > span.ot { color: #007020; }
    code > span.al { color: #ff0000; font-weight: bold; }
    code > span.fu { color: #06287e; }
    code > span.er { color: #ff0000; font-weight: bold; }

    module Kth where
    
    import Data.List
    
    findKth :: Ord a => Int -> [a] -> Maybe a
    findKth k [] = Nothing
    findKth k as = Just (loop (sort ls) rs)
      where (ls,rs) = splitAt k as
    
    loop (k:_) [] = k
    loop (k:ls) (r:rs)
      | r > k     = loop (insert r ls) rs
      | otherwise = loop (k:ls) rs
  3. svenningsson said

    Sorry about the weird looking comment. I experimented with getting syntax highlighting for Haskell. It seems there is no way for me to edit the comment to fix the problem.

  4. James Curtis-Smith’s answer is the best approach I can think of, which means to always keep a list of the top k elements and adjust that list when necessary. It requires only one pass on the original list and worst case scenario happens when the original list is sorted so that everytime you read a new element you need to “splice”. In that case it’s O(k*n). There’s another approach that’s also O(k*n), which is removing from the original list the biggest element and doing this k times, but since the original list is so big you probably can’t “remove” things from it. At best you can store them in a hash to discard them later, but it seems more complicated and (worst of all) it’s o(k*n) always, not just in the worst case scenario.

    Anyway, just wanted to share my analysis. No need to share an actual solution, since it’s all there already.

  5. Paul said

    In Python.

    from heapq import heappush, heapreplace
    
    def k_largest(seq, k):
        """number of items in seq is much larger than k
           return the k-th largest
        """
        it = iter(seq)
        heap = [] # keeps the argest k items
        for i in range(k):
            heappush(heap, next(it))
        for item in it:
            if item > heap[0]:
                heapreplace(heap, item)
        return heap[0]
    
  6. Mike said

    The python library ‘heapq’ provides a function nlargest().

    import heapq
    
    def kthlargest(k, iterable):
        return heapq.nlargest(k, iterable)[0]
    

    It basically uses Paul’s algorithm. I think using a heap reduces the complexity from O(n*k) to O(n*log k).

  7. MS said

    Haskell, so compile with ghc, then call with k and filename as command-line arguments. Integers should be one per line in the file.

    import System.Environment
    import Data.List

    main = do
    (k:file:_) <- getArgs
    ints [Integer] -> [Integer]
    klargest k = foldl (\a -> take k . sortBy (flip compare) . (:a)) []

  8. MS said

    Some of that got eaten. Try again.

    import System.Environment
    import Data.List

    main = do
    (k:file:_) <- getArgs
    ints [Integer] -> [Integer]
    klrg k = foldl (\a -> take k . sortBy (flip compare) . (:a)) []

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: