K’th Largest Item

August 1, 2014

Today’s exercise is a common interview question, and also interesting in its own right:

Find the kth largest integer in a file of n integers, where n is large (defined as too large to fit into memory all at once) and k is small (defined as small enough to all fit in memory).

Your task is to write the program described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “K’th Largest Item”

James Curtis-Smith said

August 1, 2014 at 9:17 AM

#!/usr/local/bin/perl

use strict;
use warnings;

my $k = shift @ARGV;

my @nos = ();
foreach my $i (1..$k) {
  my $l = <>;
  $l*=1;
  push @nos, $l;
}
@nos = sort { $b<=>$a } @nos;

while(<>) {
  $_*=1;
  next if $_ < $nos[$k-1];
  foreach my $i (0..($k-1)) {
    next unless $_ > $nos[$i];
    splice @nos,$i,0,$_;
    pop @nos;
    last;
  }
}

print "$nos[$k-1]\n";

Josef said
August 1, 2014 at 12:41 PM
Too lazy to do the file processing. The program below finds the kth largest in a list. In Haskell.

code{white-space: pre;}

table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
```
module Kth where

import Data.List

findKth :: Ord a => Int -> [a] -> Maybe a
findKth k [] = Nothing
findKth k as = Just (loop (sort ls) rs)
  where (ls,rs) = splitAt k as

loop (k:_) [] = k
loop (k:ls) (r:rs)
  | r > k     = loop (insert r ls) rs
  | otherwise = loop (k:ls) rs
```
svenningsson said
August 1, 2014 at 12:46 PM
Sorry about the weird looking comment. I experimented with getting syntax highlighting for Haskell. It seems there is no way for me to edit the comment to fix the problem.
Juan Ignacio Saba said
August 1, 2014 at 2:10 PM
James Curtis-Smith’s answer is the best approach I can think of, which means to always keep a list of the top k elements and adjust that list when necessary. It requires only one pass on the original list and worst case scenario happens when the original list is sorted so that everytime you read a new element you need to “splice”. In that case it’s O(k*n). There’s another approach that’s also O(k*n), which is removing from the original list the biggest element and doing this k times, but since the original list is so big you probably can’t “remove” things from it. At best you can store them in a hash to discard them later, but it seems more complicated and (worst of all) it’s o(k*n) always, not just in the worst case scenario.

Anyway, just wanted to share my analysis. No need to share an actual solution, since it’s all there already.

Paul said

August 1, 2014 at 2:47 PM

In Python.

from heapq import heappush, heapreplace

def k_largest(seq, k):
    """number of items in seq is much larger than k
       return the k-th largest
    """
    it = iter(seq)
    heap = [] # keeps the argest k items
    for i in range(k):
        heappush(heap, next(it))
    for item in it:
        if item > heap[0]:
            heapreplace(heap, item)
    return heap[0]

Mike said
August 1, 2014 at 6:44 PM
The python library ‘heapq’ provides a function nlargest().
```
import heapq

def kthlargest(k, iterable):
    return heapq.nlargest(k, iterable)[0]
```
It basically uses Paul’s algorithm. I think using a heap reduces the complexity from O(n*k) to O(n*log k).
MS said
August 3, 2014 at 12:16 AM
Haskell, so compile with ghc, then call with k and filename as command-line arguments. Integers should be one per line in the file.

import System.Environment
import Data.List

main = do
(k:file:_) <- getArgs
ints [Integer] -> [Integer]
klargest k = foldl (\a -> take k . sortBy (flip compare) . (:a)) []
MS said
August 3, 2014 at 12:18 AM
Some of that got eaten. Try again.

import System.Environment
import Data.List

main = do
(k:file:_) <- getArgs
ints [Integer] -> [Integer]
klrg k = foldl (\a -> take k . sortBy (flip compare) . (:a)) []

S	M	T	W	T	F	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Programming Praxis

K’th Largest Item

August 1, 2014

8 Responses to “K’th Largest Item”

Leave a comment

Categories

Archives

Archives

Programming Praxis

K’th Largest Item

August 1, 2014

Share this:

Related

8 Responses to “K’th Largest Item”

Leave a comment

Categories

Archives

Archives