## Ullman’s Puzzle

### December 7, 2010

This puzzle is due to Jeffrey Ullman:

Given a list of

nreal numbers, a real numbert, and an integerk, determine if there exists ak-element subset of the original list ofnreal numbers that is less thant.

For instance, given the list of 25 real numbers 18.1, 55.1, 91.2, 74.6, 73.0, 85.9, 73.9, 81.4, 87.1, 49.3, 88.8, 5.7, 26.3, 7.1, 58.2, 31.7, 5.8, 76.9, 16.5, 8.1, 48.3, 6.8, 92.4, 83.0, 19.6, *t* = 98.2 and *k* = 3, the 3-element subset 31.7, 16.5 and 19.6 sums to 67.8, which is less than 98.2, so the result is *true*.

This is a puzzle, so you’re not allowed to look at the suggested solution until you have your own solution.

Your task is to write a function that makes that determination. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

My Haskell solution (see http://bonsaicode.wordpress.com/2010/12/07/programming-praxis-ullman%E2%80%99s-puzzle/ for a version with comments):

My O(n log n) solution is similar to the one given. I did keep one quick check in that’s not

strictly necessary, though: if k times the first (smallest) element is bigger than t, then we

needn’t bother adding things up. This is a remnant from when I first started working on the

problem. I was trying to find things that would reduce the amount of work needed in

O(n log n) time or less before I came upon the final answer. Since it’s a quick check for small

first values, I’ve kept it in.

My Ocaml commented solution: http://www.gleocadie.net/?p=444&lang=en

This is O(n lg(k)) instead of O(n lg(n)).

Here’s my try , not sure if it is right way to do

Also have not checked if it is correct (will do that later).

(define (solve nums t k)

(let ([p-num (/ t k)])

(let ([l1 (filter (lambda (n) (= n p-num)) nums)])

(if (empty? l1)

#f

(let ([len (length l1)])

(if (>= len k)

(take l1 k)

(let ([new-k (- k len)])

(let ([solved (solve l2 (- t (apply + l1)) new-k)])

(if solved

(append l1 solved )

#f)))))))))

oops formatting

hmm , i guess mine is in quadratic , not good :)

A slightly un-optimized and simplistic Python version:

{{{

def ullman(numbers, threshold, k):

count = 0

for x in numbers:

count += 1

if (k – 1) < count and count <= len(numbers):

b = count – k # subsequence start

e = count # subsequence end

if sum(numbers[b:e]) < threshold:

return True

return False

}}}

Formatting fail…

Keramida:

sorry but your code does not return true for this case:

You only select the k-subsets of contiguous elements of the set, which are not all k-subsets of the original set.

An improved version of my previous post. Thanks to F. Carr for the inspiration.

In F#.

Gregory LEOCADIE, you’re right. I didn’t realize that we were looking for non-contiguous subsets too.

;) just call me Greg ;)

I’m not really sure about the complexity of this, but I think it’s O(n k). Also I’m not sure about the take & drop from the standard prelude—my leading & without were O(k) but assumed n > k.

The J solution (and algorithm) is pretty simple: sort the data set, and test if the sum of the K first items sum is less than T.

NB. Three inputs:

NB. DATA – the data set of numbers (in the example, 25 real numbers)

NB. K – the grouping factor (in the example, 3)

NB. LIMIT – the sum limit (in the example, 98.2

up =: dyad def 0

NB. LIMIT K up DATA

({.x)>+/({:x){.(/:{])y

)

NB. Returns 1 for true and 0 for false

testdata =. (25?1000)%1+25?100

testdata

35.8148 1.26667 3.93814 51.8571 3.73034 24.7838 3.74324 4.6 58.1667 5.63291 12.425 12.8333 4.64789 49.125 17.8261 10.1528 14.4167 6.32 11.2436 218 26.5294 4.64211 4.35366 26.9524 4.79104

<./testdata NB. mininum

1.26667

>./testdata NB. maximum

218

(+/ % #)testdata NB. average

24.7117

98.2 3 up testdata

1

10 3 up testdata

1

5 3 up testdata

0

3{.(/:{])testdata NB. 3 smallest numbers

1.26667 3.73034 3.74324

Just getting used to Q :)

/ data

n:18.1 55.1 91.2 74.6 73.0 85.9 73.9 81.4 87.1 49.3 88.8 5.7 26.3 7.1 58.2 31.7 5.8 76.9 16.5 8.1 48.3 6.8 92.4 83.0 19.6

t:98.2

k:3

`/ func`

df:{y>+/[asc[x][til z]]}

df[n;t;k]

Another J solution.

ullman=: 3 : 0

‘t k list’ =. y

list =. /:~ list NB. ascending sort

set =. k {. list NB. first k items

U =. t > +/set NB. test

U ; U#set NB. result; empty set if false

)

ullman 12.3; 4; 145.3 3 4 4 5 655 _3 6

+-+——–+

|1|_3 3 4 4|

+-+——–+

ullman 2.3; 4; 145.3 3 4 4 5 655 _3 6

+-++

|0||

+-++

I got asked a variant of this question in a phone interview for Google. I basically used the sorting solution above, but my interviewer told me there’s a better solution. Instead, if you use a data structure like a BST or a heap, size-bounded at k elements, and replacing the largest element when you find something smaller than it, you can reduce the complexity from O(n log n) to O(n log k). In the worst case scenario, where k = n, the complexity will become O(n log n), but otherwise it will be quicker. Thanks for the coding puzzles, and keep up the good work!

the ‘better’ solution will not even get the first line coded by the time {t > +/ k take sort V} gets coded. One could use the remaining time to find an idiomatic shortcut to k take sort V, which is what Matt described with the data structure solution, like:

t > +/ k take_sort V

Note, that kind of optimization not being obvious with an algoloid language.