Partitioning The Telephone Book

July 10, 2015

We begin with the data:

(define phone (list 16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4))

Our first solution is brute force: generate all possible partitionings using the stars-and-bars algorithm, then score each one and keep the minimum. For a problem with n = 26 inputs and k = 4 partitions, there are n+k-1 choose k-1 = 26+4-1 choose 4-1 = 29 choose 3 = 3654 possible partitionings (where choose computes the binomial number):

(define (parts stars bars) (if (= bars 1) (list (list stars)) (let loop ((i 0) (ps (parts stars (- bars 1))) (zs (list))) (if (< stars i) zs (if (null? ps) (loop (+ i 1) (parts (- stars i 1) (- bars 1)) zs) (loop i (cdr ps) (cons (cons i (car ps)) zs)))))))

The score function computes the sum of the differences to the ideal partition:

(define (score xs part ideal) (let loop ((xs xs) (part part) (zs (list))) (if (pair? part) (call-with-values (lambda () (split (car part) xs)) (lambda (first rest) (loop rest (cdr part) (cons first zs)))) (let loop ((zs (map sum (reverse zs))) (tot 0)) (if (null? zs) tot (loop (cdr zs) (+ tot (abs (- (car zs) ideal)))))))))

Here we loop through the partitions, keeping track of the minimum as we go:

(define (phone-book xs k) (let ((min-part '()) (min-diff 10000000)) (do ((ps (parts (length xs) k) (cdr ps))) ((null? ps) (values min-part min-diff)) (let ((s (score xs (car ps) (/ (sum xs) k)))) (when (< s min-diff) (set! min-part (car ps)) (set! min-diff s))))))

And here’s the solution of our test problem, where (4 7 6 9) corresponds to A-D, E-K, L-Q, R-Z with a score of 18:

> (partition phone 4) (4 7 6 9) 18

The brute force solution doesn’t scale well, being essentially exponential in the number of stars. So I tried a scanning algorithm that runs through the alphabet left to right, making the “greedy” choice to add letters up to the ideal partition, splitting where the current cumulative difference is least. That doesn’t work. The first partition starts with 16 + 4 + 17 + 10 = 47, then you have to decide whether to add 15. If you don’t, the difference is 55-47 = 8, and if you do, the difference is 47+15-55 = 7, so on the greedy principle we add 15 to the first partition, and we’ve gone wrong; if you carry that algorithm to the end, with boundaries at 55, 110, and 165, the partition is (5 6 6 9) and the score is 24. So the greedy algorithm doesn’t work.

Unfortunately I don’t know a better algorithm. My instinct is that there exists a dynamic programming algorithm with time complexity O(kn), but in several experiments I haven’t found it. Perhaps one of you readers can help.

We used split from the Standard Prelude. You can run the program at http://ideone.com/kFK44h.

Posted by programmingpraxis

Filed in Exercises

13 Comments »

13 Responses to “Partitioning The Telephone Book”

Paul said

July 10, 2015 at 10:33 AM

In Python. As the count for Q is zero, the answer is slightly ambiguous. [(‘A’, ‘D’), (‘E’, ‘K’), (‘L’, ‘Q’), (‘R’, ‘Z’)] or [(‘A’, ‘D’), (‘E’, ‘K’), (‘L’, ‘P’), (‘R’, ‘Z’)] are essentially the same answer.

from string import ascii_uppercase
import itertools as IT

CHARS = ascii_uppercase
COUNTS = list(map(int, "16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4".split()))
CUM = list(IT.accumulate(COUNTS))
TARGET = 55

bestcount, bestsplit = float('inf'), None

for p1, p2, p3 in IT.combinations(range(26), 3):
    counts = (CUM[p1], CUM[p2] - CUM[p1], CUM[p3] - CUM[p2], CUM[-1] - CUM[p3])
    total = sum(abs(x - TARGET) for x in counts)
    if total < bestcount:
        bestcount = total
        bestsplit = (p1, p2, p3)

bestsplit = (-1,) + bestsplit + (25,)
split = [(CHARS[i1+1], CHARS[i2])for i1, i2 in zip(bestsplit, bestsplit[1:])]
    
print(bestcount, split) #18 [('A', 'D'), ('E', 'K'), ('L', 'P'), ('Q', 'Z')]

James Curtis-Smith said

July 10, 2015 at 12:05 PM

BFI approach in perl… note the best way to get the size of the partitions is to generate the CDF and diffing this…

my @f= qw(16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4);
my @c = ($f[0]);

push @c, $c[-1] + $f[$_] foreach 1..25;

my $av = $c[-1]/4;

my $best = [1e6,0,0,0];
foreach my $a (0..25) {
  foreach my $b ($b..25) {
    foreach my $c (0..25) {
      $t = abs( $c[$a] - $av )
         + abs( $c[$b] - $c[$a] - $av )
         + abs( $c[$c] - $c[$b] - $av )
         + abs( $c[-1] - $c[$c] - $av );
      $best = [$t,$a,$b,$c] if $best->[0] > $t;
    }
  }
}

printf "[%f] A-%s, %s-%s, %s-%s, %s-Z\n", $best->[0], xx($best->[1]), xx($best->[2]), xx($best->[3]);

sub xx {
  return (map { chr( $_[0] + $_ + ord 'A' ) } 0..1);
}

FA said
July 14, 2015 at 10:37 AM
It was hard for me and i am still not satisfied with my solution.
In Scala Worksheet:

package programmingpraxis

// https://programmingpraxis.com/2015/07/10/partitioning-the-telephone-book/
object TelefonPartitioning {
val letters = List(“A”, “B”, “C”, “D”, “E”, “F”, “G”, “H”, “I”, “J”, “K”, “L”, “M”, “N”, “0”, “P”, “Q”, “R”, “S”, “T”, “U”, “V”, “W”, “X”, “Y”, “Z”)
val num = List(16, 4, 17, 10, 15, 4, 4, 6, 7, 14, 9, 17, 27, 6, 1, 9, 0, 12, 20, 8, 0, 3, 4, 0, 3, 4)
val volumes = 4
val letterNumbers = letters zip num

def partitionsEndCounters(n: Int, parts: Int): Seq[Seq[Int]] = {
if (parts == 1) {
List(List(n))
} else {
for (
endIndex <- 1 to n – parts +1;
rest x + endIndex)
}
}

val total = num.sum
val optimal = total / volumes
val partitionsEndCts = partitionsEndCounters(num.size, volumes)
val partitionEndIndices = partitionsEndCts.map(x=> x.map { x => x-1 })
val partitions=partitionEndIndices.map { x => for(i x.map { x => num.slice(x._1, x._2+1).sum } }
val differencesSums = partitionSums.map { x => x.map { x => (optimal-x).abs }.sum }
val min = differencesSums.min
val beste = (differencesSums zip partitions) filter(p=>p._1==min)
val bestLetters = beste.unzip._2.map { x => x.map(x=>letters(x._1)+”-“+letters(x._2))}
//> bestLetters : Seq[scala.collection.immutable.IndexedSeq[String]] = Vector(
//| Vector(A-D, E-K, L-P, Q-Z), Vector(A-D, E-K, L-Q, R-Z))
}

Mike said

July 15, 2015 at 8:00 PM

Python. Basically using an A* search.

import itertools as it
from heapq import heappush, heappop

def partition(distribution, nparts):
    items, counts = zip(*distribution)

    cdf = list(it.accumulate(it.chain([0], counts)))
    total = cdf[-1]
    rcdf = [total - x for x in cdf]
    goal  = len(items)
    target = total / nparts
    
    def neighbors(path):
        if len(path) == nparts:
            yield path + (len(items),)
            
        elif len(path) < nparts:
            yield from (path + (i,) for i in range(path[-1] + 1, len(items) - (nparts - len(path))))

    def cost(path):
        cost_so_far = 0
        for b,e in zip(path, path[1:]):
            cost_so_far += abs(target - (cdf[e] - cdf[b]))
        
        nparts_left = nparts + 1 - len(path)
        if nparts_left > 0:
            est_to_goal = abs(target*nparts_left - rcdf[path[-1]]) // nparts_left
        else:
            est_to_goal = 0
        
        return cost_so_far + est_to_goal

    # basically an A* search
    queue = [(0, (0,))]
    while queue:
        score, path = heappop(queue)

        if path[-1] == goal:
            partitions = [items[b:e] for b,e in zip(path, path[1:])]
            return score, partitions
         
        for neighbor in neighbors(path):
            heappush(queue, (cost(neighbor), neighbor))
 
    return None, ()     # no score and empty path indicates failure

# test
from string import ascii_uppercase
dist = [16,  4, 17, 10, 15,  4,  4,  6,  7, 14,  9, 17, 27,  6,  1,  9,  0, 12, 20,  8,  0,  3,  4,  0,  3,  4]

partition(zip(ascii_uppercase, dist), 4) # returns => (18.0, [('A', 'B', 'C', 'D'),
                                                             #                             ('E', 'F', 'G', 'H', 'I', 'J', 'K'),
                                                             #                             ('L', 'M', 'N', 'O', 'P'),
                                                             #                             ('Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')])

Leez Shadowbringer said
July 19, 2015 at 5:02 PM
def get_pos_of(iterable, ref):
j = 0
for i in iterable:
if ref == i:
return j
j += 1
return j

def gen_dif(combination,letters,values):
temp = []
totaldifference=0
for c in combination:
a,b = c.split(‘-‘)[0],c.split(‘-‘)[1]
tj = 0
for i in range(get_pos_of(letters,a),get_pos_of(letters,b)):
tj+=values[i]
temp.append(tj)
totaldifference = abs(temp[0]-temp[1])+abs(temp[1]-temp[2])+abs(temp[2]-temp[3])
return totaldifference
def gen_combination(lisa):
telebooks=1
tempnumber=0
combination=[]
while telebooks <= 4:
random.seed()
if telebooks == 4:
try:
combination.append('%s-%s'%(lisa[tempnumber],'Z'))
except:
pass;
#print(combination)
else:
try:
r=random.randint(tempnumber,(len(lisa)-1)-(4-telebooks)-1)
except:
pass;
#print((len(lisa)-1)-(4-telebooks)-1)
if r==tempnumber:
r+=1
combination.append('%s-%s'%(lisa[tempnumber],lisa[r]))
tempnumber = r+1
telebooks+=1
return combination
def index_phonebook():
lisa = list("abcdefghijklmnopqrstuvwxyz".upper())
lisb = [16,4,17,10,15,4,4,6,7,14,9,17,27,6,1,9,0,12,20,8,0,3,4,0,3,4]
used_combinations = [['A-E','F-K','L-O','P-Z'],[]]
best_dif = 26
best_combination = ['A-D','E-J','K-O','P-Z']
for i in range(0,1000):
print(i)
while True:
temp_combination=gen_combination(lisa)
if temp_combination not in used_combinations:
used_combinations.append(temp_combination)
temp_dif = gen_dif(temp_combination,lisa,lisb)
if temp_dif < best_dif:
best_dif = temp_dif
best_combination=temp_combination
break;
print(best_dif)
print(best_combination)
Soonseop said
July 22, 2015 at 6:58 PM
> (parts 4 3)
((4 0 0) (3 0 1) (3 1 0) (2 0 2) (2 1 1) (2 2 0) (1 0 3) (1 1 2) (1 2 1) (1 3 0) (0 0 4) (0 1 3) (0 2 2) (0 3 1) (0 4 0))

we don’t need most of part in the above result because any phone book volume must have more than 1 phone numbers
(4 0 0)
(3 0 1)
(3 1 0)
(2 0 2)
(2 2 0)
(1 0 3)
(1 3 0)
(0 0 4)
(0 1 3)
(0 2 2)
(0 3 1)
(0 4 0)

So… below is the more efficient version of “parts” function

> (parts+ 4 3)
((2 1 1) (1 1 2) (1 2 1))

(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))

> (length (parts 30 8))
10,295,472

> (length (parts+ 30 8))
1,560,780
Soonseop said
July 22, 2015 at 8:02 PM
(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts*+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))
Soonseop said
July 22, 2015 at 8:28 PM
(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))
[\sourcecode]

Soonseop said

July 22, 2015 at 8:31 PM

(define (parts+ stars bars)
  (if (= bars 1)
      (list (list stars))
      (let loop ((i 1)
                 (ps (parts+ (- stars 1) (- bars 1)))
                 (zs (list)))
        (if (> i (+ (- stars bars) 1))
            zs
            (if (null? ps)
                (loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
                (loop i (cdr ps) (cons (cons i (car ps)) zs)))))))

Dennis Decker Jensen said

August 17, 2015 at 8:46 PM

I tried CLP in gprolog 1.4.4, but couldn’t find a heuristic to find the optimal score of 18.

% A phone book with this many names and phones per letter
phone_weights([17, 5,18,11,16, 5, 5, 7, 8,15,10,18,28,
                7, 2,10, 1,13,21, 9, 1, 4, 5, 1, 4, 5]).

% Scan sum of number list (like +\ in J)
cumulative(Ns0,Ns1) :-
        cumulative(Ns0,0,Ns1).

cumulative([],_,[]).
cumulative([X|Xs],Acc,[Y|Ys]) :-
        Y is Acc + X,
        cumulative(Xs,Y,Ys).

% Find optimal partition by branch-and-bound algorithm with restart.
% This finds multiple solutions, if there are any.
partition(Phone,A,B,C,Score) :-
        cumulative(Phone,Phone_cum),
        nth(26,Phone_cum,Last),
        Target is Last div 4,
        fd_domain([A,B,C],1,26),
        B #> A, C #> B,
        fd_minimize(score(Phone_cum,Last,Target,A,B,C,Score), Score).

% Minimization goal (with different heuristics)
score(Phone_cum,Last,Target,A,B,C,Score) :-
        %fd_labelingff([A,B,C]),
        %fd_labeling([A,B,C],[value_method(middle)]),
        %fd_labeling([A,B,C],[value_method(max)]),
        %fd_labeling([A,B,C],[value_method(bisect)]),
        %fd_labeling([A,B,C],[value_method(min)]),
        %fd_labeling([A,B,C],[value_method(bounds)]),
        %fd_labeling([A,B,C],[value_method(random)]),
        %fd_labeling([A,B,C],[variable_method(most_constrained)]),
        %fd_labeling([A,B,C],[variable_method(smallest)]),
        %fd_labeling([A,B,C],[variable_method(largest)]),
        %fd_labeling([A,B,C],[variable_method(max_regret)]),
        %fd_labeling([A,B,C],[variable_method(random)]),
        %fd_labeling([A,B,C],[reorder(false)]),
        fd_labeling([A,B,C]),
        nth(A,Phone_cum,X),
        nth(B,Phone_cum,Y),
        nth(C,Phone_cum,Z),
        %format(`X=%d Y=%d Z=%d~n`,[X,Y,Z]),
        Score is abs(X-Target)
                + abs(Y-X-Target)
                + abs(Z-Y-Target)
                + abs(Last-Z-Target).
        %format(`Score=%d~n`,[Score]).

show(A,B,C,Score) :-
        format(`Score = %d: A-%c, %c-%c, %c-%c, %c-Z~n`,
                [Score,64+A-1,64+A,64+B-1,64+B,64+C-1,64+C]).

test(A,B,C,Score) :-
        phone_weights(P),
        partition(P,A,B,C,Score).

go :-
        phone_weights(P),
        partition(P,A,B,C,Score),
        show(A,B,C,Score),
        % Continue until all minimization solutions are found:
        fail; true.

:- initialization(go).

% 8041 bytes written, 6 ms
% Score = 22: A-C, D-J, K-P, Q-Z
% Score = 22: A-C, D-J, K-O, P-Z
% | ?- 

% Branch-and-bound didn't find the minimal solution with score = 18
% That surprises me. I would have expected the value_method(middle)
% to find it, since it works from the middle towards the bounds.
% None of the heuristics differed in any way from the default labelling.

Dennis Decker Jensen said

August 18, 2015 at 2:25 AM

I modified the prolog solution to only use constraints. No difference, except the program text is shorter and clearer: The score predicate vanishes, and the partition predicate is modified to be more direct.

partition(Phone,A,B,C,Score) :-
        cumulative(Phone,Phone_cum),
        nth(26,Phone_cum,Last),
        Target is Last div 4,
        fd_set_vector_max(Last), % To enable enough solutions
        fd_domain([A,B,C],1,26),
        fd_all_different([A,B,C]),
        fd_element(A,Phone_cum,X),
        fd_element(B,Phone_cum,Y),
        fd_element(C,Phone_cum,Z),
        Score #= dist(X,Target)
                + dist(dist(Y,X),Target)
                + dist(dist(Z,Y),Target)
                + dist(dist(Last,Z),Target),
        fd_minimize(fd_labeling([A,B,C,X,Y,Z,Score]), Score).

Dennis Decker Jensen said
August 18, 2015 at 2:38 AM
If I fix Target on 63 instead of the calculated Last div 4 =:= 61 then branch-and-bound finds the 2 optimal solutions with score = 18: A-C, D-J, K-O, P-Z; and A-C, D-J, K-P, Q-Z.
Dennis Decker Jensen said
August 18, 2015 at 2:50 AM
Of course, the truncation of div: Last is 246, and 246 / 4 =:= 61.5; 4 * 0.5 =:= 2 + 61 =:= 63.

Programming Praxis