## Partitioning The Telephone Book

### July 10, 2015

The city where I live used to publish a telephone directory; the “white pages” were distributed to all telephone customers once a year. Now my city no longer prints the directory; it is available on the internet, or you can pay an operator to look up a telephone number for you.

But there are still cities that print telephone directories, and some of those cities are big enough that the directory must be partitioned into multiple volumes. Consider this distribution of the first letters of customer’s last names:

```A  B C  D  E  F G H I J  K L  M  N 0 P Q R  S  T U V W X Y Z
16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4
```

I’m not sure what the units are (probably tens of thousands of telephone customers), but the total is 220, and the telephone company has decided to print 4 volumes, so each should be about 55 units. One possible partitioning is A-D, E-J, K-O, P-Z, with counts of 47, 50, 60 and 63 units, differences of 8, 5, 5 and 8 from the ideal of 55, and a total difference of 26. Another partitioning is A-E, F-K, L-O, P-Z, with counts of 62, 44, 51, and 63 units, differences of 7, 11, 4 and 8, and a total difference of 30, which is worse. Before continuing, you might want to work out the optimal solution by hand, finding a minimum score of 18.

Your task is to write a program that determines the partitioning of the telephone book that minimizes the total difference. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

### 13 Responses to “Partitioning The Telephone Book”

1. Paul said

In Python. As the count for Q is zero, the answer is slightly ambiguous. [(‘A’, ‘D’), (‘E’, ‘K’), (‘L’, ‘Q’), (‘R’, ‘Z’)] or [(‘A’, ‘D’), (‘E’, ‘K’), (‘L’, ‘P’), (‘R’, ‘Z’)] are essentially the same answer.

```from string import ascii_uppercase
import itertools as IT

CHARS = ascii_uppercase
COUNTS = list(map(int, "16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4".split()))
CUM = list(IT.accumulate(COUNTS))
TARGET = 55

bestcount, bestsplit = float('inf'), None

for p1, p2, p3 in IT.combinations(range(26), 3):
counts = (CUM[p1], CUM[p2] - CUM[p1], CUM[p3] - CUM[p2], CUM[-1] - CUM[p3])
total = sum(abs(x - TARGET) for x in counts)
if total < bestcount:
bestcount = total
bestsplit = (p1, p2, p3)

bestsplit = (-1,) + bestsplit + (25,)
split = [(CHARS[i1+1], CHARS[i2])for i1, i2 in zip(bestsplit, bestsplit[1:])]

print(bestcount, split) #18 [('A', 'D'), ('E', 'K'), ('L', 'P'), ('Q', 'Z')]
```
2. BFI approach in perl… note the best way to get the size of the partitions is to generate the CDF and diffing this…

```my @f= qw(16 4 17 10 15 4 4 6 7 14 9 17 27 6 1 9 0 12 20 8 0 3 4 0 3 4);
my @c = (\$f);

push @c, \$c[-1] + \$f[\$_] foreach 1..25;

my \$av = \$c[-1]/4;

my \$best = [1e6,0,0,0];
foreach my \$a (0..25) {
foreach my \$b (\$b..25) {
foreach my \$c (0..25) {
\$t = abs( \$c[\$a] - \$av )
+ abs( \$c[\$b] - \$c[\$a] - \$av )
+ abs( \$c[\$c] - \$c[\$b] - \$av )
+ abs( \$c[-1] - \$c[\$c] - \$av );
\$best = [\$t,\$a,\$b,\$c] if \$best-> > \$t;
}
}
}

printf "[%f] A-%s, %s-%s, %s-%s, %s-Z\n", \$best->, xx(\$best->), xx(\$best->), xx(\$best->);

sub xx {
return (map { chr( \$_ + \$_ + ord 'A' ) } 0..1);
}
```
3. FA said

It was hard for me and i am still not satisfied with my solution.
In Scala Worksheet:

package programmingpraxis

// https://programmingpraxis.com/2015/07/10/partitioning-the-telephone-book/
object TelefonPartitioning {
val letters = List(“A”, “B”, “C”, “D”, “E”, “F”, “G”, “H”, “I”, “J”, “K”, “L”, “M”, “N”, “0”, “P”, “Q”, “R”, “S”, “T”, “U”, “V”, “W”, “X”, “Y”, “Z”)
val num = List(16, 4, 17, 10, 15, 4, 4, 6, 7, 14, 9, 17, 27, 6, 1, 9, 0, 12, 20, 8, 0, 3, 4, 0, 3, 4)
val volumes = 4
val letterNumbers = letters zip num

def partitionsEndCounters(n: Int, parts: Int): Seq[Seq[Int]] = {
if (parts == 1) {
List(List(n))
} else {
for (
endIndex <- 1 to n – parts +1;
rest x + endIndex)
}
}

val total = num.sum
val optimal = total / volumes
val partitionsEndCts = partitionsEndCounters(num.size, volumes)
val partitionEndIndices = partitionsEndCts.map(x=> x.map { x => x-1 })
val partitions=partitionEndIndices.map { x => for(i x.map { x => num.slice(x._1, x._2+1).sum } }
val differencesSums = partitionSums.map { x => x.map { x => (optimal-x).abs }.sum }
val min = differencesSums.min
val beste = (differencesSums zip partitions) filter(p=>p._1==min)
val bestLetters = beste.unzip._2.map { x => x.map(x=>letters(x._1)+”-“+letters(x._2))}
//> bestLetters : Seq[scala.collection.immutable.IndexedSeq[String]] = Vector(
//| Vector(A-D, E-K, L-P, Q-Z), Vector(A-D, E-K, L-Q, R-Z))
}

4. Mike said

Python. Basically using an A* search.

```import itertools as it
from heapq import heappush, heappop

def partition(distribution, nparts):
items, counts = zip(*distribution)

cdf = list(it.accumulate(it.chain(, counts)))
total = cdf[-1]
rcdf = [total - x for x in cdf]
goal  = len(items)
target = total / nparts

def neighbors(path):
if len(path) == nparts:
yield path + (len(items),)

elif len(path) < nparts:
yield from (path + (i,) for i in range(path[-1] + 1, len(items) - (nparts - len(path))))

def cost(path):
cost_so_far = 0
for b,e in zip(path, path[1:]):
cost_so_far += abs(target - (cdf[e] - cdf[b]))

nparts_left = nparts + 1 - len(path)
if nparts_left > 0:
est_to_goal = abs(target*nparts_left - rcdf[path[-1]]) // nparts_left
else:
est_to_goal = 0

return cost_so_far + est_to_goal

# basically an A* search
queue = [(0, (0,))]
while queue:
score, path = heappop(queue)

if path[-1] == goal:
partitions = [items[b:e] for b,e in zip(path, path[1:])]
return score, partitions

for neighbor in neighbors(path):
heappush(queue, (cost(neighbor), neighbor))

return None, ()     # no score and empty path indicates failure

# test
from string import ascii_uppercase
dist = [16,  4, 17, 10, 15,  4,  4,  6,  7, 14,  9, 17, 27,  6,  1,  9,  0, 12, 20,  8,  0,  3,  4,  0,  3,  4]

partition(zip(ascii_uppercase, dist), 4) # returns => (18.0, [('A', 'B', 'C', 'D'),
#                             ('E', 'F', 'G', 'H', 'I', 'J', 'K'),
#                             ('L', 'M', 'N', 'O', 'P'),
#                             ('Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')])
```
5. Leez Shadowbringer said

def get_pos_of(iterable, ref):
j = 0
for i in iterable:
if ref == i:
return j
j += 1
return j

def gen_dif(combination,letters,values):
temp = []
totaldifference=0
for c in combination:
a,b = c.split(‘-‘),c.split(‘-‘)
tj = 0
for i in range(get_pos_of(letters,a),get_pos_of(letters,b)):
tj+=values[i]
temp.append(tj)
totaldifference = abs(temp-temp)+abs(temp-temp)+abs(temp-temp)
def gen_combination(lisa):
telebooks=1
tempnumber=0
combination=[]
while telebooks <= 4:
random.seed()
if telebooks == 4:
try:
combination.append('%s-%s'%(lisa[tempnumber],'Z'))
except:
pass;
#print(combination)
else:
try:
r=random.randint(tempnumber,(len(lisa)-1)-(4-telebooks)-1)
except:
pass;
#print((len(lisa)-1)-(4-telebooks)-1)
if r==tempnumber:
r+=1
combination.append('%s-%s'%(lisa[tempnumber],lisa[r]))
tempnumber = r+1
telebooks+=1
return combination
def index_phonebook():
lisa = list("abcdefghijklmnopqrstuvwxyz".upper())
lisb = [16,4,17,10,15,4,4,6,7,14,9,17,27,6,1,9,0,12,20,8,0,3,4,0,3,4]
used_combinations = [['A-E','F-K','L-O','P-Z'],[]]
best_dif = 26
best_combination = ['A-D','E-J','K-O','P-Z']
for i in range(0,1000):
print(i)
while True:
temp_combination=gen_combination(lisa)
if temp_combination not in used_combinations:
used_combinations.append(temp_combination)
temp_dif = gen_dif(temp_combination,lisa,lisb)
if temp_dif < best_dif:
best_dif = temp_dif
best_combination=temp_combination
break;
print(best_dif)
print(best_combination)

6. Soonseop said

> (parts 4 3)
((4 0 0) (3 0 1) (3 1 0) (2 0 2) (2 1 1) (2 2 0) (1 0 3) (1 1 2) (1 2 1) (1 3 0) (0 0 4) (0 1 3) (0 2 2) (0 3 1) (0 4 0))

we don’t need most of part in the above result because any phone book volume must have more than 1 phone numbers
(4 0 0)
(3 0 1)
(3 1 0)
(2 0 2)
(2 2 0)
(1 0 3)
(1 3 0)
(0 0 4)
(0 1 3)
(0 2 2)
(0 3 1)
(0 4 0)

So… below is the more efficient version of “parts” function

> (parts+ 4 3)
((2 1 1) (1 1 2) (1 2 1))

(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))

> (length (parts 30 8))
10,295,472

> (length (parts+ 30 8))
1,560,780

7. Soonseop said

(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts*+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))

8. Soonseop said

(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))
[\sourcecode]

9. Soonseop said
```(define (parts+ stars bars)
(if (= bars 1)
(list (list stars))
(let loop ((i 1)
(ps (parts+ (- stars 1) (- bars 1)))
(zs (list)))
(if (> i (+ (- stars bars) 1))
zs
(if (null? ps)
(loop (1+ i) (parts+ (- stars i 1) (1- bars)) zs)
(loop i (cdr ps) (cons (cons i (car ps)) zs)))))))
```
10. Dennis Decker Jensen said

I tried CLP in gprolog 1.4.4, but couldn’t find a heuristic to find the optimal score of 18.

```% A phone book with this many names and phones per letter
phone_weights([17, 5,18,11,16, 5, 5, 7, 8,15,10,18,28,
7, 2,10, 1,13,21, 9, 1, 4, 5, 1, 4, 5]).

% Scan sum of number list (like +\ in J)
cumulative(Ns0,Ns1) :-
cumulative(Ns0,0,Ns1).

cumulative([],_,[]).
cumulative([X|Xs],Acc,[Y|Ys]) :-
Y is Acc + X,
cumulative(Xs,Y,Ys).

% Find optimal partition by branch-and-bound algorithm with restart.
% This finds multiple solutions, if there are any.
partition(Phone,A,B,C,Score) :-
cumulative(Phone,Phone_cum),
nth(26,Phone_cum,Last),
Target is Last div 4,
fd_domain([A,B,C],1,26),
B #> A, C #> B,
fd_minimize(score(Phone_cum,Last,Target,A,B,C,Score), Score).

% Minimization goal (with different heuristics)
score(Phone_cum,Last,Target,A,B,C,Score) :-
%fd_labelingff([A,B,C]),
%fd_labeling([A,B,C],[value_method(middle)]),
%fd_labeling([A,B,C],[value_method(max)]),
%fd_labeling([A,B,C],[value_method(bisect)]),
%fd_labeling([A,B,C],[value_method(min)]),
%fd_labeling([A,B,C],[value_method(bounds)]),
%fd_labeling([A,B,C],[value_method(random)]),
%fd_labeling([A,B,C],[variable_method(most_constrained)]),
%fd_labeling([A,B,C],[variable_method(smallest)]),
%fd_labeling([A,B,C],[variable_method(largest)]),
%fd_labeling([A,B,C],[variable_method(max_regret)]),
%fd_labeling([A,B,C],[variable_method(random)]),
%fd_labeling([A,B,C],[reorder(false)]),
fd_labeling([A,B,C]),
nth(A,Phone_cum,X),
nth(B,Phone_cum,Y),
nth(C,Phone_cum,Z),
%format(`X=%d Y=%d Z=%d~n`,[X,Y,Z]),
Score is abs(X-Target)
+ abs(Y-X-Target)
+ abs(Z-Y-Target)
+ abs(Last-Z-Target).
%format(`Score=%d~n`,[Score]).

show(A,B,C,Score) :-
format(`Score = %d: A-%c, %c-%c, %c-%c, %c-Z~n`,
[Score,64+A-1,64+A,64+B-1,64+B,64+C-1,64+C]).

test(A,B,C,Score) :-
phone_weights(P),
partition(P,A,B,C,Score).

go :-
phone_weights(P),
partition(P,A,B,C,Score),
show(A,B,C,Score),
% Continue until all minimization solutions are found:
fail; true.

:- initialization(go).

% 8041 bytes written, 6 ms
% Score = 22: A-C, D-J, K-P, Q-Z
% Score = 22: A-C, D-J, K-O, P-Z
% | ?-

% Branch-and-bound didn't find the minimal solution with score = 18
% That surprises me. I would have expected the value_method(middle)
% to find it, since it works from the middle towards the bounds.
% None of the heuristics differed in any way from the default labelling.
```
11. Dennis Decker Jensen said

I modified the prolog solution to only use constraints. No difference, except the program text is shorter and clearer: The score predicate vanishes, and the partition predicate is modified to be more direct.

```partition(Phone,A,B,C,Score) :-
cumulative(Phone,Phone_cum),
nth(26,Phone_cum,Last),
Target is Last div 4,
fd_set_vector_max(Last), % To enable enough solutions
fd_domain([A,B,C],1,26),
fd_all_different([A,B,C]),
fd_element(A,Phone_cum,X),
fd_element(B,Phone_cum,Y),
fd_element(C,Phone_cum,Z),
Score #= dist(X,Target)
+ dist(dist(Y,X),Target)
+ dist(dist(Z,Y),Target)
+ dist(dist(Last,Z),Target),
fd_minimize(fd_labeling([A,B,C,X,Y,Z,Score]), Score).
```
12. Dennis Decker Jensen said

If I fix Target on 63 instead of the calculated Last div 4 =:= 61 then branch-and-bound finds the 2 optimal solutions with score = 18: A-C, D-J, K-O, P-Z; and A-C, D-J, K-P, Q-Z.

13. Dennis Decker Jensen said

Of course, the truncation of div: Last is 246, and 246 / 4 =:= 61.5; 4 * 0.5 =:= 2 + 61 =:= 63.