Marzullo’s Algorithm
November 15, 2016
The obvious algorithm keeps a hash table or some other search structure, iterates through the sourcess, and for each source iterates through the time intervals, updating a count for each event. That takes time O(mn), where n is the number of sources and m is the average number of time intervals per source.
A better algorithm was developed by Keith Marzullo and forms part of the Network Time Protocol. It works by building a table of arrival and departure times, two entries per source; the first entry is a two-tuple containing (starttime, -1), and the second entry is a two-tuple containing (endtime, 1). The table is sorted by the time component, with start times sorted before end times when the time components are equal. Here is our code to build the table:
(define (make-table arrivals departures) (define (lt? xs ys) (if (= (car xs) (car ys)) (< (cdr xs) (cdr ys)) (< (car xs) (car ys)))) (sort lt? (append (map (lambda (x) (cons x -1)) arrivals) (map (lambda (x) (cons x 1)) departures))))
Then the algorithm scans the table, keeping two accumulators best and cnt. At each table element, the type is subtracted from the cnt, which tracks the current number of active events. If the new cnt exceeds the current best, a new best is recorded and a new range is saved:
(define (marzullo arrivals departures) (let ((best 0) (cnt 0) (beststart 0) (bestend 0)) (let loop ((t (make-table arrivals departures))) (if (null? t) (values best beststart bestend) (begin (set! cnt (- cnt (cdar t))) (when (< best cnt) (set! best cnt) (set! beststart (caar t)) (set! bestend (caadr t))) (loop (cdr t)))))))
Here’s the algorithm at work on the sample problem, showing 9 employees at work at 2:00pm:
> (define arrivals '(10 12 11 13 14 12 9 14 12 10)) > (define departures '(15 14 17 15 15 16 13 15 17 18)) > (marzullo arrivals departures) 9 14 14
If we change the hours for employee 2 to 12-15 and for employee 7 to 9-15, then there will be 10 employees present from 2:00pm to 3:00pm:
> (define departures '(15 15 17 15 15 16 15 15 17 18)) > (marzullo arrivals departures) 10 14 15
Marzullo’s problem runs in time O(n log n), due to the sort, and doesn’t depend on the number of hours each employee works. You can run the program at http://ideone.com/d8Qvo0.
In Python. Probably should have used the collections.Counter, but this will do.
Outputs:
At hour 0 : the number of events is 0
At hour 1 : the number of events is 0
At hour 2 : the number of events is 0
At hour 3 : the number of events is 0
At hour 4 : the number of events is 0
At hour 5 : the number of events is 0
At hour 6 : the number of events is 0
At hour 7 : the number of events is 0
At hour 8 : the number of events is 0
At hour 9 : the number of events is 1
At hour 10 : the number of events is 3
At hour 11 : the number of events is 4
At hour 12 : the number of events is 7
At hour 13 : the number of events is 8
At hour 14 : the number of events is 9
At hour 15 : the number of events is 8
At hour 16 : the number of events is 4
At hour 17 : the number of events is 3
At hour 18 : the number of events is 1
At hour 19 : the number of events is 0
Here’s some Javascript. Also an example of random intervals (possibly more like the NTP usage with less intervals with common start/end points).
In Ruby
MUMPS V1.66 for Linux x86_64 Built Feb 27 2017 at 17:12:15
marzullo ;New routine
n arrivals,departures,end,hour,hours,i,j,max,start
s arrivals=”10,12,11,13,14,12,9,14,12,10″
s departures=”15,14,17,15,15,16,13,15,17,18″
s hour=99,max=0
f i=1:1:$l(arrivals,”,”) d
. s start=$p(arrivals,”,”,i),end=$p(departures,”,”,i)
. f j=start:1:end d
. . s hours(j)=$g(hours(j))+1
. . s:hours(j)>max max=hours(j),hour=j
w !,”There were “,max,” employees at work at time “,hour
q
d ^marzullo
There were 9 employees at work at time 14
sources (ones which do not overlap the optimal interval returned) is the number of sources minus the value of best. Marzullo’s algorithm is efficient in both space and time. The asymptotic space usage is O(n), where n is the number of sources. In considering the asymptotic time requirement the algorithm can be considered to consist of building the table, sorting it and searching it. Sorting can be done in O(n log n) time, and this dominates the building and searching phases which can be performed in linear time. Therefore, the time efficiency of Marzullo’s algorithm is O(n log n) .