Marzullo’s Algorithm

November 15, 2016

The obvious algorithm keeps a hash table or some other search structure, iterates through the sourcess, and for each source iterates through the time intervals, updating a count for each event. That takes time O(mn), where n is the number of sources and m is the average number of time intervals per source.

A better algorithm was developed by Keith Marzullo and forms part of the Network Time Protocol. It works by building a table of arrival and departure times, two entries per source; the first entry is a two-tuple containing (starttime, -1), and the second entry is a two-tuple containing (endtime, 1). The table is sorted by the time component, with start times sorted before end times when the time components are equal. Here is our code to build the table:

(define (make-table arrivals departures)
  (define (lt? xs ys)
    (if (= (car xs) (car ys))
        (< (cdr xs) (cdr ys))
        (< (car xs) (car ys))))
  (sort lt?
    (append
      (map (lambda (x) (cons x -1)) arrivals)
      (map (lambda (x) (cons x 1)) departures))))

Then the algorithm scans the table, keeping two accumulators best and cnt. At each table element, the type is subtracted from the cnt, which tracks the current number of active events. If the new cnt exceeds the current best, a new best is recorded and a new range is saved:

(define (marzullo arrivals departures)
  (let ((best 0) (cnt 0) (beststart 0) (bestend 0))
    (let loop ((t (make-table arrivals departures)))
      (if (null? t)
          (values best beststart bestend)
          (begin
            (set! cnt (- cnt (cdar t)))
            (when (< best cnt)
              (set! best cnt)
              (set! beststart (caar t))
              (set! bestend (caadr t)))
            (loop (cdr t)))))))

Here’s the algorithm at work on the sample problem, showing 9 employees at work at 2:00pm:

> (define arrivals   '(10 12 11 13 14 12  9 14 12 10))
> (define departures '(15 14 17 15 15 16 13 15 17 18))
> (marzullo arrivals departures)
9
14
14

If we change the hours for employee 2 to 12-15 and for employee 7 to 9-15, then there will be 10 employees present from 2:00pm to 3:00pm:

> (define departures '(15 15 17 15 15 16 15 15 17 18))
> (marzullo arrivals departures)
10
14
15

Marzullo’s problem runs in time O(n log n), due to the sort, and doesn’t depend on the number of hours each employee works. You can run the program at http://ideone.com/d8Qvo0.

Advertisement

Pages: 1 2

5 Responses to “Marzullo’s Algorithm”

  1. Rutger said

    In Python. Probably should have used the collections.Counter, but this will do.

    arrivals = [10,12,11,13,14,12,9,14,12,10]
    departures = [15,14,17,15,15,16,13,15,17,18]
    
    counts_per_hour = [0]*(max(arrivals+departures)+2)
    
    for arrival, departure in zip(arrivals, departures):
    	counts_per_hour[arrival] += 1
    	counts_per_hour[departure+1] -=1
    print counts_per_hour
    
    number_of_events = 0
    for i,v in enumerate(counts_per_hour):
    	number_of_events += v
    	print 'At hour', i, ': the number of events is', number_of_events
    
    

    Outputs:
    At hour 0 : the number of events is 0
    At hour 1 : the number of events is 0
    At hour 2 : the number of events is 0
    At hour 3 : the number of events is 0
    At hour 4 : the number of events is 0
    At hour 5 : the number of events is 0
    At hour 6 : the number of events is 0
    At hour 7 : the number of events is 0
    At hour 8 : the number of events is 0
    At hour 9 : the number of events is 1
    At hour 10 : the number of events is 3
    At hour 11 : the number of events is 4
    At hour 12 : the number of events is 7
    At hour 13 : the number of events is 8
    At hour 14 : the number of events is 9
    At hour 15 : the number of events is 8
    At hour 16 : the number of events is 4
    At hour 17 : the number of events is 3
    At hour 18 : the number of events is 1
    At hour 19 : the number of events is 0

  2. matthew said

    Here’s some Javascript. Also an example of random intervals (possibly more like the NTP usage with less intervals with common start/end points).

    const f = (arrivals,departures) =>
          arrivals.map(t => [t,-1])
          .concat(departures.map(t => [t,1]))
          .sort(([t1,s1],[t2,s2]) => t1==t2 ? s1-s2 : t1-t2)
          .reduce(([n,maxn,max,maxend],[t,s],i,a) =>
                  (n => n <= maxn ? [n,maxn,max,maxend]
                   : [n,n,t,a[i+1][0]])(n-s),
                  [0,0,0,0])
    
    const arrivals =    [10,12,11,13,14,12, 9,14,12,10]
    const departures1 = [15,14,17,15,15,16,13,15,17,18]
    const departures2 = [15,15,17,15,15,16,15,15,17,18]
    console.log(f(arrivals,departures1))
    console.log(f(arrivals,departures2))
    const data =
          (a => [a.map(s => s[0]), a.map(s => s[1])])
          (Array(6).fill()
           .map(_ =>((x,y)=>[Math.min(x,y),Math.max(x,y)])
                    (Math.random(),Math.random())))
    console.log(data)
    console.log(f.apply(null,data))
    
  3. V said

    In Ruby

      
    def time_block_with_most_events(events)
      min_t, max_t = events.flatten.minmax
      
      events_on_times = (min_t..max_t).map do |t|
        [ t, events.select { |x, y| t.between?(x, y) }.count ]
      end
      
      max_events  = events_on_times.map { |t, c| c }.max
      time_blocks = events_on_times.select { |t, c| c == max_events }.map(&:first)  
      [max_events] + time_blocks.minmax
    end
    
    arrivals    =  [10,12,11,13,14,12, 9,14,12,10]
    departures1 =  [15,14,17,15,15,16,13,15,17,18]
    departures2 =  [15,15,17,15,15,16,15,15,17,18]
    
    time_block_with_most_events(arrivals.zip(departures1))
    => [9, 14, 14]
      
    time_block_with_most_events(arrivals.zip(departures2))
    => [10, 14, 15]
    
    
  4. MUMPS V1.66 for Linux x86_64 Built Feb 27 2017 at 17:12:15

    marzullo ;New routine
    n arrivals,departures,end,hour,hours,i,j,max,start
    s arrivals=”10,12,11,13,14,12,9,14,12,10″
    s departures=”15,14,17,15,15,16,13,15,17,18″
    s hour=99,max=0
    f i=1:1:$l(arrivals,”,”) d
    . s start=$p(arrivals,”,”,i),end=$p(departures,”,”,i)
    . f j=start:1:end d
    . . s hours(j)=$g(hours(j))+1
    . . s:hours(j)>max max=hours(j),hour=j
    w !,”There were “,max,” employees at work at time “,hour
    q

    d ^marzullo

    There were 9 employees at work at time 14

  5. sources (ones which do not overlap the optimal interval returned) is the number of sources minus the value of best. Marzullo’s algorithm is efficient in both space and time. The asymptotic space usage is O(n), where n is the number of sources. In considering the asymptotic time requirement the algorithm can be considered to consist of building the table, sorting it and searching it. Sorting can be done in O(n log n) time, and this dominates the building and searching phases which can be performed in linear time. Therefore, the time efficiency of Marzullo’s algorithm is O(n log n) .

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: