Collect Sets Of Ranges

August 25, 2015

Here is a complicated program that mixes the control-break processing with printing to achieve the desired task:

(define (ranges . xs)
  (let loop ((xs xs) (in-run #f) (prev #f))
    (cond ((null? xs)
            (when (and in-run prev)
              (display "-")
              (display prev))
            (newline))
          (in-run
            (if (= (car xs) (+ 1 prev))
                (loop (cdr xs) #t (car xs))
                (begin (display "-")
                       (display prev)
                       (display ", ")
                       (display (car xs))
                       (loop (cdr xs) #f (car xs)))))
          ((and prev (= (car xs) (+ 1 prev)))
            (loop (cdr xs) #t (car xs)))
          (else (when prev (display ", "))
                (display (car xs))
                (loop (cdr xs) #f (car xs))))))

> (ranges 0 1 2 7 21 22 108 109)
0-2, 7, 21-22, 108-109

The in-run variable keeps track of whether or not we are in a run, and the prev variable, initially #f, tracks the previous value in the sequence.

It is easier to separate the tasks of control-break processing and printing. Here is a function that collects groups of integers:

(define (collect . xs)
  (if (null? xs) (list)
    (let loop ((xs (cdr xs))
               (start (car xs))
               (prev (car xs))
               (zs (list)))
      (cond ((null? xs) ; end of input
              (if (= start prev)
                  (reverse (cons prev zs))
                  (reverse (cons (cons start prev) zs))))
            ((= (car xs) (+ prev 1)) ; continue run
              (loop (cdr xs) start (car xs) zs))
            (else ; end run, start new run
              (if (= start prev)
                  (loop (cdr xs) (car xs) (car xs)
                        (cons prev zs))
                  (loop (cdr xs) (car xs) (car xs)
                        (cons (cons start prev) zs))))))))

> (collect 0 1 2 7 21 22 108 109)
((0 . 2) 7 (21 . 22) (108 . 109))

That’s still complicated, but it’s easier to read than ranges, and simpler, too, since there are only three clauses in the cond rather than four. Once you’ve collected the ranges, you can display them or use them in some other way:

(define (display-ranges xs)
  (cond ((pair? (car xs))
          (display (caar xs))
          (display "-")
          (display (cdar xs)))
  (else (display (car xs))))
  (when (pair? (cdr xs))
    (display ", ")
    (display-ranges (cdr xs))))

> (display-ranges (collect 0 1 2 7 21 22 108 109))
0-2, 7, 21-22, 108-109

If you’re clever, you can pass the break function as a parameter, which makes the collector function much more general:

(define (collect-by break? . xs)
  (if (null? xs) (list)
    (let loop ((xs (cdr xs))
               (start (car xs))
               (prev (car xs))
               (zs (list)))
      (cond ((null? xs) ; end of input
              (if (equal? start prev)
                  (reverse (cons prev zs))
                  (reverse (cons (cons start prev) zs))))
            ((break? prev (car xs)) ; continue run
              (loop (cdr xs) start (car xs) zs))
            (else ; end run, start new run
              (if (equal? start prev)
                  (loop (cdr xs) (car xs) (car xs)
                        (cons prev zs))
                  (loop (cdr xs) (car xs) (car xs)
                        (cons (cons start prev) zs))))))))

Then we can collect and display the ranges like this:

(define (consecutive? a b) (= (+ a 1) b))

> (display-ranges (collect-by consecutive? 0 1 2 7 21 22 108 109)
0-2, 7, 21-22, 108-109

Another approach is to return a list of splits, which is particularly pretty when you use a pattern-match on the input list; xs is the input list, ys is the current run, and xss is the output list in reverse order:

(define (split-between pred? xs)
  (let loop ((xs xs) (ys (list)) (xss (list)))
    (list-match xs
      (() (reverse (cons (reverse ys) xss)))
      ((x) (reverse (cons (reverse (cons x ys)) xss)))
      ((x1 x2 . more) (pred? x1 x2) ; x2 starts new group
        (loop (cons x2 more) (list) (cons (reverse (cons x1 ys)) xss)))
      ((x1 x2 . more) (loop (cons x2 more) (cons x1 ys) xss)))))

(define (display-ranges . xs)
  (let loop ((xss (split-between (complement consecutive?) xs)))
    (cond ((and (pair? (car xss)) (pair? (cdar xss)))
            (display (caar xss))
            (display "-")
            (display (car (reverse (car xss)))))
    (else (display (caar xss))))
    (when (pair? (cdr xss))
      (display ", ")
      (loop (cdr xss)))))

> (display-ranges (split-between consecutive? '(0 1 2 7 21 22 108 109))
0-2, 7, 21-22, 108-109

Split-between is generally useful, and might make it into the Standard Prelude someday.

We used list-match and complement from the Standard Prelude. You can run the program at http://ideone.com/DxR3Xp, where you will also find a version of split-between that doesn’t use pattern matching.

Our experience here is instructive. We started with a working program that was a little bit of a mess. First we restructured to separate the control-break from the output, then we generalized the control-break processing to create a new idiom. Good programmers learn to recognize idioms because they simplify the problem and lead to code reuse, which is both economical and likely to reduce bugs.

Advertisement

Pages: 1 2

25 Responses to “Collect Sets Of Ranges”

  1. Rutger said

    Keeps track of what is visited, using a Python generator to yield each range found.

    def set_of_ranges(seq):
    	if not seq:
    		raise StopIteration
    	a_range = [seq[0]]
    	for i,value in enumerate(seq[1:]):
    		if value == a_range[-1] + 1:
    			a_range.append(value)
    		else:
    			yield "%i - %i"%(a_range[0], a_range[-1])
    			a_range = [value]
    	yield "%i - %i"%(a_range[0], a_range[-1])
    
    
    
    test_sequences = [[], [0,1,2,3,4,-1,0,1], [0, 1, 2, 7, 21, 22, 108, 109]]
    
    for seq in test_sequences:
    	print "Test sequence:", seq
    	for r in set_of_ranges(seq):
    		print r
    
    # output
    # Test sequence: []
    # Test sequence: [0, 1, 2, 3, 4, -1, 0, 1]
    # 0 - 4
    # -1 - 1
    # Test sequence: [0, 1, 2, 7, 21, 22, 108, 109]
    # 0 - 2
    # 7 - 7
    # 21 - 22
    # 108 - 109
    
  2. Paul said

    In Python.

    def collected_ranges(seq):
        it = iter(seq)
        last = (next(it),)
        for e in it:
            if e - last[-1] == 1:
                last = (last[0], e)
            else:
                yield last
                last = (e,)
        yield last
    
    def str_range(r):
        return str(r[0]) if len(r) == 1 else str(r[0]) + "-" + str(r[1])
            
    seq = [0, 1, 2, 7, 21, 22, 108, 109 ]
    print(", ".join(str_range(r) for r in collected_ranges(seq)))
    
  3. matthew said
    import Data.List
    g [] = []
    g (n:s) = f n n s where
      f m n [] = [(m,n)]
      f m n (p:s)
       | n+1 == p = f m p s
       | otherwise = (m,n):f p p s
    h (m,n)
     | m == n = show m
     | otherwise = show m ++ "-" ++ show n
    ranges = concat . intersperse "," . map h . g
    main = putStrLn $ ranges [0,1,2,7,21,22,108,109]
    
  4. Dennis Decker Jensen said

    It’s a classic problem that Jackson suggested a simple solution for.
    You structure the control flow after the data, i.e. one outer loop for the break points, and one inner loop for a range. The point is to have the state implicitly coded into the control flow, rather than manipulating an explicit state in a single loop. In this case this can be done by “reading ahead” before the loop, and then “reading” at the end of the loop.

    To put it simple: Follow the structure of the data. This makes the loops very simple and easy to understand.

    def collect(seq):
        result = []
        it = iter(seq)
        e = next(it, None)
        while e is not None:
            fst, snd = e, e
            e = next(it, None)
            while e is not None and e == snd + 1:
                snd = e
                e = next(it, None)
            range = (fst, snd)
            result.append(range)
        return result
    
    print(collect([0, 1, 2, 7, 21, 22, 108, 109]))
    print(collect([]))
    print(collect([0, 1, 2, 3, 4, -1, 0, 1, 10]))
    print(collect([0, 2, 3, 4, -3, -2, -1, 10]))
    
  5. FA said
      def collectRanges(in: Stream[Int], left: Option[Int] = None, right: Option[Int] = None): Stream[Tuple2[Int, Int]] = {
        println(in.toList+" "+left+" "+right)
        if (left == None)
          if (in.isEmpty) Stream.Empty
          else collectRanges(in.tail, Some(in.head), Some(in.head))
        else
          if (in.isEmpty) Stream((left.get, right.get))
          else if (in.head == right.get + 1) collectRanges(in.tail, left, Some(right.get + 1))
          else Stream((left.get, right.get)) #::: collectRanges(in)
      }
    
  6. FA said

    Oh yeah, the previous comment is the solution as a Scala Stream

  7. klettier said

    F# solution :

    let set = [-1;0; 1; 2; 7; 21; 22; 108; 109; 110; 115 ; 116 ; 117]
    
    let f ar =
        let rec loop arr last range isInRange (acc: string list) = 
            match arr with
            | a :: tail when range = "" -> loop tail a (a.ToString()) false acc 
            | a :: tail when last = (a - 1) -> loop tail a range true acc 
            | a :: tail -> if(not isInRange) then
                                range :: acc
                                |> loop tail a (a.ToString()) false
                              else (range + "-" + last.ToString()) :: acc
                                |> loop tail a (a.ToString()) false
            | [] -> if(not isInRange) then
                       range :: acc
                     else (range + "-" + last.ToString()) :: acc
        loop ar -1 "" false [] |> List.rev
    
    let result = f set;;
    
    //val result : string list = ["-1-2"; "7"; "21-22"; "108-110"; "115-117"]
    
    
  8. matthew said

    Here’s another Haskell solution, no fancy output this time, but using a general splitting function with foldl operating on the partially constructed ranges:

    import Control.Arrow ((&&&))
    split p (x:s) = foldl aux [[x]] s where
      aux (s@(x:_):acc) y
        | p x y = (y:s):acc
        | otherwise = [y]:s:acc
    ranges = reverse . map (last &&& head) . split ((==).(+1))
    main = print $ ranges [0,1,2,7,21,22,108,109]
    -- [(0,2),(7,7),(21,22),(108,109)]
    
  9. matthew said

    Here’s another Haskell solution, no fancy output this time, but using a general splitting function with foldl operating on the partially constructed ranges:

    import Control.Arrow ((&&&))
    split p (x:s) = foldl aux [[x]] s where
      aux (s@(x:_):acc) y
        | p x y = (y:s):acc
        | otherwise = [y]:s:acc
    ranges = reverse . map (last &&& head) . split ((==).(+1))
    main = print $ ranges [0,1,2,7,21,22,108,109]
    -- [(0,2),(7,7),(21,22),(108,109)]
    

    (No Haskell highlighter – using “text” here)

  10. bitchef said

    Just as I would do with a pen and paper, I track the consecutive numbers; their differences is one. Anything other than that marks a breakpoint.
    Here is the implementation in JavaScript:

     
    function makeRange(nums){
      // store differences betweens the numbers.
      // This is a way to know consequtive numbers
      var diffs = [];
      for (var i = 1; i < nums.length; i++){
        diffs.push (nums[i] - nums[i-1]);
      }
      
      // For example, for the numbers : 1,2,3,5,6,9
      // The differences are :          1,1,2,1,3
      // Numbers that are not 1 mark the break points.
      // Will now use these break points to construct the ranges
      var res = []; 
      for (var j = 0, i = 0; j < nums.length; j++){
      // for each break point, get the indexes up to, but not including, that point.
       if (diffs[j] != 1){
         var range = nums.slice(i,j+1);
         res.push(range);
         i = j + 1;
       }
      }
      
      return res;
    }
    

    Sample run:

    > makeRange([1,2,3,4,5])
    [ [ 1, 2, 3, 4, 5 ] ]
    > makeRange([1,2,23,45,46,47])
    [ [ 1, 2 ], [ 23 ], [ 45, 46, 47 ] ]
    > makeRange([-1,-2,-23,-45,-46,-47])
    [ [ -1 ], [ -2 ], [ -23 ], [ -45 ], [ -46 ], [ -47 ] ]
    > makeRange([-3,-2,-1])
    [ [ -3, -2, -1 ] ]
    > makeRange([-3,-2,-1,0,1,3,4,5,100,101,102])
    [ [ -3, -2, -1, 0, 1 ], [ 3, 4, 5 ], [ 100, 101, 102 ] ]
    
  11. Ajay Yadav said

    public class Range {

    public static String getRange(int a[], int index)
    {

    if(index == a.length-1)
    return “”+a[index];
    else{

    int i=index;
    int start=a[i];
    int end=0;

    while(a[i+1]==a[i]+1)
    i++;

    end=a[i];

    if(end!=start)
    return start+”-“+end+”,”+getRange(a,i+1);
    else
    return start+”,”+getRange(a,i+1);
    }
    }

    public static void main(String[] args) {

    int a[]={1,2,3,4,5,6,8,9,10,-34,-33,36,37,4,8,9,0,1,1,1};

    System.out.println(getRange(a,0));
    }
    }

  12. Globules said

    Another Haskell solution. We allow finding ranges for types other than numbers.

    -- True if and only if x is either the predecessor of y or equal to it.
    isPreq :: (Bounded a, Enum a, Eq a) => a -> a -> Bool
    x `isPreq` y = x == y || y /= minBound && x == pred y
    
    -- Condense a list into a sequence of ranges.  The range of a single element
    -- x is (x,x).
    ranges :: (Bounded a, Enum a, Eq a) => [a] -> [(a, a)]
    ranges = foldr step []
      where step x [] = [(x,x)]
            step x ((a,b):cs) | x `isPreq` a =       (x,b):cs
                              | otherwise    = (x,x):(a,b):cs
    
    -------------------------------------------------------------------------------
    
    data Spectrum = Red | Orange | Yellow | Green | Blue | Indigo | Violet
                  deriving (Bounded, Enum, Eq, Show)
    
    main :: IO ()
    main = do
      -- The list of numbers from the problem.
      let ts = [0, 1, 2, 7, 21, 22, 108, 109] :: [Int]
      putStrLn $ show ts ++ " =>\n  " ++ show (ranges ts)
    
      -- The numbers can be repeated and don't have to be increasing.
      let us = [108, 109, 0, 1, 1, 2, 2, 21, 22, 7] :: [Int]
      putStrLn $ show us ++ " =>\n  " ++ show (ranges us)
    
      -- We can generate ranges for types other than numbers.  The ordering is
      -- determined by the declaration of Spectrum.
      let ss = [Indigo, Violet, Red, Yellow, Green, Blue, Orange]
      putStrLn $ show ss ++ " =>\n  " ++ show (ranges ss)
    
    $ ./ranges 
    [0,1,2,7,21,22,108,109] =>
      [(0,2),(7,7),(21,22),(108,109)]
    [108,109,0,1,1,2,2,21,22,7] =>
      [(108,109),(0,2),(21,22),(7,7)]
    [Indigo,Violet,Red,Yellow,Green,Blue,Orange] =>
      [(Indigo,Violet),(Red,Red),(Yellow,Blue),(Orange,Orange)]
    
  13. matthew said

    @Globules: nice. Interesting that neither of our fold-based solutions work for infinite lists, whereas my original non-fold-based one does.

  14. matthew said

    So, here is another Haskell solution, using scanl as the looping construct rather than a fold – scanl is like foldl but returns a list of the intermediate results, so we can map over that to extract the actual ranges. To deal with the end of input case, we need to munge the data a little, so we can’t just use concat. Infinite lists are dealt with just fine:

    ranges (x:s) = munge $ scanl f ([],(x,x)) s where
      f (_,(a,b)) c
        | b+1 == c = ([],(a,c))
        | otherwise = ([(a,b)],(c,c))
      munge [(x,a)] = x ++ [a]
      munge ((x,_):s) = x ++ munge s
    
    s = 0:1:2:map (+4) s
    
    main = 
         (print $ ranges [0,1,2,7,21,22,108,109]) >>
         (print $ ranges [1]) >>
         (print $ take 10 $ ranges s)
    
  15. Globules said

    @matthew: Thanks. Yeah, I wasn’t even thinking about infinite lists when I wrote it. (Bad programmer!)

    Another annoyance is that in generalizing it to Enum I had to also use Bounded to ensure that pred didn’t throw. But of course now it doesn’t work for Integer. Having maybePred and maybeSucc would be nice.

  16. matthew said

    Aha, yes, I didn’t see that problem. Here’s another iterative solution, this time using unfoldr. It’s rather like Dennis’s nested loop above. Once again, we have to do some jiggerypokery to terminate properly:

    import Data.List
    
    ranges :: (Eq t, Num t) => [t] -> [(t, t)]
    ranges [] = []
    ranges (a:s) = unfoldr (fmap aux) $ Just(a,a,s) where
      aux (a,b,[]) = ((a,b),Nothing)
      aux (a,b,c:s)
         | b+1 == c = aux (a,c,s)
         | otherwise = ((a,b),Just(c,c,s))
    
    ss :: [Integer]
    ss = 0:1:2:map (+4) ss
    
    main :: IO()
    main = 
         (print $ ranges ([0,1,2,7,21,22,108,109] :: [Int])) >>
         (print $ ranges ([1] :: [Int])) >>
         (print $ take 10 $ ranges ss)
    
  17. Mike said

    itertools.groupby() from the standard library creates an iterator that breaks an iterable into (key, group) pairs.

    Here, a class is used to maintain state–the previous value and the current group number. The class implements the __call__ method to be used as the key func for groupby(). The method applies a predicate to the previous and current items. If the predicate is true, the group number is returned. If the predicate is false, the group number is changed (incremented) first. As a result, each group gets a sequential number assigned.

    def collect(seq, predicate):
        class Grouper:
            def __init__(self, predicate):
                self.group_no = 0
                self.predicate = predicate
                self.prev_value = None
                
            def __call__(self, new_value):
                if self.prev_value is not None and not self.predicate(self.prev_value, new_value):
                    self.group_no += 1
                self.prev_value = new_value
                return self.group_no
            
        yield from (g for _,g in it.groupby(seq, Grouper(predicate)))
    
    [list(g) for g in collect(seq, lambda a,b: a + 1 == b)]
    

    This can also be implemented using a generator/coroutine. The generator’s send() method is used as the key func for groupby().

    def collect(seq, predicate):
        def grouper(predicate):
            group_no = 0
            new_value = yield
            while True:
                prev_value, new_value = (new_value, (yield group_no))
                if not predicate(prev_value, new_value):
                    group_no += 1
                    
        gpr = grouper(predicate)
        next(gpr)
        yield from (g for _,g in it.groupby(seq, gpr.send))
    
    [list(g) for g in collect(seq, lambda a,b: a + 1 == b)]
    
  18. matthew said

    Definitely the last Haskell solution from me. The pattern here is a sort of map with state, with some special handling of the final state; this can all be nicely captured by the mapAccumL function, as in:

    import Data.List
    
    ranges [] = []
    ranges (x:s) = concat t ++ [a] where 
      (a,t) = mapAccumL f (x,x) s
      f (m,n) p = if n+1 == p then ((m,p),[])
                  else ((p,p),[(m,n)])
    
    main = print $ ranges [0,1,2,7,21,22,108,109]
    
  19. Will Ness said

    Haskell. A job for a right fold with a lazy combining function, creating the shape of the return value first, and filling it up with data from the recursive result later, like any well-behaved tail recursive “modulo cons” program would do.

    ranges = foldr (\x r-> let (b:c) = [ [] | null r || x+1 /= (head$head r)]++r in ((x:b):c)) []
    

    (the list comprehension is there just to avoid the verbal and verbose “case” statement).

  20. matthew said

    @Will Ness: Very ingenious. We can rewrite Globules original foldr solution (using a case statement here, I like pattern matching) as:

    ranges = foldr step [] where
      step x r = (x,y):s where
        (_,y):s = case r of
          [] -> (x,x):r
          (a,_):_ -> if x+1 == a then r else (x,x):r
    

    and get proper behaviour on infinite lists. It does all seem a bit awkward though – I’m not convinced I like any of the functional solutions more than the original nice and simple recursive method.

  21. Will Ness said

    @matthew: YMMV. I learned to stop worrying, and love the fold. :) It’s easier for me that way: the recursion is already there, `x` is for the value, `r` is for recursive result, and all that’s left for me to do is to decide what goes where. It’s all about the information flow anyway. Here’s (http://stackoverflow.com/a/14403413/849891) where I saw this technique first, in the context of Haskell. In Prolog it’s commonplace, but there we only have the direct recursion (in plain language, that is).

    Turnes out there’s also my answer on that SO question, with another solution, a paramophism which might feel more natural.

    Also, your “jiggerypokery” `unfoldr` might be related to this (http://stackoverflow.com/q/24519530/849891) “slightly generalizing unfold”, which just might be an apomorphism.

  22. matthew said

    Well, I like a nice fold too, but sometimes things don’t always fit nicely into that pattern. Foldr has nice algebraic properties, but it seems difficult to get right, particularly with lazy lists (and there seems to be a correlation between not working with lazy lists and having a space leak) – maybe that gets easier with practice though, as you suggest. Apomorphism sounds like just the ticket, thanks for that; disjoint unions are an underused datatype.

    Incidentally, I came up with this for finding the end points of the ranges, but I’m not sure what to make of it:

    ranges (x:s) = foldr (\p g (m,n) -> if n+1 == p then g(m,p) else (m,n) : g(p,p)) (:[]) s (x,x)
    
  23. matthew said

    So, my original function, which was more or less:

    ranges (x:s) = f (x,x) s where
      f (m,n) [] = [(m,n)]
      f (m,n) (p:s)
        | n+1 == p = f (m,p) s
        | otherwise = (m,n) : f (p,p) s
    

    works nicely with infinite lists and all that, but doesn’t easily fit the foldr recursion scheme because of that extra state parameter – sometimes the recursion is ‘f (m,p) s’ and sometimes it is ‘f (p,p) s’. But we can swap the order of the parameters to f and get:

    ranges2 (x:s) = f s (x,x) where
      f [] (m,n) = [(m,n)]
      f (p:s) (m,n)
        | n+1 == p = f s (m,p)
        | otherwise = (m,n) : f s (p,p)
    

    and this does fit the foldr pattern, giving:

    ranges3 (x:s) = foldr (\p g (m,n) -> if n+1 == p then g(m,p) else (m,n) : g(p,p)) (:[]) s (x,x)
    

    where the function on the input list is returning another function that is then applied to initial state. Actually running this with eg. ‘ranges [1..100000000]’ and we get the answer ‘[(1,100000000)]’ in a few seconds, with no memory bloat. Once again, I am very impressed with the Haskell compiler.

  24. matthew said

    On the other hand, all the compiler has really done is inline the definition of foldr, so maybe it’s not that clever.

  25. r. clayton said

    A solution in Racket Scheme.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: