Sparse Sets

March 9, 2012

These sparse sets make for very simple code. We begin with global variables for n and the two vectors; note that we don’t assign an initial value:

(define n 0)
(define d #f)
(define s #f)

(define (make-set u)
  (set! d (make-vector u))
  (set! s (make-vector u)))

Clearing the set is simple:

(define (clear) (set! n 0))

We assume that a value k is not present in the set during insertion:

(define (insert k)
  (vector-set! d n k)
  (vector-set! s k n)
  (set! n (+ n 1)))

The check for set membership takes two clauses:

(define (member? k)
  (and (< (vector-ref s k) n)
       (= (vector-ref d (vector-ref s k)) k)))

And finally we perform iteration; note that elements of the set are accessed in the order they entered the set:

(define (for-all proc)
  (do ((i 0 (+ i 1))) ((= i n))
    (proc (vector-ref d i))))

Here are some examples:

> (make-set 8)
> (clear)
> (insert 5)
> (insert 1)
> (insert 4)
> (member? 4)
#t
> (member? 3)
#f

You can run the program at http://programmingpraxis.codepad.org/CYb0TmWR.

Pages: 1 2

8 Responses to “Sparse Sets”

  1. Jussi Piitulainen said

    Thanks, this was a good thing to learn.

    Instead of D, S, n, u global, I packaged them in a single vector. This may not be appropriate when a program needs to manipulate a single such set very fast, but with a sufficiently smart compiler …

    (define (make-sparse-set u) (make-vector (+ u u 1) 0))            ;DSn, n=0
    (define (sparse-size set) (vector-ref set (- (vector-length set) 1))) ;n
    (define (sparse-cap set) (quotient (vector-length set) 2))            ;u
    (define (dense-ref set k) (vector-ref set k))                         ;D[k]
    (define (sparse-ref set k) (vector-ref set (+ (sparse-cap set) k)))   ;S[k]
    (define (sparse-member? k set)
      (and (< (sparse-ref set k) (sparse-size set))                   ;S[k] < n
           (= (dense-ref set (sparse-ref set k)) k)))              ;D[S[k]] = k
    (define (sparse-insert! set k)
      (if (sparse-member? k set) (error '!))
      (vector-set! set (sparse-size set) k)                          ;D[n] <- k
      (vector-set! set (+ (sparse-cap set) k) (sparse-size set))     ;S[k] <- n
      (vector-set! set (- (vector-length set) 1) (+ (sparse-size set) 1))) ;++n
    (define (sparse-for-each proc set)
      (do ((k 0 (+ k 1))) ((= k (sparse-size set))) (proc (vector-ref set k))))
    (define (sparse-clear! set) (vector-set! set (- (vector-length set) 1) 0))
    
  2. Paul G. said

    here is a Perl version:

    use warnings;
    use strict;

    my @dense;
    my @sparse;
    my $n;

    sub get_number {
    my $op = shift;
    my $k;
    do {
    print "Enter number to $op: ";
    $k = <STDIN>;
    chomp($k);
    } while ($k !~ /^\d+$/);
    return $k;
    }

    sub sparse_clear {
    $n = 0;
    }

    sub sparse_insert {
    my $k = get_number(‘insert’);
    $dense[$n] = $k;
    $sparse[$k] = $n;
    $n++;
    }

    sub sparse_lookup {
    my $k = get_number(‘lookup’);
    ( defined($sparse[$k]) &&
    ($sparse[$k] < $n) &&
    ($dense[$sparse[$k]] == $k)
    ) ? print "exists\n" : print "doesn’t exist\n";
    }

    sub sparse_iterate {
    my $largest_sparce = 0;
    print "Dense: ";
    for (0 .. $n-1) {
    print "$dense[$_] ";
    $largest_sparce = $dense[$_] if ($dense[$_] > $largest_sparce);
    }
    print "\n";
    print "Sparse ";
    for (0 .. $largest_sparce) {
    defined($sparse[$_]) ? print "$sparse[$_] " : print "u ";
    }
    print "\n";
    }

    sub sparse_quit {
    exit;
    }

    my @ops = qw(clear insert lookup iterate quit);
    while(1) {
    print "\n";
    for (0 .. scalar(@ops)-1) {
    print "$_: $ops[$_]\n";
    }
    print "Enter operation: ";

    my $choice = <STDIN>;
    chomp($choice);
    if ($choice =~ /^[0-4]$/) {
    &{\&{"sparse_$ops[$choice]"}}();
    }
    }

    Of course, the easier (and arguably better) way to do this would be to use one of the many existing perl modules which already implement sparse matrix arithmetic: http://search.cpan.org/search?query=sparse&mode=all

  3. ardnew said

    Very cool post. We often deal with large sparse data structures at work, but unfortunately our bottleneck is typically available memory and not processing time. Something like this is ideal for modern hardware though.

    Just to emphasize your note, the maintenance of the variable n is crucial. Not only does it allow for constant-time initialization, clearing, and element-counting, but it is also the only way to guarantee all elements in the universe have unambiguous values.

    Definitely going to give this a try sometime this weekend, thanks for sharing!

  4. Joe Taylor said

    And C#:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace SparseSet
    {
    internal sealed class SparseSet : IEnumerable
    {
    private ushort?[] D = new ushort?[ushort.MaxValue];
    private ushort?[] S = new ushort?[ushort.MaxValue];
    private ushort _size = 0;
    internal bool Insert(ushort value)
    {
    bool result = !Lookup(value);
    if (result)
    {
    D[_size] = value;
    S[value] = _size;
    _size++;
    }
    return result;
    }

    internal bool Lookup(ushort value)
    {
    bool result = S[value] < _size &&
    S[value].HasValue &&
    D[S[value].Value] == value;
    return result;
    }

    internal void Clear()
    {
    _size = 0;
    }

    public IEnumerator GetEnumerator()
    {
    var result = new SparseSetEnumerator(D, _size);
    return result;
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
    return (IEnumerator)GetEnumerator();
    }
    }
    internal sealed class SparseSetEnumerator : IEnumerator
    {
    private readonly ushort?[] _values = null;
    private readonly ushort _size = 0;
    private int _index = -1;
    internal SparseSetEnumerator(ushort?[] values, ushort size)
    {
    _values = values;
    _size = size;
    }
    public ushort Current
    {
    get
    {
    return _values[_index].Value;
    }
    }

    public void Dispose()
    {
    // squeclh
    }

    object IEnumerator.Current
    {
    get
    {
    return (object)_values[_index];
    }
    }

    public bool MoveNext()
    {
    _index++;
    bool moving = _index < _size &&
    _values[_index].HasValue;
    return moving;
    }

    public void Reset()
    {
    _index = -1;
    }
    }
    class Program
    {
    static void Main(string[] args)
    {
    var ss = new SparseSet();
    ss.Insert(5);
    ss.Insert(1);
    ss.Insert(4);
    ss.Clear();

    ss.Insert(5);
    ss.Insert(1);
    ss.Insert(4);
    ss.Clear();

    ss.Insert(3);
    ss.Insert(2);
    ss.Insert(7);
    ss.Clear();

    ss.Insert(3);
    ss.Insert(2);
    ss.Insert(7);
    ss.Insert(3);
    ss.Insert(2);
    ss.Insert(7);
    ss.Clear();

    ss.Insert(3);
    ss.Insert(2);
    ss.Insert(7);
    ss.Insert(5);
    ss.Insert(1);
    ss.Insert(4);

    foreach(ushort us in ss)
    {
    Console.WriteLine(us);
    }
    Console.WriteLine("—");
    foreach (ushort us in ss)
    {
    Console.WriteLine(us);
    }

    ss.Clear();
    Console.WriteLine("—");
    foreach (ushort us in ss)
    {
    Console.WriteLine(us);
    }
    }
    }
    }

  5. #!/usr/bin/env escript
    
    -export([main/1]).
    
    make_set(U) ->
        {0, array:new(U, {fixed, true}), array:new(U, {fixed, true})}.
    
    clear({_N, D, S}) ->
        {0, D, S}.
    
    insert({N, D, S}, El) ->
        NewD = array:set(N, El, D),
        NewS = array:set(El, N, S),
        {N + 1, NewD, NewS}.
    
    is_member({N, D, S}, El) ->
        Indx = array:get(El, S),
        (Indx < N) andalso (array:get(Indx, D) =:= El).
    
    get_d({_N, D, _S}) ->
        array:to_list(D).
    
    get_s({_N, _D, S}) ->
        array:to_list(S).
    
    for_all({N, D, _S}, Fn) when is_function(Fn) ->
        lists:map(Fn, lists:sublist(array:to_list(D), N)).
    
    main([]) ->
        Set1 = make_set(8),
        Set2 = clear(Set1),
        Set3 = insert(Set2, 5),
        Set4 = insert(Set3, 1),
        Set5 = insert(Set4, 4),
        io:format("~p~n", [is_member(Set5, 0)]),
        io:format("~p~n", [is_member(Set5, 1)]),
        io:format("~p~n", [is_member(Set5, 2)]),
        io:format("~p~n", [is_member(Set5, 3)]),
        io:format("~p~n", [is_member(Set5, 4)]),
        io:format("~p~n", [is_member(Set5, 5)]),
        io:format("~p~n", [is_member(Set5, 6)]),
        io:format("~p~n", [is_member(Set5, 7)]),
        io:format("~p~n", [get_d(Set5)]),
        io:format("~p~n", [get_s(Set5)]),
        for_all(Set5, fun(El) -> io:format(">>> ~p~n", [El]) end).
    
    false
    true
    false
    false
    true
    true
    false
    false
    [5,1,4,undefined,undefined,undefined,undefined,undefined]
    [undefined,1,undefined,undefined,2,0,undefined,undefined]
    >>> 5
    >>> 1
    >>> 4
    
  6. keenbug said

    An implementation in Guile (Scheme):

    (also here: https://gist.github.com/2011167)

    (define-module (programming-praxis sparse-sets)
      #:use-module (srfi srfi-9)
      #:export (make-sparse-set sparse-set-member? sparse-set-add! sparse-set-remove! sparse-set-for-each)
      )
    
    (define-record-type sparse-set
      (%make-sparse-set len dense sparse)
      sparse-set?
      (len sparse-set-len sparse-set-len-set!)
      (dense sparse-set-dense)
      (sparse sparse-set-sparse))
    
    
    (define (make-sparse-set universe)
      (%make-sparse-set
        0
        (make-vector universe 0)
        (make-vector universe 0)))
    
    (define (sparse-set-sparse-ref set n)
      (vector-ref (sparse-set-sparse set) n))
    
    (define (sparse-set-sparse-set! set n x)
      (vector-set! (sparse-set-sparse set)
                   n
                   x))
    
    (define (sparse-set-dense-ref set n)
      (vector-ref (sparse-set-dense set) n))
    
    (define (sparse-set-dense-set! set n x)
      (vector-set! (sparse-set-dense set)
                   n
                   x))
    
    (define (sparse-set-universe set)
      (vector-length (sparse-set-sparse set)))
    
    
    
    
    (define (sparse-set-member? set elem)
      (if (>= elem (sparse-set-universe set))
        (scm-error 'wrong-type-arg
                   'sparse-set-add!
                   "~S not in universe"
                   (list elem)
                   #f))
      (let* ((dense-pos (sparse-set-sparse-ref set elem))
             (sparse-pos (sparse-set-dense-ref set dense-pos)))
        (and (< dense-pos (sparse-set-len set))
             (= sparse-pos elem))))
    
    (define (sparse-set-add! set elem)
      (if (>= elem (sparse-set-universe set))
        (scm-error 'wrong-type-arg
                   'sparse-set-add!
                   "~S not in universe"
                   (list elem)
                   #f))
      (if (not (sparse-set-member? set elem))
          (let ((pos (sparse-set-len set)))
            (sparse-set-len-set! set (1+ pos))
            (sparse-set-dense-set! set pos elem)
            (sparse-set-sparse-set! set elem pos)))
      set)
    
    (define (sparse-set-remove! set elem)
      (if (>= elem (sparse-set-universe set))
        (scm-error 'wrong-type-arg
                   'sparse-set-add!
                   "~S not in universe"
                   (list elem)
                   #f))
      (if (sparse-set-member? set elem)
          (let ((dense-pos (sparse-set-sparse-ref set elem))
                (last-elem (sparse-set-dense-ref set (1- (sparse-set-len set)))))
            (sparse-set-dense-set! set dense-pos last-elem)
            (sparse-set-sparse-set! set last-elem dense-pos)
            (sparse-set-len-set! set (1- (sparse-set-len set)))))
      set)
    
    (define (sparse-set-for-each proc set)
      (let ((len (sparse-set-len set)))
        (do ((i 0 (1+ i)))
            ((= i len))
          (proc (sparse-set-dense-ref set i)))))
    
  7. Here is how to do this in Forth. Execute the name of the set to bring it in context.
    The INSERT, LOOKUP, ITERATE and SCLEAR words work on the set in context.

    0 VALUE #universe
    0 VALUE n
    0 VALUE Dense
    0 VALUE Sparse

    : SPARSE-SET ( sz | “name” — )
    CREATE DUP , 2* CELLS ALLOT
    DOES> @+ TO #universe TO Dense
    Dense #universe CELLS + TO Sparse ;

    #16 SPARSE-SET test-set

    test-set ( make current )

    : INSERT ( x — )
    DUP #universe >= ABORT” INSERT :: out of range”
    DUP Dense n CELL[] !
    Sparse []CELL n SWAP !
    1 +TO n ;

    : LOOKUP ( x — bool )
    >R
    Sparse R@ CELL[] @ DUP n U>= IF DROP -R FALSE EXIT ENDIF
    Dense []CELL @ R> = ;

    : ITERATE ( xt — )
    LOCAL xt
    n 0 ?DO Dense I CELL[] @ xt EXECUTE LOOP ;

    : SCLEAR ( — ) CLEAR n ;

    SCLEAR
    CR .( Insert {5,4,1} ) 5 INSERT 1 INSERT 4 INSERT
    CR .( Lookup {5,1,4,15,12} => ) 5 LOOKUP . 1 LOOKUP . 4 LOOKUP . #15 LOOKUP . #12 LOOKUP .
    CR .( Iterate with ‘.’ => ) ‘ . ITERATE

    :ABOUT CR .” Sparse Sets”
    CR .” Try: ( n — ) INSERT”
    CR .” ( n — bool ) LOOKUP ”
    CR .” ( xt — ) ITERATE ”
    CR .” ( — ) SCLEAR ” ;

    .ABOUT -sparseset CR
    DEPRIVE

  8. […] we created is called a sparse set, and allows for fairly quick acces times and single component […]

Leave a comment