Remove Duplicates From A List

December 17, 2013

One version of the function traverses the input list, keeping a list of elements as they are seen, and adding each element to the result only if it has not been seen before:

(define (rem-dups xs)
  (let loop ((xs xs) (seen (list)) (result (list)))
    (cond ((null? xs) (reverse result))
          ((member (car xs) seen)
            (loop (cdr xs) seen result))
          (else (loop (cdr xs)
                      (cons (car xs) seen)
                      (cons (car xs) result))))))

> (rem-dups '(a b a c a b a))
(a b c)

If you are worried about the linear-time cost of looking up an element in the seen list, you can use a hash table:

(define (rem-dups xs)
  (let ((seen (make-hash)))
    (let loop ((xs xs) (result (list)))
      (cond ((null? xs) (reverse result))
            ((pair? (seen 'lookup (car xs)))
              (loop (cdr xs) result))
            (else (set! seen (seen 'insert (car xs) (car xs)))
                  (loop (cdr xs) (cons (car xs) result)))))))

> (rem-dups '(a b a c a b a))
(a b c)

Wherever there is a searching solution to a problem, there is also a sorting solution to the problem. Here we decorate each input item with its integer ordinal, sort by item and ordinal, scan the list keeping only the first appearance of each item, sort again by ordinal, and finally de-decorate the list (Perl programmers call this the “Schwartzian transform” and unix hackers will recognize the “sort | unique | sort” idiom):

(define (rem-dups lt? xs)
  (map car
    (sort (lambda (a b) (< (cdr a) (cdr b)))
      (unique (lambda (a b)
                (and (not (lt? (car a) (car b)))
                     (not (lt? (car b) (car a)))))
        (sort (lambda (a b)
                (cond ((lt? (car a) (car b)) #t)
                      ((lt? (car b) (car a)) #f)
                      (else (< (cdr a) (cdr b)))))
          (map cons xs (range (length xs))))))))

We use range from the Standard Prelude, but the unique in the Standard Prelude keeps the last of each set of adjacent duplicates, rather than the first, so we have to write our own unique:

(define (unique eql? xs)
  (if (or (null? xs) (null? (cdr xs))) xs
    (let loop ((xs (cdr xs)) (zs (list (car xs))))
      (cond ((null? xs) (reverse zs))
            ((eql? (car xs) (car zs)) (loop (cdr xs) zs))
            (else (loop (cdr xs) (cons (car xs) zs)))))))

And here’s an example, complete with a function to compare symbols:

> (rem-dups (lambda (a b)
              (stringstring a)
                        (symbol->string b)))
    '(a b a c a b a))
(a b c)

You can run the program at http://programmingpraxis.codepad.org/5srWQ9rj.

About these ads

Pages: 1 2

28 Responses to “Remove Duplicates From A List”

  1. Tante Hedwig said

    A solution in Haskell

    import Data.List
    uniqify = map head . group . sort

  2. szemet said

    Haskell O(n logn) solution (library routine nub has O(n^2)). Using set to keep track of already seen elements:

    import qualified Data.Set as Set
    
    nub2 l = nub2hlp l (Set.fromList l) where
                nub2hlp [] _ = []
                nub2hlp (x:xs) remained 
                       | Set.member x remained = x : (nub2hlp xs $ Set.delete x remained)
                       | otherwise = nub2hlp xs remained
    
  3. szemet said

    It would have been better to start with empty map:

    import qualified Data.Set as Set
    
    nub2 l = nub2' l (Set.empty) where
                nub2' [] _ = []
                nub2' (x:xs) seen 
                       | Set.member x seen = nub2' xs  seen
                       | otherwise = x : (nub2' xs $ Set.insert x seen) 
    
  4. Jussi Piitulainen said

    The shell command in full garb, with GNU utils: “cat -n | sort -k 2 -s | uniq -f 1 | sort -n | cut -f 2-” aka “number the lines, stable sort ignoring the numbers, uniq ignoring the numbers, sort by the numbers, drop the numbers” (still ignoring some subtleties that may happen when sort acts smart, depending on locale settings).

    The first sort can also do the uniqification with “-u”.

  5. Paul said

    For Python a discussion on this topic can be found at code.activestate.com.

  6. zhujun said

    def RemoveDupList(l):
    cur = 0
    dellist = []
    for v in l:
    for i in xrange(cur):
    if l[i] == v:
    dellist.append(cur)
    break
    cur = cur + 1

    j = 0
    for i in dellist:
    del l[i - j]
    j = j + 1
    return l

    l = [1,2,3,4,3,5,6,1,2,4,7,3,9,4,45,7,8,6,6]

    print l
    print RemoveDupList(l)

  7. zhujun said

    Python version, post again

    def RemoveDupList(l):
    cur = 0
    dellist = []
    for v in l:
    for i in xrange(cur):
    if l[i] == v:
    dellist.append(cur)
    break
    cur = cur + 1

    j = 0
    for i in dellist:
    del l[i - j]
    j = j + 1
    return l

    l = [1,2,3,4,3,5,6,1,2,4,7,3,9,4,45,7,8,6,6]

    print l
    print RemoveDupList(l)

  8. Paul said

    @zhujun: your method works fine, but it is awfully slow compared to one of the solutions named uniq presented in the link of my last post. If the input list is range(10000) * 3, then your solutions takes more than 9.5 seconds against 17 millseconds for uniq.

  9. Mike said

    Assuming the list items are hashable, here are a few in Python:

    import collections
    import itertools as it
    
    def unique1(iterable):
    	return list(collections.OrderedDict(zip(iterable, it.count())))
    
    def unique2(iterable):
    	dd = dict(reversed(list(zip(iterable, it.count()))))
    	return sorted(dd.keys(), key=dd.__getitem__)
    
    def unique3(iterable):
    	seen = set()
    	uniq = []
    
    	for element in it.filterfalse(seen.__contains__, iterable):
    		seen.add(element)
    		uniq.append(element)
    
    	return uniq
    
  10. freddie said

    2 simple solutions in common lisp

  11. freddie said

    sorry, here they are.. ;)

    (defun rm-duplicates1 (lst)
    (defun rd-iter (lst newlist)
    (cond ((eq lst nil) newlist)
    ((eq (member (car lst) newlist) nil)
    (rd-iter (cdr lst) (append newlist (list (car lst)))))
    (t (rd-iter (cdr lst) newlist))))
    (rd-iter lst ‘()))

    (defun rm-duplicates2 (lst)
    (cond ((eq lst nil) ‘())
    ((eq (member (car lst) (cdr lst)) nil)
    (cons (car lst) (rm-duplicates2 (cdr lst))))
    (t (rm-duplicates2 (cdr lst)))))

  12. treeowl said

    Here’s another one in Haskell. I’m not too good with the fancy combinator stuff, so I’m sure there’s a slicker way, but I think I use scanl to good effect:

    import Data.Set (Set)
    import qualified Data.Set as Set

    buildsets::Ord a => [a] -> [Set a]
    buildsets = scanl (flip Set.insert) Set.empty

    nub2::Ord a => [a] -> [a]
    nub2 thelist = map fst $ filter (not . uncurry Set.member) (zip thelist (buildsets thelist))

  13. Charlie said

    A current first year comp sci student, my first post. I guess the key to this exercise is to use hashables.

    import collections
    from random import randrange

    a = [randrange(0,10000) for x in xrange(0,100000)]

    “””Two ways to remove duplicates from a list”””
    def rmv_dup(a):
    exists = set()
    e_list = []
    for i in a:
    if i in exists:
    continue
    exists.add(i)
    e_list.append(i)
    return e_list

    def rmv_dup2(a):
    dup = collections.defaultdict(list)
    dup_idx = []
    for i,e in enumerate(a):
    dup[e].append(i)
    for k,v in dup.items():
    if len(v) > 1:
    for i in v[1:]:
    dup_idx.append(i)
    b = sorted(dup_idx)
    for i in b[::-1]:
    del a[i]
    return a

  14. brooknovak said

    A simple O(n) JS solution:

    function removeDuplicates(list) {
    	var seen = {};
    	var unique = new Array();
    	for(var i = 0; i < list.length; i++) {
    		var n = list[i];
    		if(!seen[n]) {
    			unique.push(n);
    			seen[n] = 1;
    		}
    	}
    	return unique;
    }
    
  15. Python for you. Iterates through every number in the list, logging appearances in a dictionary. Appends the number to a duplicates list on the second appearance.

    import random
    random.seed(2)
    input_list = [random.randint(0, 100) for r in xrange(100)]
    
    seen = {}
    dupes = []
    for x in input_list:
        if x in seen:
            seen[x] += 1
            if seen[x] == 2:
                dupes.append(x)
        else:
            seen[x] = 1
    
    print dupes
    
  16. Seb said

    in c:

    void removes()
    {node *r;
    r=head;
    while(r!=NULL)
    {
    int m=r->val;
    node *q=r;
    node *p=r->next;
    do
    {if(p==NULL)
    break;
    if(m!=p->val)
    {q=p;
    p=p->next;
    }
    else
    {q->next=p->next;
    node *t=p;
    free(t);
    p=q->next;
    display();}

    }while(p!=NULL);
    r=r->next;
    }
    }

  17. r. clayton said

    In guile scheme.

  18. lmon said

    PHP version:

    $list = str_split(‘abddecbbafggiak’);
    $deduped = getDedupedList($list);
    print join(‘, ‘,$deduped);

    function getDedupedList($array){
    $newarray = array();
    foreach($array as $k=>$v){
    if(!in_array($v, $newarray)){
    array_push($newarray, $v);
    }
    }
    sort($newarray);
    return $newarray;
    }

  19. Sachin Raj said

    SachinRaj (Beginer Java)

    int i,j,k,rep=0,flag=0;
    Scanner sc=new Scanner(System.in);
    System.out.print(“Enter no of nos: “);
    int no=sc.nextInt();
    int[] org=new int[no];int recc[]=new int[no];
    System.out.print(“Enter nos:”);
    for(i=0;i<no;i++)org[i]=sc.nextInt();
    for(i=0;i<no;i++)
    {
    for(j=0;j<i;j++)
    {
    if(org[i]==org[j])
    rep++;
    }
    recc[i]=rep;
    rep=0;
    }
    System.out.print("Array without duplicates: ");
    for(i=0;i<no;i++)
    {
    if(recc[i]==0)System.out.print(org[i]+" ");

    }
    }

  20. F Noob said

    A one liner in scala(Folding over a tuple of Set and Empty list,returning the tuple with the current element appended to the list and the element added to the set if the current element is not in the Set, returning the tuple otherwise. Once the fold concludes, the reversed second element of the tuple is returned):

    def listwithoutdupes[T](arr:List[T]):List[T] = arr.foldLeft((Set.empty[T],List.empty[T]))((a,b) => { if(a._1.contains(b)) a else (a._1 + b, b::a._2)})._2.reverse

  21. Nathan said

    Ruby novice here.

    Trying to be clever, but I bet there is a simpler way…

    def remove_dupes(ary)
    h = {}
    ary.keep_if { |x| !(h.has_key?(x) || h[x] = false) }
    end

  22. nathan510 said

    More clear:

    def remove_dupes!(ary)
    h = {}
    ary.keep_if { |x| !( h[x] = h.key?(x) ) }
    end

  23. Juan Ignacio Saba said

    I can think of three different solutions.

    1) The first solution, and the best one I can think of right now, is O(n) and uses the following objects:
    – The input list (array)
    – The output list (array)
    – A visited list (hashtable)
    – A duplicates list (hashtable)

    In this implementation, the output list would be the sorted version of the duplicates hash, since hashes don’t ensure any particular order.

    Code here: http://pastebin.com/WMEWGfVj

    2) The second solution would be O(n*log(n)) and it would consist on sorting the input list first O(n*log(n)) and then loop through it once checking for matching consecutive elements.

  24. Juan Ignacio Saba said

    So, I said three and posted two :P.

    Further solutions are less efficient, of course. There’s an O(n^2) solution that loops the array for every item to find a dup.

  25. Ritesh said

    Use java collections, create a LinkedHashSet and pass list in set constructo

  26. Fahad said

    this very simple in perl.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my %seen;
    my @out = map {if ($seen{$_}) {()} else {$seen{$_} = 1;$_}} @ARGV;

    print join(‘ ‘, @out), “\n”;

  27. Fahad said

    probably better as:

    my @out = map {unless ($seen{$_}) {$seen{$_} = 1;$_}} @ARGV;

  28. Juan Ignacio Saba said

    Hi Fahad. I think you misunderstood the problem. The original objective is to output the list of duplicates, whereas your code outputs the list “without” duplicates.

    Now, just for fun, if the objective was to output the list without dups, then your first piece of code would be correct but the second one has a bug. The “unless” statement will return 1 (true) or 0 (false) for every item in the input list, so for every item that is a dup the unless statement will let a “1” pass through the map and into the output.

    Cheers!
    Juan Ignacio

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 621 other followers

%d bloggers like this: