## Remove Duplicates From A List

### December 17, 2013

We have today another exercise from our infinite supply of interview questions:

Return a list containing only one copy of any duplicates in an input list, with items in the output in the same order as their first appearance in the input.

Your task is to answer the interview question given above; you must provide at least two different solutions. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

### 28 Responses to “Remove Duplicates From A List”

1. Tante Hedwig said

import Data.List
uniqify = map head . group . sort

2. szemet said

Haskell O(n logn) solution (library routine nub has O(n^2)). Using set to keep track of already seen elements:

```import qualified Data.Set as Set

nub2 l = nub2hlp l (Set.fromList l) where
nub2hlp [] _ = []
nub2hlp (x:xs) remained
| Set.member x remained = x : (nub2hlp xs \$ Set.delete x remained)
| otherwise = nub2hlp xs remained
```
3. szemet said

```import qualified Data.Set as Set

nub2 l = nub2' l (Set.empty) where
nub2' [] _ = []
nub2' (x:xs) seen
| Set.member x seen = nub2' xs  seen
| otherwise = x : (nub2' xs \$ Set.insert x seen)
```
4. Jussi Piitulainen said

The shell command in full garb, with GNU utils: “cat -n | sort -k 2 -s | uniq -f 1 | sort -n | cut -f 2-” aka “number the lines, stable sort ignoring the numbers, uniq ignoring the numbers, sort by the numbers, drop the numbers” (still ignoring some subtleties that may happen when sort acts smart, depending on locale settings).

The first sort can also do the uniqification with “-u”.

5. Paul said

For Python a discussion on this topic can be found at code.activestate.com.

6. zhujun said

def RemoveDupList(l):
cur = 0
dellist = []
for v in l:
for i in xrange(cur):
if l[i] == v:
dellist.append(cur)
break
cur = cur + 1

j = 0
for i in dellist:
del l[i – j]
j = j + 1
return l

l = [1,2,3,4,3,5,6,1,2,4,7,3,9,4,45,7,8,6,6]

print l
print RemoveDupList(l)

7. zhujun said

Python version, post again

def RemoveDupList(l):
cur = 0
dellist = []
for v in l:
for i in xrange(cur):
if l[i] == v:
dellist.append(cur)
break
cur = cur + 1

j = 0
for i in dellist:
del l[i – j]
j = j + 1
return l

l = [1,2,3,4,3,5,6,1,2,4,7,3,9,4,45,7,8,6,6]

print l
print RemoveDupList(l)

8. Paul said

@zhujun: your method works fine, but it is awfully slow compared to one of the solutions named uniq presented in the link of my last post. If the input list is range(10000) * 3, then your solutions takes more than 9.5 seconds against 17 millseconds for uniq.

9. Mike said

Assuming the list items are hashable, here are a few in Python:

```import collections
import itertools as it

def unique1(iterable):
return list(collections.OrderedDict(zip(iterable, it.count())))

def unique2(iterable):
dd = dict(reversed(list(zip(iterable, it.count()))))
return sorted(dd.keys(), key=dd.__getitem__)

def unique3(iterable):
seen = set()
uniq = []

for element in it.filterfalse(seen.__contains__, iterable):
uniq.append(element)

return uniq
```
10. freddie said

2 simple solutions in common lisp

11. freddie said

sorry, here they are.. ;)

(defun rm-duplicates1 (lst)
(defun rd-iter (lst newlist)
(cond ((eq lst nil) newlist)
((eq (member (car lst) newlist) nil)
(rd-iter (cdr lst) (append newlist (list (car lst)))))
(t (rd-iter (cdr lst) newlist))))
(rd-iter lst ‘()))

(defun rm-duplicates2 (lst)
(cond ((eq lst nil) ‘())
((eq (member (car lst) (cdr lst)) nil)
(cons (car lst) (rm-duplicates2 (cdr lst))))
(t (rm-duplicates2 (cdr lst)))))

12. treeowl said

Here’s another one in Haskell. I’m not too good with the fancy combinator stuff, so I’m sure there’s a slicker way, but I think I use scanl to good effect:

import Data.Set (Set)
import qualified Data.Set as Set

buildsets::Ord a => [a] -> [Set a]
buildsets = scanl (flip Set.insert) Set.empty

nub2::Ord a => [a] -> [a]
nub2 thelist = map fst \$ filter (not . uncurry Set.member) (zip thelist (buildsets thelist))

13. Charlie said

A current first year comp sci student, my first post. I guess the key to this exercise is to use hashables.

import collections
from random import randrange

a = [randrange(0,10000) for x in xrange(0,100000)]

“””Two ways to remove duplicates from a list”””
def rmv_dup(a):
exists = set()
e_list = []
for i in a:
if i in exists:
continue
e_list.append(i)
return e_list

def rmv_dup2(a):
dup = collections.defaultdict(list)
dup_idx = []
for i,e in enumerate(a):
dup[e].append(i)
for k,v in dup.items():
if len(v) > 1:
for i in v[1:]:
dup_idx.append(i)
b = sorted(dup_idx)
for i in b[::-1]:
del a[i]
return a

14. brooknovak said

A simple O(n) JS solution:

```function removeDuplicates(list) {
var seen = {};
var unique = new Array();
for(var i = 0; i < list.length; i++) {
var n = list[i];
if(!seen[n]) {
unique.push(n);
seen[n] = 1;
}
}
return unique;
}
```
15. Python for you. Iterates through every number in the list, logging appearances in a dictionary. Appends the number to a duplicates list on the second appearance.

```import random
random.seed(2)
input_list = [random.randint(0, 100) for r in xrange(100)]

seen = {}
dupes = []
for x in input_list:
if x in seen:
seen[x] += 1
if seen[x] == 2:
dupes.append(x)
else:
seen[x] = 1

print dupes
```
16. Seb said

in c:

void removes()
{node *r;
while(r!=NULL)
{
int m=r->val;
node *q=r;
node *p=r->next;
do
{if(p==NULL)
break;
if(m!=p->val)
{q=p;
p=p->next;
}
else
{q->next=p->next;
node *t=p;
free(t);
p=q->next;
display();}

}while(p!=NULL);
r=r->next;
}
}

17. r. clayton said

In guile scheme.

18. lmon said

PHP version:

\$list = str_split(‘abddecbbafggiak’);
\$deduped = getDedupedList(\$list);
print join(‘, ‘,\$deduped);

function getDedupedList(\$array){
\$newarray = array();
foreach(\$array as \$k=>\$v){
if(!in_array(\$v, \$newarray)){
array_push(\$newarray, \$v);
}
}
sort(\$newarray);
return \$newarray;
}

19. Sachin Raj said

SachinRaj (Beginer Java)

int i,j,k,rep=0,flag=0;
Scanner sc=new Scanner(System.in);
System.out.print(“Enter no of nos: “);
int no=sc.nextInt();
int[] org=new int[no];int recc[]=new int[no];
System.out.print(“Enter nos:”);
for(i=0;i<no;i++)org[i]=sc.nextInt();
for(i=0;i<no;i++)
{
for(j=0;j<i;j++)
{
if(org[i]==org[j])
rep++;
}
recc[i]=rep;
rep=0;
}
System.out.print("Array without duplicates: ");
for(i=0;i<no;i++)
{
if(recc[i]==0)System.out.print(org[i]+" ");

}
}

20. F Noob said

A one liner in scala(Folding over a tuple of Set and Empty list,returning the tuple with the current element appended to the list and the element added to the set if the current element is not in the Set, returning the tuple otherwise. Once the fold concludes, the reversed second element of the tuple is returned):

def listwithoutdupes[T](arr:List[T]):List[T] = arr.foldLeft((Set.empty[T],List.empty[T]))((a,b) => { if(a._1.contains(b)) a else (a._1 + b, b::a._2)})._2.reverse

21. Nathan said

Ruby novice here.

Trying to be clever, but I bet there is a simpler way…

def remove_dupes(ary)
h = {}
ary.keep_if { |x| !(h.has_key?(x) || h[x] = false) }
end

22. nathan510 said

More clear:

def remove_dupes!(ary)
h = {}
ary.keep_if { |x| !( h[x] = h.key?(x) ) }
end

23. Juan Ignacio Saba said

I can think of three different solutions.

1) The first solution, and the best one I can think of right now, is O(n) and uses the following objects:
– The input list (array)
– The output list (array)
– A visited list (hashtable)
– A duplicates list (hashtable)

In this implementation, the output list would be the sorted version of the duplicates hash, since hashes don’t ensure any particular order.

Code here: http://pastebin.com/WMEWGfVj

2) The second solution would be O(n*log(n)) and it would consist on sorting the input list first O(n*log(n)) and then loop through it once checking for matching consecutive elements.

24. Juan Ignacio Saba said

So, I said three and posted two :P.

Further solutions are less efficient, of course. There’s an O(n^2) solution that loops the array for every item to find a dup.

25. Ritesh said

Use java collections, create a LinkedHashSet and pass list in set constructo

this very simple in perl.

#!/usr/bin/env perl

use strict;
use warnings;

my %seen;
my @out = map {if (\$seen{\$_}) {()} else {\$seen{\$_} = 1;\$_}} @ARGV;

print join(‘ ‘, @out), “\n”;