Duplicate Items In An Array
June 14, 2016
Today’s exercise is in two parts, first a commonly-seen programming exercise and then a variant on it; the origin of the exercise is certainly someone’s homework, but since school is out for the year it doesn’t matter that we do the exercise today.
First, write a program that, given an array of integers in unsorted order, finds the single duplicate number in the array. For instance, given the input [1,2,3,1,4], the correct output is 4.
Second, write a program that, given an array of integers in unsorted order, finds all of the multiple duplicate numbers in the array. For instance, given the input [1,2,3,1,2,4,1], the correct output is [1,2,1].
Your task is to write the two programs that find duplicates in an array. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Pretty much 1-liners for both of these in Perl – using core perl (+List::Util)
Part 1 – find (only) first entry…
use feature qw(say); use List::Util qw(first); say first { exists $X{$_} || !($X{$_}=1) } @ARGV;Part 1 – find all entries
[sourceode lang=”perl”]
use feature qw(say);
say join q( ), grep { exists $X{$_} || !($X{$_}=1) } @ARGV;
[/sourcecode]
If the numbers are bounded and small, you can use a bit vector, which takes less actual space than a hash table.
Here’s a solution in C. It uses GLib for its hash table.
#include <stdio.h> #include <stdlib.h> #include <glib.h> // return code indicates if duplicate was found int FirstDuplicate(int arr[], size_t n, int* duplicate) { GHashTable* hash = g_hash_table_new(g_int_hash, g_int_equal); for (size_t i = 0; i < n; i++) { if (g_hash_table_contains(hash, arr + i)) { *duplicate = arr[i]; return 1; } else { g_hash_table_insert(hash, arr + i, NULL); } } return 0; } void Duplicates(int arr[], size_t n, int* duplicates, size_t* duplicates_n) { GHashTable* hash = g_hash_table_new(g_int_hash, g_int_equal); for (size_t i = 0; i < n; i++) { if (g_hash_table_contains(hash, arr + i)) { duplicates[(*duplicates_n)++] = arr[i]; } else { g_hash_table_insert(hash, arr + i, NULL); } } } int main(int argc, char* argv[]) { int a[] = {1,2,3,1,4}; size_t a_n = sizeof(a)/sizeof(a[0]); int duplicate = 0; if (FirstDuplicate(a, a_n, &duplicate)) { printf("duplicate: %d", duplicate); } else { printf("no duplicate"); } printf("\n\n"); int b[] = {1,2,3,1,2,4,1}; size_t b_n = sizeof(b)/sizeof(b[0]); int* duplicates = malloc(sizeof(int) * (b_n-1)); size_t duplicates_n = 0; Duplicates(b, b_n, duplicates, &duplicates_n); if (duplicates_n > 0) { printf("duplicates: ["); for (size_t i = 0; i < duplicates_n; i++) { if (i > 0) { printf(","); } printf("%d", duplicates[i]); } printf("]"); } else { puts("no duplicates"); } printf("\n"); free(duplicates); return 0; }Here’s the output:
The problem says “the correct output is 4”. I should seemingly say 1 (there’s only one 4, but there are two 1s).
import static java.util.Arrays.stream; import java.util.Arrays; import java.util.HashSet; import java.util.OptionalInt; import java.util.function.IntPredicate; public class Duplicates { public static OptionalInt firstDuplicate(int... ints) { return stream(ints) .filter(((IntPredicate) new HashSet<>()::add).negate()) .findFirst(); } public static int[] duplicates(int... ints) { return stream(ints) .filter(((IntPredicate) new HashSet<>()::add).negate()) .toArray(); } public static void main(String[] args) { System.out.println(firstDuplicate(1, 2, 3, 1, 4).getAsInt()); // 1 System.out.println(Arrays.toString(duplicates(1, 2, 3, 1, 2, 4, 1))); // [1, 2, 1] } }A Haskell version, favouring conciseness over efficiency.
import Data.List -- Return Just x if and only if x is the only duplicate element. singleDup :: Ord a => [a] -> Maybe a singleDup xs = case filter ((>1) . length) . group . sort $ xs of [x:_] -> Just x _ -> Nothing -- Return the duplicate elements of the list in the order they appear. The -- first occurrence is omitted. dups :: Eq a => [a] -> [a] dups = concat . snd . mapAccumL step [] where step ds x = if x `elem` ds then (ds, [x]) else (x:ds, []) main :: IO () main = do print $ singleDup [1, 2, 3, 1, 4 :: Int] print $ dups [1, 2, 3, 1, 2, 4, 1 :: Int]Incidentally, the first part has been an exercise on here before (albeit with an upper bound on the array elements) way back in 2011. https://programmingpraxis.com/2011/09/23/array-duplicates/
A good problem