String Subsets
November 23, 2010
This exercise is part of our on-going series of interview questions:
Given two strings, determine if all the characters in the second string appear in the first string; thus, DA is a subset of ABCD. Counts matter, so DAD is not a subset of ABCD, since there are two D in the second string but only one D in the first string. You may assume that the second string is no longer than the first string.
Your task is to write a function to determine if one string is a subset of another string. You should work as you would in a programming interview; if you find one solution, search for a better solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] today’s Programming Praxis exercise,we have to write functions to determine if one string is a subset of […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/11/23/programming-praxis-string-subsets/ for comments and the preceding versions):
here is a commented version http://www.gleocadie.net/?p=355&lang=en#more-355
#include
#include
#include
int *freq_table(char *s)
{
int i;
int *ft = (int*)calloc(256, sizeof(int));
for (i = 0; i < strlen(s); i++)
ft[(int)s[i]]++;
return ft;
}
int is_subset(int *ft, char *s)
{
int i;
for (i = 0; i < strlen(s); i++){
int off = (int)s[i];
if (ft[off] == 0)
return -42;
ft[off]–;
}
return 42;
}
int main(int argc, char** argv)
{
int *ft;
if (argc != 3) {
printf("You should provide 2 arguments %d\n", argc);
return -1;
}
ft = freq_table(argv[1]);
if (is_subset(ft, argv[2]) == 42)
fprintf(stdout, "%s is subset of %s\n", argv[2], argv[1]);
else
fprintf(stdout, "%s is not subset of %s\n", argv[2], argv[1]);
return 0;
}
A simple ruby solution
Hmmm … lost the code somehow …
Felt like using dict comprehensions.
My PHP version
subset_of(‘DA’,’ABCD’); // TRUE
subset_of(‘DAD’,’ABCD’); // FALSE
In Python 2.7 using the Counter class:
from collections import Counter
def contains_all(a, b):
counts = Counter(a)
counts.subtract(b)
return min(counts.values()) >= 0
Python solution, with comments about expected running time to give justification of design choices:
Overly verbose yes – but quick and easy to understand I hope.
;; Use a better one next time, please.
(define (char-sort l)
(if (null? l)
l
(let ((h (car l))
(t (cdr l)))
(append
(char-sort (filter (lambda (x) (char<=? x h)) t))
(list h)
(char-sort (filter (lambda (x) (char>? x h)) t))))))
(define (string-subset? s2 s1)
(define (magic set1 set2)
(cond
((null? set1) #t)
((null? set2) #f)
((char=? (car set1) (car set2)) (magic (cdr set1) (cdr set2)))
((char<? (car set1) (car set2)) #f)
(else (magic set1 (cdr set2)))))
(and (<= (string-length s1) (string-length s2))
(magic (char-sort (string->list s1))
(char-sort (string->list s2)))))
Clever:
you code is clear, but you miss that “DA” is subset of “ABCDA”.
The guilty part is:
you would have had used:
and those tests would have passed:
Je vous remercie, Grégory! I completely glossed over that with my insufficient testing.
Some of the details definitely not necessary…just for readability. Feedback please! I’m trying to learn :D
In the following algorithm, if you get to the end of the proposed subset string, it is subset else it is not.
#include
#include
#include
int main () {
using namespace std;
using namespace boost::assign;
typedef multiset MUL_ST;
typedef multiset::iterator MS_I;
MUL_ST target;
MUL_ST subset;
target += ‘a’, ‘a’, ‘a’, ‘b’, ‘c’, ‘c’, ‘c’, ‘d’, ‘u’, ‘z’, ‘b’;
target += ‘b’, ‘b’, ‘c’, ‘d’, ‘u’, ‘z’, ‘v’;
subset += ‘b’, ‘b’, ‘c’, ‘d’, ‘u’, ‘v’, ‘v’;
cout << "\n"; BOOST_FOREACH(char c, subset) cout << c << " "; cout << "\n";
MS_I si = subset.begin(); MS_I ti = target.begin();
while ( si != subset.end() && ti != target.end() ) {
if ( *si *ti ) {
++ti;
if ( ti == target.end() ) goto judgement;
}
if ( *si != *ti ) goto judgement;
++si; ++ti;
}
judgement:
if ( si == subset.end() ) cout << "is a subset of" << "\n";
else cout << "is NOT a subset of" << "\n";
BOOST_FOREACH(char c, target) cout << c << " "; cout << "\n";
return 0;
}
Language: Clojure
(defn subset-of [s1 s2]
(every? #(<= (count (filter #{%} s1)) (count (filter #{%} s2))) s1))
In F#
Three Python solutions, similar to the ones you proposed:
I was delighted to see that I came up with the same two O(n2) and O(nlogn) solutions that you proposed… too bad I didn’t see the O(n) solution (that one was a “D’oh” moment, really) before reading the comments. Guess I won’t quit my day job yet :-)
A common lisp version…
A much shorter solution involves removing the word from the subset one letter at a time.
This works out.