Word Sets
March 17, 2017
Today’s exercise is an interview question;
Given a list of words and a set of characters, determine which words in the list can be formed from the given characters. For instance, given the characters a, c and t, the words act and cat can be formed, but not the word stop.
Your task is to write the word classified described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Something (not so memory efficient) in Python3.
Outputs [‘stew’, ‘west’, ‘wets’, ‘tews’].
Added bonus: learned a new word tew :)
A Haskell version. Letters can be used more than once. It’s case-sensitive.
Here’s the output of running it with an English dictionary and the Finnish translation of Homer’s Odyssey.
[…] completed the solution to the previous exercise, which you can look at if you wish. We’ll have a new exercise on […]
Here’s some C++ that uses a binary trie to store the character sets of the words given on standard input, represented as bitsets. Given a word and the derived bitset, we can efficiently find all words that can be made up with those characters:
Of course, this works just as well:
C# Implementation:
static List GetMatchingWordsSet(List wordsSet, List charSet)
{
List lstMatchingWordsSet = new List();
foreach (var word in wordsSet)
{
int wordFlag = 0;
int wordlength = word.Length;
for (int i = 0; i < wordlength; i++)
{
if (charSet.Contains(Convert.ToChar(word[i].ToString().ToLower())))
{
wordFlag++;
}
}
if (wordFlag == word.Length)
{
lstMatchingWordsSet.Add(word);
}
wordFlag = 0;
}
return lstMatchingWordsSet;
}
static void Main(string[] args)
{
List lstWords = new List { “One”, “Two”, “Three”, “Four”, “Five”, “Six”, “Cat”,”Tin” };
List lstChars = new List { ‘a’, ‘e’, ‘i’, ‘o’, ‘u’, ‘n’, ‘s’, ‘x’, ‘t’ };
var MatchingWordsSet = GetMatchingWordsSet(lstWords, lstChars);
Console.WriteLine(“Matching Words:”);
foreach (var w in MatchingWordsSet)
{
Console.WriteLine(w);
}
}
To avoid duplicates the following changes have to be done in my code (my above comment):
string distinctWord = new String(word.ToLower().Distinct().ToArray());
if (wordFlag == word.Length && distinctWord.Length == word.Length)
{
lstMatchingWordsSet.Add(word);
}
Mumps Implementation:
MCL> type fword
fword (word,file) ; Find matching words in file
;
n char,fword,i,newword,str
i $g(word)=”” w !,”No word was supplied” q
i $g(file)=”” w !,*7,”No file was included” q
o 1:(file:”R”):0
i ‘$t w !,*7,”Unable to open “,file q
w !!,”file: “,file
w !,”word: “,word
u 1
s newword=””,str=””
f i=1:1:$l(word) s char=$e(word,i) s:newword'[char newword=newword_char
f i=1:1 r fword q:fword=”” d
. i $$matches(newword,fword) s str=str_fword_”,”
c 1
w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file
_)
q
;
matches (word,fword); See if fword can be built with chars in word
;
n flg,i
s flg=1
f i=1:1:$l(fword) d q:’flg
. s:word'[$e(fword,i) flg=0
q flg
—
MCL> d ^fword(“praxis”,”/usr/share/dict/words”)
file: /usr/share/dict/words
word: praxis
Matching words: a,air,airs,aria,arias,as,asp,asps,ass,ax,axis,i,iris,is,p,pa,pair,pairs,pap,papa,papas,paps,par,pars,pas,pass,pi,pip,pips,pis,piss,r,rap,raps,rasp,rasps,rip,rips,rs,s,sap,saps,sari,saris,sass,sax,sip,sips,sir,sirs,sis,six,spa,spar,spars,spas,x
—
MCL> d ^fword(“^~”,”/usr/share/dict/words”)
file: /usr/share/dict/words
word: ^~
Matching words: None in /usr/share/dict/words
Mumps Implementation:
w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file
_)
should be:
w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file)
@bookofstevegraham: We don’t often see mumps here. Perhaps you could give a brief explanation of your program, for those unfamiliar with mumps.
I’ve been asked to give a brief explanation on my implementation. Here goes. I will be including the complete name of commands and functions, instead of the abbreviations.
Mumps Implementation: (https://en.wikipedia.org/wiki/MUMPS)
MCL> type fword
fword (word,file) ; Find matching words in file # Program name, accepting 2 arguments
;
new char,fword,i,newword,str # Create new versions of variables
if $get(word)=”” write !,”No word was supplied” quit # Quit program if the word was not supplied, or it was = “”
if $get(file)=”” w !,*7,”No file was included” quit # Quit program if the filename was not supplied, or it was = “”
open 1:(file:”R”):0 # Open device 1 in read mode and assign the file to it
if ‘$test write !,*7,”Unable to open “,file quit # When you (try to) open a device with a trailing “:0”, and it is successful, $test is set to 1. If unable it is set to 0.
# If 0, then quit program
write !!,”file: “,file # Write filename with 2 lf/cr (!)
write !,”word: “,word # Write supplied word with lf/cr
use 1 # Begin using the device/file for input
set newword=””,str=”” # Initialize variables
for i=1:1:$length(word) set char=$extract(word,i) set:newword'[char newword=newword_char # Take each unique character in the supplied word and put it in variable newword
# Mumps allows post-conditionals for most commands. set:newword'[char = if newword does not contain char, set …
# [ is the operator for contains and ‘ is the operator for negate/not
for i=1:1 read fword quit:fword=”” do # For each word in file
. if $$matches(newword,fword) set str=str_fword_”,” # If the file word matches the characters in the supplied word, append it to a comma-delimited list of matched words
# $$matches indicates a user-created function named matches. _ is the operator for concatenation.
close 1 # Close device/file and revert to std in/out
write !,”Matching words: “,$select($length(str):$extract(str,1,$length(str)-1),1:”None in “_file) # Write out list of matched words with lf/cr or a message that there were no matches
quit # End program
;
matches(word,fword); See if fword can be built with chars in word # Subroutine for matching supplied word with list word
;
new flg,i # Create new versions of variables
set flg=1 # Initialize flg variable (Assuming the supplied word matches list word)
for i=1:1:$l(fword) do quit:’flg # For each character in list word, do next line, quit loop if flg = 0
. set:word'[$e(fword,i) flg=0 # If supplied word does not contain this character in list word, set flg = 0. This will cause the loop to quit upon returning to it
quit flg # Quit subroutine returning value of flg (1 = it matched, 0 = it did not match)
—
MCL> d ^fword(“praxis”,”/usr/share/dict/words”)
file: /usr/share/dict/words
word: praxis
Matching words: a,air,airs,aria,arias,as,asp,asps,ass,ax,axis,i,iris,is,p,pa,pair,pairs,pap,papa,papas,paps,par,pars,pas,pass,pi,pip,pips,pis,piss,r,rap,raps,rasp,rasps,rip,rips,rs,s,sap,saps,sari,saris,sass,sax,sip,sips,sir,sirs,sis,six,spa,spar,spars,spas,x
—
MCL> d ^fword(“^~”,”/usr/share/dict/words”)
file: /usr/share/dict/words
word: ^~
Matching words: None in /usr/share/dict/words
@Bookofstevegraham: Fascinating language. Thank you!
You’re welcome. Thanks for the interest.
Looking at the solutions so far it looks like there are several interpretations of the problem: (a) look for words that contain exactly the letters in the set (i.e. if there’s only one ‘s’, then
words can only contain one ‘s’), (b) look for words that only use letters from the set with no repeats, but not necessarily all of them, and (c) words that only letters from the set, but with repeats allowed.
The following Unicon [works in Icon as well] solution assumes (a):
————————————————————-
procedure main(a)
w := a[1] | “gomiz”
every write(matched(w, !&input))
end
procedure matched(w, pword)
every ((*pword = *w), (t := pword), t[bound(upto,!w,t)] := “”)
return (0 = *\t, pword)
end
procedure bound(a[])
return a[1]!a[2:0]
end
————————————————————-
Sample run:
———————————————————-
->anagram west
————————————————————
Hmmm, sample run didn’t come through for Unicon solution:
Steve Wampler: Could you comment on how your solution works? I would guess that most are not familiar with the language.
an icon program meeting sub-challenge (c). 8 lines of code
procedure main(a)
# given a set of characters in parameter 1 and a wordlist as a file as parameter 2
# respond with the words in the wordlist that meet the criteria
# solution type (c)
#
# what criteria?
#
# The letters in parameter 1 may be used in the target word
# 1 or more times or not at all. in any order.
# for simplicity,
# I have made no special provision for punctuation
# so backslash and quotes, for example may behave unexpectedly.
#
# This is partly also because different shells will have different behaviour.
# backslash escaping may help some situations.
#
###############################
# get the characters
thechars := cset(a[1]) | stop(“No characters found; the word \”\” is made of no characters”)
#get the wordlist
put(a,”C:\\Users\\…\\webster\\WORDS.TXT”)
# local fix puts a dictionary in position3 if user supplied a dictionary, else at position 2
thewords := open(a[2],r) | stop(“No words to test. Is the file valid?”)
#########################
every aword := !thewords do {
charsofaword := cset(aword)
# thediff := charsofaword — thechars
# cset difference, any letters in diff are those in the word, but not in the
# characters supplied by the user as parameter 1.
# if (*() = 0) then write(aword)
if (*(charsofaword — thechars) = 0) then write(aword)
}
end
run using “cat” on a 20C websters list I had.
a
aa
acca
act
acta
at
atta
c
ca
cat
t
ta
taa
tact
tat
tatta
Sorry, non-fatal error; “thewords := open(a[2],r)” should read
“thewords := open(a[2],’rt’)”
The variable r is initially null, so the 2nd parameter to “open” defaults to ‘rt’. The compiled program executes the same.
I was asked to provide a commented version of my Unicon solution, since both Unicon and Icon might not be familiar to some folks.
A little background – Unicon is a successor to Icon. In both languages expressions are capable of generating sequences of results.
A result sequence might have anywhere from 0 to an infinity of results. Normally expressions only so far as needed to produce a single
result. If no result is produced or if there are no more results that can be produced then expression evaluation is said to fail, otherwise
the evaluation succeeds. If an expression fails, subexpressions that can produce further results are automatically back-tracked into
to produce their next result in an attempt to produce successful evaluation of the entire expression. This process is called goal-directed evaluation.
The every control structure essentially tells a successful evaluation of its expression that it really failed – forcing goal-directed evaluation
to exhaust all possibilities.
Control flow is governed by this success or failure of the evaluation, not by the values computed during evaluation.
(I apologize in advance for the length of this explanation. Unicon and Icon behave differently from most common procedural programming languages
even though they don’t look all that different.)
Implemented in the klong programming language (http://t3x.org/klong/), an array language patterned after the k programming language.
$ kg ./wordsets4.kg mississippi < /usr/share/dict/words
[i imp imps is ism isms m mi miss ms p pi pimp pimps pip pips pis piss s sip sips sis]
Oops, forgot the program. Try again.
$ cat wordsets5.kg
main::{[a fword list word]; word::?.a@0; list::[]; .mi{fword::x; :[~#({word?x}’fword)?[]; list::list,fword; a::””]; .rl()}:~.rl(); .p(list)}
main()
$ kg ./wordsets5.kg mississippi < /usr/share/dict/words
[i imp imps is ism isms m mi miss ms p pi pimp pimps pip pips pis piss s sip sips sis]
$