Word Sets

March 17, 2017

[ I was visiting my daughter in Houston this week, and unbeknownst to me her computer was fried by lightning a few weeks ago; she was waiting for me to visit so I could advise her what to do. I’m typing this exercise on my tablet, where typing lots of words is difficult and typing code is nearly impossible. I’ll add code when I get back home in a few days. ]

There is an ambiguity in the problem definition, which is probably purposeful on the part of the interviewer. The word tact can be formed from the three letters a, c and t only if duplicates are permitted. In that case, the simple algorithm is to insert the characters in a set, then reject each word in the list that has a letter not in the set:

(define (alpha-set cs)
  (let ((s (make-vector 26 #f)))
    (do ((cs cs (cdr cs)))
        ((null? cs) s)
      (let ((t (- (char->integer (car cs)) 97)))
        (vector-set! s t #t))))

(define (word-set s word)
  (let loop ((cs (string->list word)))
    (if (null? cs) #t
      (let ((t (- (char->integer (car cs)) 97)))
        (if (vector-ref s t) (loop (cdr cs)) #f)))))

> (define s (alpha-set '(#\a #\c #\t)))
> (word-set s "act")
#t
> (word-set s "tact")
#t
> (word-set s "stop")
#f

Here, the set of input characters is stored in the alpha vector; it could be stored in other ways — a hash table, a binary tree, something else — if desired. We are assuming all the letters are lower case.

If duplicates aren’t allowed, you have to count. Our solution keeps an array of length 26, and initializes each array slot with the number of available copies of the corresponding letter. Then each word is examined, and the count of each letter in the word is reduced each time it appears. If any count is ever negative, the word is rejected:

(define (alpha-count cs)
  (let ((c (make-vector 26 0)))
    (do ((cs cs (cdr cs)))
        ((null? cs) c)
      (let ((t (- (char->integer (car cs)) 97)))
        (vector-set! c t (+ (vector-ref c t) 1))))))

(define (word-count c word)
  (let ((c (vector-copy c)))
    (let loop ((cs (string->list word)))
      (if (null? cs) #t
        (let ((t (- (char->integer (car cs)) 97)))
          (vector-set! c t (- (vector-ref c t) 1))
          (if (negative? (vector-ref c t)) #f
            (loop (cdr cs))))))))

> (define c (alpha-count '(#\a #\c #\t)))
> (word-count c "act")
#t
> (word-count c "tact")
#f
> (word-count c "stop")
#f

Here the alpha vector stores letter counts instead of booleans. The letter counts are updated after each input character, so the vector must be copied afresh for each word that is tested.

You can run the program at http://ideone.com/mQoc3j.
.

Posted by programmingpraxis

Filed in Exercises

20 Comments »

20 Responses to “Word Sets”

Rutger said

March 17, 2017 at 3:12 PM

Something (not so memory efficient) in Python3.

from collections import defaultdict

def load_word_dict(words_file):
	word_dict = defaultdict(list)
	for line in open(words_file):
		word = line.strip()
		key = "".join(sorted(word))
		word_dict[key].append(word)
	return word_dict

words_file = '/usr/share/dict/words'
letters = "".join(sorted('west'))
wd = load_word_dict(words_file)
print(wd[letters])

Outputs [‘stew’, ‘west’, ‘wets’, ‘tews’].
Added bonus: learned a new word tew :)

Globules said

March 18, 2017 at 3:49 AM

A Haskell version. Letters can be used more than once. It’s case-sensitive.

import Data.List.Ordered (nubSort)
import System.Environment (getArgs)

-- Return true iff the word is composed entirely from the letters.  All letters
-- must be used, but they can be used more than once.
madeFrom :: String -> String -> Bool
word `madeFrom` letters = nubSort word == letters

main :: IO ()
main = do
  letters <- fmap (nubSort . concat) getArgs
  interact (unlines . filter (`madeFrom` letters)  . words)

Here’s the output of running it with an English dictionary and the Finnish translation of Homer’s Odyssey.

$ ./wordsets astringently < /usr/share/dict/words
arrestingly
astringently
intransigently
restrainingly
staggeringly
transgressingly
$ ./wordsets sisältyvää < ~/Downloads/pg52108.txt
sisältyvää
liittyvistä
ystävällisistä
ystävällisiä
ystävällisistä
$

Word Sets, Solved | Programming Praxis said
March 21, 2017 at 9:01 AM
[…] completed the solution to the previous exercise, which you can look at if you wish. We’ll have a new exercise on […]

matthew said

March 22, 2017 at 9:55 PM

Here’s some C++ that uses a binary trie to store the character sets of the words given on standard input, represented as bitsets. Given a word and the derived bitset, we can efficiently find all words that can be made up with those characters:

#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <stdint.h>

typedef uint32_t T;

struct Node {
  T s; Node *zero; Node *one;
};

// t is s shifted to just contain the uninserted bits
void insert(T s, T t, Node *&p) {
  if (!p) p = new Node();
  if (t == 0) p->s = s; // No more bits, we are done
  else if (t&1) insert(s,t>>1,p->one);
  else insert(s,t>>1,p->zero);
}

void find(T s, T t, const Node *p, std::vector<T> &a) {
  if (!p) return;
  if (p->s) a.push_back(p->s); // Found something!
  if (t == 0) return;          // No more bits
  find(s,t>>1,p->zero,a);
  if (t&1) find(s,t>>1,p->one,a); // 
}

// Make a bitset from the characters of a
T mkset(const std::string &a) {
  T s = 0;
  for (int i = 0; i < (int)a.size(); i++) {
    int c = a[i];
    if (c < 'a' || c > 'z') return 0;
    s |= (1 << (c-'a'));
  }
  return s;
}

int main(int argc, char *argv[]) {
  Node *root = 0;
  std::map<T,std::vector<std::string>> words;
  std::string word;
  while(std::cin >> word) {
    auto s = mkset(word);
    if (s) {
      words[s].push_back(word);
      insert(s,s,root);
    }
  }
  for (int i = 1; i < argc; i++) {
    std::vector<T> a; // Result goes here
    auto s = mkset(argv[i]);
    find(s,s,root,a);
    for (auto i = a.begin(); i != a.end(); i++) {
      auto t = words[*i];
      for (auto j = t.begin(); j != t.end(); j++) {
        std::cout << *j << "\n";
      }
    }
  }
}

$ ./a.out praxis < /usr/share/dict/words | sort | fmt
a air airs aria arias as asp asps ass ax axis i iris is p pa pair pairs
pap papa papas paps par pars pas pass pi pip pips pis piss r rap raps
rasp rasps rip rips rs s sap saps sari saris sass sax sip sips sir sirs
sis six spa spar spars spas x

Of course, this works just as well:

$ egrep '^[praxis]*$' /usr/share/dict/words | sort | fmt
a air airs aria arias as asp asps ass ax axis i iris is p pa pair pairs
pap papa papas paps par pars pas pass pi pip pips pis piss r rap raps
rasp rasps rip rips rs s sap saps sari saris sass sax sip sips sir sirs
sis six spa spar spars spas x

Jebastin J said
March 23, 2017 at 6:40 AM
C# Implementation:

static List GetMatchingWordsSet(List wordsSet, List charSet)
{
List lstMatchingWordsSet = new List();

foreach (var word in wordsSet)
{
int wordFlag = 0;
int wordlength = word.Length;
for (int i = 0; i < wordlength; i++)
{
if (charSet.Contains(Convert.ToChar(word[i].ToString().ToLower())))
{
wordFlag++;
}
}
if (wordFlag == word.Length)
{
lstMatchingWordsSet.Add(word);
}
wordFlag = 0;
}

return lstMatchingWordsSet;
}

static void Main(string[] args)
{
List lstWords = new List { “One”, “Two”, “Three”, “Four”, “Five”, “Six”, “Cat”,”Tin” };
List lstChars = new List { ‘a’, ‘e’, ‘i’, ‘o’, ‘u’, ‘n’, ‘s’, ‘x’, ‘t’ };
var MatchingWordsSet = GetMatchingWordsSet(lstWords, lstChars);
Console.WriteLine(“Matching Words:”);
foreach (var w in MatchingWordsSet)
{
Console.WriteLine(w);
}
}
Jebastin J said
March 23, 2017 at 7:10 AM
To avoid duplicates the following changes have to be done in my code (my above comment):

string distinctWord = new String(word.ToLower().Distinct().ToArray());
if (wordFlag == word.Length && distinctWord.Length == word.Length)
{
lstMatchingWordsSet.Add(word);
}
bookofstevegraham said
March 23, 2017 at 10:01 PM
Mumps Implementation:

MCL> type fword
fword (word,file) ; Find matching words in file
;
n char,fword,i,newword,str
i $g(word)=”” w !,”No word was supplied” q
i $g(file)=”” w !,*7,”No file was included” q
o 1:(file:”R”):0
i ‘$t w !,*7,”Unable to open “,file q
w !!,”file: “,file
w !,”word: “,word
u 1
s newword=””,str=””
f i=1:1:$l(word) s char=$e(word,i) s:newword'[char newword=newword_char
f i=1:1 r fword q:fword=”” d
. i $$matches(newword,fword) s str=str_fword_”,”
c 1
w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file
_)
q
;
matches (word,fword); See if fword can be built with chars in word
;
n flg,i
s flg=1
f i=1:1:$l(fword) d q:’flg
. s:word'[$e(fword,i) flg=0
q flg

—

MCL> d ^fword(“praxis”,”/usr/share/dict/words”)

file: /usr/share/dict/words
word: praxis
Matching words: a,air,airs,aria,arias,as,asp,asps,ass,ax,axis,i,iris,is,p,pa,pair,pairs,pap,papa,papas,paps,par,pars,pas,pass,pi,pip,pips,pis,piss,r,rap,raps,rasp,rasps,rip,rips,rs,s,sap,saps,sari,saris,sass,sax,sip,sips,sir,sirs,sis,six,spa,spar,spars,spas,x

—

MCL> d ^fword(“^~”,”/usr/share/dict/words”)

file: /usr/share/dict/words
word: ^~
Matching words: None in /usr/share/dict/words
bookofstevegraham said
March 23, 2017 at 10:03 PM
Mumps Implementation:

w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file
_)

should be:

w !,”Matching words: “,$s($l(str):$e(str,1,$l(str)-1),1:”None in “_file)
programmingpraxis said
March 24, 2017 at 12:05 AM
@bookofstevegraham: We don’t often see mumps here. Perhaps you could give a brief explanation of your program, for those unfamiliar with mumps.
bookofstevegraham said
March 24, 2017 at 3:33 PM
I’ve been asked to give a brief explanation on my implementation. Here goes. I will be including the complete name of commands and functions, instead of the abbreviations.

Mumps Implementation: (https://en.wikipedia.org/wiki/MUMPS)

MCL> type fword
fword (word,file) ; Find matching words in file # Program name, accepting 2 arguments
;
new char,fword,i,newword,str # Create new versions of variables
if $get(word)=”” write !,”No word was supplied” quit # Quit program if the word was not supplied, or it was = “”
if $get(file)=”” w !,*7,”No file was included” quit # Quit program if the filename was not supplied, or it was = “”
open 1:(file:”R”):0 # Open device 1 in read mode and assign the file to it
if ‘$test write !,*7,”Unable to open “,file quit # When you (try to) open a device with a trailing “:0”, and it is successful, $test is set to 1. If unable it is set to 0.
# If 0, then quit program
write !!,”file: “,file # Write filename with 2 lf/cr (!)
write !,”word: “,word # Write supplied word with lf/cr
use 1 # Begin using the device/file for input
set newword=””,str=”” # Initialize variables
for i=1:1:$length(word) set char=$extract(word,i) set:newword'[char newword=newword_char # Take each unique character in the supplied word and put it in variable newword
# Mumps allows post-conditionals for most commands. set:newword'[char = if newword does not contain char, set …
# [ is the operator for contains and ‘ is the operator for negate/not
for i=1:1 read fword quit:fword=”” do # For each word in file
. if $$matches(newword,fword) set str=str_fword_”,” # If the file word matches the characters in the supplied word, append it to a comma-delimited list of matched words
# $$matches indicates a user-created function named matches. _ is the operator for concatenation.
close 1 # Close device/file and revert to std in/out
write !,”Matching words: “,$select($length(str):$extract(str,1,$length(str)-1),1:”None in “_file) # Write out list of matched words with lf/cr or a message that there were no matches
quit # End program
;
matches(word,fword); See if fword can be built with chars in word # Subroutine for matching supplied word with list word
;
new flg,i # Create new versions of variables
set flg=1 # Initialize flg variable (Assuming the supplied word matches list word)
for i=1:1:$l(fword) do quit:’flg # For each character in list word, do next line, quit loop if flg = 0
. set:word'[$e(fword,i) flg=0 # If supplied word does not contain this character in list word, set flg = 0. This will cause the loop to quit upon returning to it
quit flg # Quit subroutine returning value of flg (1 = it matched, 0 = it did not match)

—

MCL> d ^fword(“praxis”,”/usr/share/dict/words”)

file: /usr/share/dict/words
word: praxis
Matching words: a,air,airs,aria,arias,as,asp,asps,ass,ax,axis,i,iris,is,p,pa,pair,pairs,pap,papa,papas,paps,par,pars,pas,pass,pi,pip,pips,pis,piss,r,rap,raps,rasp,rasps,rip,rips,rs,s,sap,saps,sari,saris,sass,sax,sip,sips,sir,sirs,sis,six,spa,spar,spars,spas,x

—

MCL> d ^fword(“^~”,”/usr/share/dict/words”)

file: /usr/share/dict/words
word: ^~
Matching words: None in /usr/share/dict/words
programmingpraxis said
March 24, 2017 at 3:36 PM
@Bookofstevegraham: Fascinating language. Thank you!
bookofstevegraham said
March 24, 2017 at 3:42 PM
You’re welcome. Thanks for the interest.
Steve Wampler said
March 24, 2017 at 4:24 PM
Looking at the solutions so far it looks like there are several interpretations of the problem: (a) look for words that contain exactly the letters in the set (i.e. if there’s only one ‘s’, then
words can only contain one ‘s’), (b) look for words that only use letters from the set with no repeats, but not necessarily all of them, and (c) words that only letters from the set, but with repeats allowed.

The following Unicon [works in Icon as well] solution assumes (a):
————————————————————-
procedure main(a)
w := a[1] | “gomiz”
every write(matched(w, !&input))
end

procedure matched(w, pword)
every ((*pword = *w), (t := pword), t[bound(upto,!w,t)] := “”)
return (0 = *\t, pword)
end

procedure bound(a[])
return a[1]!a[2:0]
end
————————————————————-
Sample run:
———————————————————-
->anagram west
————————————————————
Steve Wampler said
March 24, 2017 at 4:46 PM
Hmmm, sample run didn’t come through for Unicon solution:
```
## anagram west </usr/share/dict/words
stew
tews
west
wets
##
```
bookofstevegraham said
March 24, 2017 at 5:20 PM
Steve Wampler: Could you comment on how your solution works? I would guess that most are not familiar with the language.
Jay Hammond said
March 24, 2017 at 6:42 PM
an icon program meeting sub-challenge (c). 8 lines of code

procedure main(a)
# given a set of characters in parameter 1 and a wordlist as a file as parameter 2
# respond with the words in the wordlist that meet the criteria
# solution type (c)
#
# what criteria?
#
# The letters in parameter 1 may be used in the target word
# 1 or more times or not at all. in any order.
# for simplicity,
# I have made no special provision for punctuation
# so backslash and quotes, for example may behave unexpectedly.
#
# This is partly also because different shells will have different behaviour.
# backslash escaping may help some situations.
#
###############################
# get the characters
thechars := cset(a[1]) | stop(“No characters found; the word \”\” is made of no characters”)
#get the wordlist
put(a,”C:\\Users\\…\\webster\\WORDS.TXT”)
# local fix puts a dictionary in position3 if user supplied a dictionary, else at position 2
thewords := open(a[2],r) | stop(“No words to test. Is the file valid?”)
#########################
every aword := !thewords do {
charsofaword := cset(aword)
# thediff := charsofaword — thechars
# cset difference, any letters in diff are those in the word, but not in the
# characters supplied by the user as parameter 1.
# if (*() = 0) then write(aword)
if (*(charsofaword — thechars) = 0) then write(aword)
}
end

run using “cat” on a 20C websters list I had.

a
aa
acca
act
acta
at
atta
c
ca
cat
t
ta
taa
tact
tat
tatta
Jay Hammond said
March 24, 2017 at 6:54 PM
Sorry, non-fatal error; “thewords := open(a[2],r)” should read
“thewords := open(a[2],’rt’)”
The variable r is initially null, so the 2nd parameter to “open” defaults to ‘rt’. The compiled program executes the same.

Steve Wampler said

March 24, 2017 at 8:25 PM

I was asked to provide a commented version of my Unicon solution, since both Unicon and Icon might not be familiar to some folks.
A little background – Unicon is a successor to Icon. In both languages expressions are capable of generating sequences of results.
A result sequence might have anywhere from 0 to an infinity of results. Normally expressions only so far as needed to produce a single
result. If no result is produced or if there are no more results that can be produced then expression evaluation is said to fail, otherwise
the evaluation succeeds. If an expression fails, subexpressions that can produce further results are automatically back-tracked into
to produce their next result in an attempt to produce successful evaluation of the entire expression. This process is called goal-directed evaluation.
The every control structure essentially tells a successful evaluation of its expression that it really failed – forcing goal-directed evaluation
to exhaust all possibilities.
Control flow is governed by this success or failure of the evaluation, not by the values computed during evaluation.
(I apologize in advance for the length of this explanation. Unicon and Icon behave differently from most common procedural programming languages
even though they don’t look all that different.)

# Evaluation of a program starts in main.  The parameter a is an array
#   of the arguments.
procedure main(a)
   w := a[1] | "gomiz"  # If there is an argument, assign it to w,
                                #   otherwise assign "gomiz" to w
      # The expression !&input can generate all the lines in standard input
      #   As they are generated they are passed to the procedure matched
      #   If evaluation of matched fails for a line, then backtracking
      #   will try the next line until input is exhausted or matched
      #   succeeds.  When matched succeeds, its value is output.
      #   The every simply forces backtracking to continue after
      #   matched succeeds.  Without the every, evaluation would have
      #   stopped after finding the first word that matched.
   every write(matched(w, !&input)
end

procedure matched(w, pword)
      # Start with a quick check on word lengths, if that fails,
      #   there's no place to backtrack so the evaluation of 
      #   every quickly fails.  If the check succeeds, then assign
      #   a copy of the word taken from the file to t.  Then
      #   for each letter in the set of letters w (!w generates
      #   the individual letters in w, in order), pass that letter,
      #   the value of t, and the function upto to the procedure
      #   bound.  [Read about bound, below.]
      # If bound succeeds, a letter from w has been found in t.
      #   The function upto returns the index of that letter and
      #   The t[bound(upto(!w,t)] := "" expression then has the
      #   effect of removing that letter from t so it won't be
      #   found again.
      # The every keeps backtracking until all of the letters from
      #   !w have been tried.
      # Finally if all of the letters in t have been removed, we
      #   know we have a match.  The return statement's expression
      #   is a short way of saying if 0 = *t then return pword.
   every ((*pword = *w), (t := pword), t[bound(upto,!w,t)] := "")
   return (0 = *\t, pword)
end

# The procedure bound is a way to limit backtracking into the
#   results proceduced by a function while still allowing
#   backtracking into the results produced by the arguments
#   passed to the function.  The a[] in the parameter list
#   simply collects all of the arguments into a the array a
#   (varargs in some other languages).  a[1]!a[2:0] is
#   the dynamic equivalent of [in this case] calling
#   upto(c,t), where c is the letter generated by !w in
#   the call to bound.  A return expression blocks backtracking
#   from returning to that expression.
#
# Bound is used here because upto(c,s) generates all occurences
#   of the character c in s, but we only want to find a single
#   occurence (each letter is to be used exactly once).
#   By using bound the call to upto is effectively blocked
#   from producing more than one result.
procedure bound(a[])
   return a[1]!a[2:0]
end

bookofstevegraham said
March 29, 2017 at 4:58 PM
Implemented in the klong programming language (http://t3x.org/klong/), an array language patterned after the k programming language.

$ kg ./wordsets4.kg mississippi < /usr/share/dict/words
[i imp imps is ism isms m mi miss ms p pi pimp pimps pip pips pis piss s sip sips sis]
bookofstevegraham said
March 29, 2017 at 5:03 PM
Oops, forgot the program. Try again.

$ cat wordsets5.kg
main::{[a fword list word]; word::?.a@0; list::[]; .mi{fword::x; :[~#({word?x}’fword)?[]; list::list,fword; a::””]; .rl()}:~.rl(); .p(list)}

main()

$ kg ./wordsets5.kg mississippi < /usr/share/dict/words
[i imp imps is ism isms m mi miss ms p pi pimp pimps pip pips pis piss s sip sips sis]

$

Programming Praxis

Word Sets

March 17, 2017

20 Responses to “Word Sets”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Word Sets

March 17, 2017

Share this:

Related

20 Responses to “Word Sets”

Leave a comment

Categories

Archives

Archives