Identifying Anagrams
April 28, 2015
Two words are anagrams if they consist of the same letters, with the same number of occurrences, in a different order. For instance, DEPOSIT and DOPIEST are anagrams (aren’t you glad you know that?), and OPTS, POTS, TOPS and STOP form an anagram class.
Your task is to write a program that takes two strings as input and determines whether or not they are anagrams; you may assume that the strings consist of only the letters A through Z in upper case. You must provide at least two different algorithms that work in fundamentally different ways. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
I haven’t done the prime-number version, but the first two were the solutions I went for, both in Perl though.
In Python.
Same solutions as Paul, in Python. I added another method that maps each letter onto a prime and takes the product.
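For readers who want to see the prime-product idea spelled out, here is a minimal sketch in C++ (not the commenter's Python code, which is not reproduced here). The names `kPrimes`, `prime_product` and `anagrams_by_primes` are made up for illustration. It assumes upper-case A to Z input and words short enough that the product fits in 64 bits; any word of up to 9 letters does, since 101^9 is below 2^64. For longer words the product wraps around and unrelated words could collide.

```cpp
#include <cstdint>
#include <string>

// The first 26 primes, one per letter A..Z (illustrative table).
static const std::uint64_t kPrimes[26] = {
    2,  3,  5,  7,  11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

// Multiply the primes assigned to each letter. By unique factorization,
// two words have the same product exactly when they have the same letter
// counts, as long as the product does not overflow 64 bits.
std::uint64_t prime_product(const std::string& word)
{
    std::uint64_t product = 1;
    for (char c : word) product *= kPrimes[c - 'A'];  // assumes 'A'..'Z'
    return product;
}

bool anagrams_by_primes(const std::string& a, const std::string& b)
{
    return a.size() == b.size() && prime_product(a) == prime_product(b);
}
```

Note that this sketch treats a word as an anagram of itself; add an `a != b` check if the stricter "different order" definition is wanted.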
First version creates a bag out of each string and compares the bags for equality. The second solution removes all characters in the first string from the second string and checks that it results in the empty string, and vice versa.
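A rough C++ sketch of both ideas follows; it is an interpretation of the description above, not the commenter's code. The names `make_bag`, `remove_each`, `anagrams_by_bag` and `anagrams_by_removal` are hypothetical, and "removes all characters" is read here as removing one occurrence per character. Input is assumed to be upper-case A to Z.

```cpp
#include <array>
#include <string>

// Bag version: count how often each letter A..Z occurs.
std::array<int, 26> make_bag(const std::string& word)
{
    std::array<int, 26> bag{};  // all counts start at zero
    for (char c : word) ++bag[c - 'A'];
    return bag;
}

bool anagrams_by_bag(const std::string& a, const std::string& b)
{
    return make_bag(a) == make_bag(b);
}

// Removal version: for each character of `chars`, delete one matching
// character from `from` if there is one, and return what is left over.
std::string remove_each(std::string from, const std::string& chars)
{
    for (char c : chars) {
        std::string::size_type pos = from.find(c);
        if (pos != std::string::npos) from.erase(pos, 1);
    }
    return from;
}

// Both directions are needed: "AB" minus "AAB" is empty, but "AAB" minus
// "AB" leaves "A".
bool anagrams_by_removal(const std::string& a, const std::string& b)
{
    return remove_each(b, a).empty() && remove_each(a, b).empty();
}
```

The bag version runs in linear time; the removal version is quadratic in the worst case because each `find` scans the remaining string.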
Just one solution for the moment: a variant on the sorting method using two heaps. We save on work in the case that the two strings aren’t anagrams:
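One way the two-heap idea might look in C++, using `std::priority_queue` (a max-heap by default). This is a sketch under that assumption rather than the commenter's original code, and `anagrams_by_heaps` is a made-up name. The saving comes from stopping at the first pair of tops that differ, so a mismatch near the top of the alphabet is found without fully sorting either string.

```cpp
#include <queue>
#include <string>

bool anagrams_by_heaps(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;

    // Build one max-heap per string; construction is linear in the length.
    std::priority_queue<char> heap_a(a.begin(), a.end());
    std::priority_queue<char> heap_b(b.begin(), b.end());

    // Pop the largest remaining letter from each heap in lockstep and
    // stop as soon as the two disagree.
    while (!heap_a.empty()) {
        if (heap_a.top() != heap_b.top()) return false;
        heap_a.pop();
        heap_b.pop();
    }
    return true;
}
```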
Here’s another one: generate all possible anagrams of a, and see if any are equal to b:
Not the most efficient way of solving the problem, but it has a pleasant simplicity about it (and isanag only destructively modifies one of its arguments now).
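For comparison, a brute-force version can be sketched in C++ with `std::next_permutation`; this is not the `isanag` function mentioned above, and `anagrams_by_permutation` is a hypothetical name. It takes its first argument by value, so neither of the caller's strings is modified. The running time grows factorially with the word length, so it is only sensible for short words.

```cpp
#include <algorithm>
#include <string>

bool anagrams_by_permutation(std::string a, const std::string& b)
{
    // Start from the smallest arrangement so that next_permutation
    // visits every distinct rearrangement of `a` exactly once.
    std::sort(a.begin(), a.end());
    do {
        if (a == b) return true;
    } while (std::next_permutation(a.begin(), a.end()));
    return false;
}
```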
Last one: pack the counts into a single 64-bit number. There are only 2 bits per character, so e.g. “AAAAC” is considered an anagram of “BBBBB”, but we always correctly detect a true anagram:
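The packing can be sketched in C++ as below; this is a guess at the scheme described, not the original code, and the names `packed_counts` and `maybe_anagrams_packed` are invented here. Letter A occupies bits 0 and 1, B bits 2 and 3, and so on up to Z at bits 50 and 51. A fourth occurrence of a letter carries into the next letter's field, which is where the false positives come from. Input is assumed to be upper-case A to Z.

```cpp
#include <cstdint>
#include <string>

// Add 1 to the 2-bit field belonging to each letter. Carries out of a
// field are not prevented, so counts of 4 or more spill into the next
// letter's field.
std::uint64_t packed_counts(const std::string& word)
{
    std::uint64_t signature = 0;
    for (char c : word) signature += std::uint64_t{1} << (2 * (c - 'A'));
    return signature;
}

// True anagrams always produce equal signatures, because addition does
// not depend on order. Equal signatures do not guarantee an anagram.
bool maybe_anagrams_packed(const std::string& a, const std::string& b)
{
    return packed_counts(a) == packed_counts(b);
}
```

Working through the example: AAAAC gives 4 × 1 + 16 = 20 and BBBBB gives 5 × 4 = 20, so the two are reported as a match even though they are not anagrams.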
Your second solution doesn’t work too well on a Unicode system: you’d need an array of size 1,114,112. However, a hash table or similar map from characters to integers is the same in spirit.
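A hash-table version along those lines might look like this in C++. It is a sketch with hypothetical names (`CharCounts`, `count_code_points`, `anagrams_unicode`), and it assumes the text is already decoded into UTF-32 code points. It compares code points only: no Unicode normalization or grapheme-cluster handling is attempted.

```cpp
#include <string>
#include <unordered_map>

using CharCounts = std::unordered_map<char32_t, int>;

// Count occurrences per code point; only code points that actually
// appear take up space, unlike a 1,114,112-entry array.
CharCounts count_code_points(const std::u32string& text)
{
    CharCounts counts;
    for (char32_t cp : text) ++counts[cp];
    return counts;
}

bool anagrams_unicode(const std::u32string& a, const std::u32string& b)
{
    return a.size() == b.size() && count_code_points(a) == count_code_points(b);
}
```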
#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <string>

// Trim leading and trailing whitespace and convert to upper case, in place.
void normalize(std::string& s)
{
    // Go through unsigned char: passing a negative char to isspace/toupper is undefined.
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    s.erase(s.begin(), std::find_if_not(s.begin(), s.end(), is_space));
    s.erase(std::find_if_not(s.rbegin(), s.rend(), is_space).base(), s.end());
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Shared pre-checks: normalize both words, then reject pairs that differ in
// length or are identical (anagrams must be in a different order).
bool common(std::string& a, std::string& b)
{
    normalize(a);
    normalize(b);
    if (a.size() != b.size()) return false;
    if (a == b) return false;
    return true;
}

// Count the occurrences of each character in the word.
std::map<char, int> analyze(const std::string& word)
{
    std::map<char, int> data;
    for (auto c : word) ++data[c];  // operator[] starts a missing count at 0
    return data;
}

// Method 1: compare the character counts.
bool are_anagrams1(std::string a, std::string b)
{
    if (!common(a, b)) return false;
    return analyze(a) == analyze(b);
}

// Method 2: compare the sorted letters.
bool are_anagrams2(std::string a, std::string b)
{
    if (!common(a, b)) return false;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}

void test(const std::string& a, const std::string& b, std::ostream& out)
{
    out << a << " and " << b << ": " <<
        are_anagrams1(a, b) << ", " <<
        are_anagrams2(a, b) << '\n';
}

int main()
{
    std::cout.setf(std::ios_base::boolalpha);
    test("deposit ", " dopiest", std::cout);
    test("STOP", "pots", std::cout);
    test("rite", "write", std::cout);
    test("right", "write", std::cout);
    test("same", "same", std::cout);
}
Sorry for the bad formatting in my previous post. Is there a way to delete it?
My discussion and solution in Java are here: http://www.capacode.com/?p=7