Two String Exercises
September 2, 2011
These two problems seem to be on every list of programming interview questions:
1) Remove all duplicate characters from a string. Thus, “aaabbb” becomes “ab” and “abcbd” becomes “abcd”.
2) Replace all runs of consecutive spaces with a single space. Thus, “a.b” is unchanged and “a..b” becomes “a.b”, using a dot to make the space visible.
Your task is to write the two requested functions. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
The obvious haskell solutions are
nub
andnubBy (== ' ')
. I don’t know if thats cheating though.Oh sorry, two cups of coffee later I realized, I got the secound one wrong. Here is a working vorsion
noDupSpace = unwords . words
In Clojure.
Hmmm, the formatter messed up my code – here’s the plaintext form.
;; helpers
(defn build-string-step [search-criteria]
(fn [built-so-far next-character]
(if (search-criteria built-so-far next-character)
built-so-far
(conj built-so-far next-character))))
(defn make-string-transformer [f]
(fn [string]
(apply str
(reduce (build-string-step f) [] string))))
;; solve exercise 1
(def rem-dups
(make-string-transformer
(fn [built-so-far next-character]
(some #(= % next-character) built-so-far))))
;; solve exercise 2
(def squash-spaces
(make-string-transformer
(fn [built-so-far next-character]
(and (= next-character \space) (= (last built-so-far) \space)))))
The first excercise can be beautifully implemented string->list and filter (from SRFI-1):
(define (remove-doubles str)
(let* ((seen '())
(seen? (lambda (c)
&nb
sp; (if (memv c seen)
&nb
sp; #f
&nb
sp; (begin
&nb
sp; (set! seen (cons c
seen))
&nb
sp; #t)))))
(list->string (filter seen? (string->list str)))))
Oops, weird formating it should have been:
(define (remove-doubles str)
(let* ((seen '())
(seen? (lambda (c)
(if (memv c seen)
#f
(begin
(set! seen (cons c seen))
#t)))))
(list->string (filter seen? (string->list str)))))
#include
#include
int main()
{
int len,i,j,flag=0,k=0,len1;
char a[10]=”grtasdabd”,b[10],test;
b[0]=”;
printf(“%s\n”,a);
len=strlen(a);
// printf(“%d\n”,len);
for(i=0;i<len;i++)
{
test=a[i];
// printf("%c ",test);
len1=strlen(b);
// printf("%d ",len1);
for(j=0;j<len1;j++)
{
if(test==b[j])
{
flag=1;
break;
}
}
if(flag==0)
{
b[k]=test;
k++;
b[k]='';
}
else
flag=0;
}
printf("%s",b);
return 0;
}
[me@localhost{12:53:12}~]$ echo 'aaabbbbccc' | perl -ple 's/(.)\1+/$1/g;'
abc
[me@localhost{12:53:16}~]$ echo 'a b s sd fe fe fe ef' | perl -ple 's/[\W]+/ /g;'
a b s sd fe fe fe ef
[me@localhost{12:53:18}~]$
Alternatively, for the first one, when duplicate characters are interspersed through the string:
[me@localhost{13:02:35}~]$ echo 'aaabbbbcccaaajjjdfdf fefg ergrdgr rd' | perl -ple 'my %h = map { $_ => 1 } split (); $_ = join "", keys %h'
er adjcbgf
[me@localhost{13:02:47}~]$
the comment above ate the < and > symbols which should go inside the
'split ( ' *HERE* ' )'
I tried to do something slick with Python’s
groupby
for the first one, but it would require sorting the string first (and therefore destroying the order). Similar attempts with set, dictionaries, orCounter
also destroyed the order due to hashing.For the second one i could have used the split but here is my quick implementation:
nice :>
Here’s my clojure solution. More details at my blog.
Something in Common Lisp:
(defun s1 (src)
(concatenate ‘string
(loop
for crt across src
and prev = crt
when (not (equal crt prev))
collect crt)))
(defun s2 (src)
(concatenate ‘string
(loop
for crt across src
and prev = crt
when (not (and (equal crt #\Space)
(equal crt prev)))
collect crt)))
(s1 “aaaaabbbb”)
=> “ab”
(s2 “a bbbbb”)
=> “a bbbbb”
Python 3:
Your rem-dup-char won’t work for Unicode!
Oh, the first one looks better with for/fold:
Oops, mine won’t work for Unicode either! Replace memq with memv…
@Razvan: Instead of creating an intermediate list and then concatenating it into a string, you could use with-output-to-string. I think that would be a lot more efficient, and not more complicated to code.
In Common Lisp, easy solution to the first problem :).
Well that’s cheating I guess, so…
Unfortunately, the previous Perl solutions have problems– the “eliminate duplicate letters” solution (revised) does not preserve the order of the letters, and the “squash spaces” solution would squash all non-alphanumeric characters, not just spaces.
(This is written to be readable, not to be super-optimized. Perl golfers would no doubt be able to shrink this by 50% or more.)
Oops, either I forgot a closing brace on snippet #1, or it got lost somehow. Perl programmers would know that there should be a closing brace to match the subroutine declaration. Sorry.
In ruby …
Racket solutions:
from collections import OrderedDict
def rem_dups(s):
uniqs = OrderedDict()
for ch in s:
uniqs[ch] = True
return ”.join(uniqs.keys())
def rem_conspaces(s):
return ”.join(s.split())
I see there aren’t many haskell solutions posted here.
I’m sure there are better ways to do removeSpaces and I’m thinking using nub might be cheating, but oh well! :)
C++
SOLUTION TO Q1 IN JAVA
import java.util.*;
public class Dup {
public static void main(String args[])
{Scanner in=new Scanner(System.in);
System.out.println(“enter the string”);
String a=in.nextLine();
char orig[]=a.toCharArray();
String ans=””;
for(int i=0;i<a.length();i++)
{
boolean b=true;
for(int j=0;j<i;j++)
{if(orig[j]==a.charAt(i))
{b=false;
break;
}
}
if(b)
ans=ans.concat(a.charAt(i)+"");
}
System.out.println("the answer is "+ans);
}
}
My C solution. Not pretty but it works :-D
[…] Two String Exercises […]
int print_no_duplicates(char* orig_str, int len)
{
int hash_table[127] = {0};
int i;
if ((orig_str == 0)||(len <= 0)) { return -1; }
for (i=0; i<len; i++)
{
if (hash_table[orig_str[i]] == 0) { putchar(orig_str[i]); }
hash_table[orig_str[i]] = 1;
}
}
int _tmain(int argc, _TCHAR* argv[])
{
char* orig_str = "abcbdbba";
printf("orig str -%s- cleaned str -", orig_str);
print_no_duplicates(orig_str, strlen(orig_str));
printf("-\n");
return 0;
}
In C#.
Tried with Python, not elegant though: