Remove Singleton

June 6, 2014

I’ve always found it useful to think about tasks like this in terms of a finite state machine. The machine shown at the right [ sorry, it’s not done yet; I’ll have it later today, I promise ] has three states as it scans the input string: NONE indicates that the machine is not currently in a state of reading the target character, ONE indicates that the machine it currently positioned at an instance of the target character, and MANY indicates that the machine is currently positioned at a consecutive instance of the target character. There are three possibilities at each state: the end of the string has been reached, the next character in the string is the target character, or the next character in the string is not the target character. The state machine is simply encoded in Scheme by a separate function for each state:

(define (remove-singleton c str) (define (none ins outs) (cond ((null? ins) outs) ((char=? (car ins) c) (one (cdr ins) outs)) (else (none (cdr ins) (cons (car ins) outs))))) (define (one ins outs) (cond ((null? ins) outs) ((char=? (car ins) c) (many (cdr ins) (cons c (cons c outs)))) (else (none (cdr ins) (cons (car ins) outs))))) (define (many ins outs) (cond ((null? ins) outs) ((char=? (car ins) c) (many (cdr ins) (cons c outs))) (else (none (cdr ins) (cons (car ins) outs))))) (list->string (reverse (none (string->list str) (list)))))

The code is a straight forward translation of the diagram. Since all of the mutually-recursive calls are in tail position, the function operates iteratively, in constant space. Here is our test suite, which generates output only if it finds an error:

(define (test-remove-singleton) ; no news is good news (assert (remove-singleton #\X "") "") (assert (remove-singleton #\X "X") "") (assert (remove-singleton #\X "XX") "XX") (assert (remove-singleton #\X "XXX") "XXX") (assert (remove-singleton #\X "abcd") "abcd") (assert (remove-singleton #\X "Xabcd") "abcd") (assert (remove-singleton #\X "XXabcd") "XXabcd") (assert (remove-singleton #\X "XXXabcd") "XXXabcd") (assert (remove-singleton #\X "abcdX") "abcd") (assert (remove-singleton #\X "abcdXX") "abcdXX") (assert (remove-singleton #\X "abcdXXX") "abcdXXX") (assert (remove-singleton #\X "abXcd") "abcd") (assert (remove-singleton #\X "abXXcd") "abXXcd") (assert (remove-singleton #\X "abXXXcd") "abXXXcd") (assert (remove-singleton #\X "XabXcdX") "abcd") (assert (remove-singleton #\X "XXabXXcdXX") "XXabXXcdXX") (assert (remove-singleton #\X "XXXabXXXcdXXX") "XXXabXXXcdXXX"))

You can run the program at http://programmingpraxis.codepad.org/ffN5dWrT.

Posted by programmingpraxis

Filed in Exercises

17 Comments »

17 Responses to “Remove Singleton”

josejuan (@__josejuan__) said
June 6, 2014 at 9:27 AM
concat . filter (/=[x]) . group

matthew said

June 6, 2014 at 9:42 AM

#include <stdio.h>
#include <string.h>

void f(char *p, char c)
{
   char *q = p;
 state0:
   if (*p == c) {
      p++;
      goto state1;
   }
   goto state3;
 state1:
   if (*p == c) {
      *q++ = c; *q++ = c;
      p++;
      goto state2;
   }
   goto state3;
 state2:
   if (*p == c) {
      *q++ = *p++;
      goto state2;
   }
 state3:
   if (*p == 0) {
      *q = 0;
      return;
   } else {
      *q++ = *p++;
      goto state0;
   }
}

int main(int argc, char *argv[])
{
   char p[64];
   strcpy(p,argv[1]);
   char c = *argv[2];
   f(p,c);
   printf("%s\n", p);
}

Michael D said
June 6, 2014 at 5:37 PM
In Java:

“XabXXcdX”.replaceAll(“(^|[^X])X([^X]|$)”, “$1$2”)
Lev Lozhkin said
June 6, 2014 at 7:39 PM
Here’s a regex-based one in Python. It uses negative lookahead and lookbehind assertions rather than set complements like the Java one above.
```
import re, sys
re.sub(r"(?<!X)(X)(?!\1)", "", sys.argv[1])
```
If we had a non-trivial singleton pattern to find, it would be better to use back-references in the lookbehind assertion, but Python’s RE module (and many other languages’ as well) does not support this. That would look something like this:
```
re.sub(r"(X)(?<!\1\1)(?!\1)", "", sys.argv[1])
```
The state-machine and set complement approaches should be quicker for simple singleton patterns (like a single known character). For non-trivial singleton patterns, this is probably the simplest general solution.

Mike said

June 6, 2014 at 9:30 PM

Two python solutions. Both break the input string into runs of the same character and then removes/filters runs that match the character to be deleted.

import itertools as it
import re

def func1(s, x):
    return re.sub(x+'+', lambda mo: '' if mo.group() == x else mo.group(),s)

def func2(s, x):
    return ''.join(filter(x.__ne__, (''.join(v) for _,v in it.groupby(s))))

Anna Baas (@venite) said

June 8, 2014 at 7:12 PM

Ruby, iterating over strings like it’s 1991:

inputString = 'XabXXcdX'
outputString = ''
singleton = 'X'
previous = ''

inputString.each_char { |current|     
    if previous == singleton && current == singleton then 
      outputString += previous + current
    elsif previous != singleton then
        outputString += previous unless previous == ''  
    end   
    previous = current    
}                              
puts outputString

matthew said

June 8, 2014 at 7:27 PM

Here’s a nice little Markov program (see http://matthewarcus.wordpress.com/2014/01/08/markov-computation/):

$ cat sing.mkv
0aa => a1a
0a => 0
0b => b0
1a => a1
1b => b0
0 =>.
1 =>.
  => 0
$ ./markov abaabababaa < sing.mkv
abaabababaa
0abaabababaa
0baabababaa
b0aabababaa
ba1abababaa
baa1bababaa
baab0ababaa
baab0babaa
baabb0abaa
baabb0baa
baabbb0aa
baabbba1a
baabbbaa1
baabbbaa

matthew said

June 8, 2014 at 10:59 PM

That Java regex solution doesn’t quite work since the patterns overlap:

$ java Foo aXaaXa
aaaa
$ java Foo aXaXa
aaXa

This seems to work (using the idea of Lev’s solution):

public class Foo {
    public static void main(String[] args) throws Exception {
        String s = args[0];
        System.out.println(s.replaceAll("(?<!X)(X)(?!X)", ""));
    }
}

Removing Repeated Occurrences of a Target Character From a String | Irreal said
June 9, 2014 at 12:56 PM
[…] I was going through my RSS feed the other day, I came across this Programming Praxis problem and thought right away that the solution called for a state machine. It’s one of those […]
Karthikeyan Palaniswamy said
June 9, 2014 at 5:10 PM
Here is my solution in Python. Please review my code and let me know if I can improve my codes.

def removesingleton(strs,chrs):
”’
(str,str)->(str)
Returns str with removed single chr
>>>removesingleton(“Welcomee”,”e”)
Wlcomee
>>>removesingleton(“XabXXaX”,”X”)
abXXa
”’
output, inner_index, outer_index = “”, 0,0
for chars in strs:
if outer_index <= len(strs)-1:
if outer_index == strs.index(chars,outer_index):
if chars == chrs:
inner_index = outer_index
while inner_index+1 < len(strs) and strs[inner_index+1] == chrs:
inner_index += 1
output += strs[outer_index:inner_index+1]
outer_index = inner_index
else:
output += strs[outer_index]
outer_index += 1
return output
Karthikeyan Palaniswamy said
June 9, 2014 at 5:23 PM
def removesingleton(strs,chrs): ''' (str,str)->(str) Returns str with removed single chr >>>removesingleton("Welcomee","e") Wlcomee >>>removesingleton("XabXXaX","X") abXXa ''' output, inner_index, outer_index = "", 0,0 for chars in strs: if outer_index <= len(strs)-1: if outer_index == strs.index(chars,outer_index): if chars == chrs: inner_index = outer_index while inner_index+1 < len(strs) and strs[inner_index+1] == chrs: inner_index += 1 output += strs[outer_index:inner_index+1] outer_index = inner_index else: output += strs[outer_index] outer_index += 1 return output
print removesingleton("XXXabXXXcdXXX","X")=='XXXabXXXcdXXX'

mcmillhj said

June 9, 2014 at 7:57 PM

Simple Perl regex:

#!/usr/bin/perl

use strict;
use warnings; 

use feature qw(say);

sub remove_singleton {
   my ($s, $c) = @_;

   $s =~ s/(?<!$c)($c)(?!$c)//g;
   return $s; 
}

say remove_singleton('XabXXcdX','X');

beoliver said
June 9, 2014 at 8:07 PM
import Data.List (group)

removeSingleton :: Eq a => a -> [a] -> [a]
removeSingleton m = concat . filter (/= [m]) . group
Peter said
June 11, 2014 at 5:05 PM
Here is a possible (partial) picture of the finite state machine referred
in the official solution.

DFA

The signs before the input characters are meant to indicate what
happens to the character in respect to the output.
I hope the image shows up, if it doesn’t, that’s the URL for the time being:

Below is a Haskell rendering of the algorithm:

1 rmSingle c input = 2 let zero [] ys = ys 3 zero (x:xs) ys 4 | x == c = one xs ys 5 | otherwise = zero xs $ x:ys 6 one [] ys = ys 7 one (x:xs) ys 8 | x == c = multiple xs $ x:x:ys 9 | otherwise = zero xs $ x:ys 10 multiple [] ys = ys 11 multiple (x:xs) ys 12 | x == c = multiple xs $ x:ys 13 | otherwise = zero xs $ x:ys 14 in reverse $ zero input []
matthew said
June 12, 2014 at 6:10 AM
Peter: Your DFA description is like a Mealy Machine: http://en.wikipedia.org/wiki/Mealy_machine. A fun programming project would be to write a function that inputs a description of such a machine and outputs an efficient implementation (maybe make a nice Lisp macro).
James Smith said
July 4, 2014 at 11:43 AM
Without using irregular regular expressions…

sub remove_singletons {
#@param $str (string)
#@param $c (string) – singleton to remove
#@return (string) – string with singletons removed..
## Remove singleton instances of given string ($c)
my($str,$c) = @_;
my $r = sprintf ‘(%s+)’, quotemeta $c;
(my $str = shift)=~s{$r}{length $1 > 1 ? $1 : q()}ge;
return $str;
}
[\sourcecode]
FSM Generator | Programming Praxis said
July 11, 2014 at 9:00 AM
[…] few weeks ago we had an exercise that solved the problem of removing singleton occurrences of a given character from a string while […]

Programming Praxis