## Longest Substring Of Two Unique Characters

### June 11, 2013

I wrote today’s exercise as an excuse to illustrate the sets of the previous exercise. We use two loops to form every possible substring, and for each substring enter all the characters into a set, then count the characters:

```(define (two-char str)   (let ((longest "") (len 0) (n (+ 1 (string-length str))))     (do ((i 0 (+ i 1))) ((= i n) longest)       (do ((j (+ i 1) (+ j 1))) ((= j n))         (when (and (<= (count (substring str i j)) 2)                    (<= len (- j i -1)))           (set! longest (substring str i j))           (set! len (- j i -1)))))))```

That uses an auxiliary function to count the number of unique characters in a string; each character is inserted into a set, and the size of the set is returned at the end of the string:

```(define (count str)   (let loop ((cs (string->list str)) (s (make-set 29)))     (if (null? cs) (s 'size)       (loop (cdr cs) (s 'adjoin (car cs))))))```

Here are some examples:

```> (two-char "abcabcbcbc") "bcbcbc" > (two-char "abababcabc") "ababab" > (two-char "abcacacabc") "cacaca"```

That algorithm is O(n2); there are better algorithms, but I’ll leave those to you in the comments. You can run the program at http://programmingpraxis.codepad.org/ybpvp6gv.

Pages: 1 2

### 13 Responses to “Longest Substring Of Two Unique Characters”

1. […] today’s Programming Praxis exercise, our goal is to find the longest substring consisting of only two […]

2. My Haskell solution (see http://bonsaicode.wordpress.com/2013/06/11/programming-praxis-longest-substring-of-two-unique-characters/ for a version with comments):

```import Data.List

lstuc :: Eq a => [a] -> [a]
lstuc xs = foldr (\x a -> if length x > length a then x else a) []
[concat \$ a:b:takeWhile (flip elem [head a, head b] . head) cs
| (a:b:cs) <- tails \$ group xs]
```
3. brooknovak said

Here is a O(N) solution written in C#:

```public static string GetLongestSubset(string str) {
if (str == null)
throw new ArgumentNullException ("str");

if (str.Length == 0)
return String.Empty;

int s1 = 0, s2 = 0;

// Get next unique char
for(int i = 1; i < str.Length; i++) {
if (str [i] != str[s1]) {
s2 = i;
break;
}
}

if (s2 == 0)
return str; // all chars are the same

int sequenceStart = 0;
int maxStart = 0, maxEnd = 0;
int maxDiff = 0;

// Scan for longest sequences
for (int i = s2 + 1; i < str.Length; i++) {
// Can this new char belong to the current substring we are building up?
if (str [i] != str [s1] && str [i] != str [s2]) {
int diff = i - sequenceStart;
if (diff >= maxDiff) {
maxDiff = diff;
maxStart = sequenceStart;
maxEnd = i - 1;
}
if (s1 > s2) {
sequenceStart = s1;
s2 = i;
} else {
sequenceStart = s2;
s1 = i;
}
} else if (str [i] != str [i - 1]) {
// Track start of each char sequence containing the same char
if (str [i] == str [s1]) {
s1 = i;
} else {
s2 = i;
}
}
}

int lastDiff = str.Length - sequenceStart;
if (lastDiff >= maxDiff) {
maxDiff = lastDiff;
maxStart = sequenceStart;
maxEnd = str.Length - 1;
}

return str.Substring (maxStart, (maxEnd - maxStart) + 1);
}
```
4. Mike said

Just because I like regular expressions:

```
import re

def lsotuc(s):
pat = re.compile(r'(.)\1*(.)(\1|\2)*', re.DOTALL)
return max((pat.search(s,i).group() for i in range(len(s)-1)), key=len)

```
5. shtipki said

A Perl try (again with regex):

```sub max2substr {
my @chars = split "", (my \$input = shift);
reduce { length \$b >= length \$a ? \$b : \$a } grep { defined } map { \$input =~ /((?:\$_){2,})/ } map { join "", @chars[\$_..\$_+1] } (0..\$#chars-1);
}
```
6. […] Question is from here. […]

7. […] Question is from here. […]

8. javabloggi said

My java solution here.

9. coldgrnd said

efficient O(n) solution in haskell:

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B

loop !_ _ _ _ _ (kmax,imax) !s | B.null s = (kmax,imax)
loop i n k a b (kmax,imax) s
| B.head s == a = cont (n+1) (k+1) a b
| B.head s == b = cont 1 (k+1) b a
| otherwise = cont 1 (n+1) (B.head s) a
where
cont !n’ !k’ !a’ b’ =
if k’ >= kmax
then loop (i+1) n’ k’ a’ b’ (k’,i) (B.tail s)
else loop (i+1) n’ k’ a’ b’ (kmax,imax) (B.tail s)

findLongest :: B.ByteString -> B.ByteString
findLongest b =
case loop 0 0 0 0 0 (0,0) b of
(k,i) -> B.take k . B.drop (i-k+1) \$ b

10. wlingke said

In Javascript:

function substring(str) {
var myString = str;
var count = 0;
var returnStr = “”;
for (var i = 0; i = count) {
count = tempCount;
returnStr = tempReturnStr.join(“”);
}
}
}
return returnStr;
}

11. wlingke said

Sorry… post fail. Clearly time to go to bed.

http://jsfiddle.net/z92aY/

12. Mike said

Here’s an O(n) solution:

```import itertools as it
import re

from collections import deque

def sotuc(s):
"""yields subsequences that consist of two unique characters"""

grouper = (mo.group() for mo in re.finditer(r'((.)\2*)', s, re.S))
groups = list(it.islice(grouper,2))
tuc = deque((g for g in groups), maxlen=2)
for group in grouper:
if group != tuc:
yield ''.join(groups)
del groups[:-1]
tuc.append(group)
groups.append(group)
yield ''.join(groups)

def lsotuc(s):
return max(sotuc(s), key=len)

```
13. George Leake said

A long-winded solution in Python:

import itertools

def sub448(s):
ln = len(s)
if ln % 2 == 1:
s += ‘x’

``````st = list(set(list(s)))
p = list(itertools.combinations(st, 2))
coms = []
for b in p:
coms.append(''.join(sorted(b)))

ch =[]
for a in coms:
ch.append([s.count(a),a])
pass

k = sorted(ch)
k = k[::-1]
tgt = k

tg = ''
while len(tg) < len(s):
tg += tgt
step = len(tgt)
for j in range(ln,0,-step):
if tg[:j] in s:
break
pass
return j
``````