Longest Substring Of Two Unique Characters

June 11, 2013

Today’s exercise comes from our unending list of interview questions:

Find the longest run of consecutive characters in a string that contains only two unique characters; if there is a tie, return the rightmost. For instance, given input string abcabcabcbcbc, the longest run of two characters is the 6-character run of bcbcbc that ends the string.

Your task is to write the requested program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

13 Responses to “Longest Substring Of Two Unique Characters”

  1. […] today’s Programming Praxis exercise, our goal is to find the longest substring consisting of only two […]

  2. My Haskell solution (see http://bonsaicode.wordpress.com/2013/06/11/programming-praxis-longest-substring-of-two-unique-characters/ for a version with comments):

    import Data.List
    
    lstuc :: Eq a => [a] -> [a]
    lstuc xs = foldr (\x a -> if length x > length a then x else a) []
               [concat $ a:b:takeWhile (flip elem [head a, head b] . head) cs
               | (a:b:cs) <- tails $ group xs]
    
  3. brooknovak said

    Here is a O(N) solution written in C#:

    public static string GetLongestSubset(string str) {
    			if (str == null)
    				throw new ArgumentNullException ("str");
    
    			if (str.Length == 0)
    				return String.Empty;
    		
    			int s1 = 0, s2 = 0;
    
    			// Get next unique char
    			for(int i = 1; i < str.Length; i++) {
    				if (str [i] != str[s1]) {
    					s2 = i;
    					break;
    				}
    			}
    
    			if (s2 == 0)
    				return str; // all chars are the same
    
    			int sequenceStart = 0;
    			int maxStart = 0, maxEnd = 0;
    			int maxDiff = 0;
    
    			// Scan for longest sequences
    			for (int i = s2 + 1; i < str.Length; i++) {
    				// Can this new char belong to the current substring we are building up?
    				if (str [i] != str [s1] && str [i] != str [s2]) {
    					int diff = i - sequenceStart;
    					if (diff >= maxDiff) {
    						maxDiff = diff;
    						maxStart = sequenceStart;
    						maxEnd = i - 1;
    					}
    					if (s1 > s2) {
    						sequenceStart = s1;
    						s2 = i;
    					} else {
    						sequenceStart = s2;
    						s1 = i;
    					}
    				} else if (str [i] != str [i - 1]) {
    					// Track start of each char sequence containing the same char
    					if (str [i] == str [s1]) {
    						s1 = i;
    					} else {
    						s2 = i;
    					}
    				}
    			}
    
    			int lastDiff = str.Length - sequenceStart;
    			if (lastDiff >= maxDiff) {
    				maxDiff = lastDiff;
    				maxStart = sequenceStart;
    				maxEnd = str.Length - 1;
    			}
    
    			return str.Substring (maxStart, (maxEnd - maxStart) + 1);
    		}
    
  4. Mike said

    Just because I like regular expressions:

    
    import re
    
    def lsotuc(s):
       pat = re.compile(r'(.)\1*(.)(\1|\2)*', re.DOTALL)
       return max((pat.search(s,i).group() for i in range(len(s)-1)), key=len)
    
    
  5. shtipki said

    A Perl try (again with regex):

    sub max2substr {
        my @chars = split "", (my $input = shift);
        reduce { length $b >= length $a ? $b : $a } grep { defined } map { $input =~ /((?:$_){2,})/ } map { join "", @chars[$_..$_+1] } (0..$#chars-1);
    }
    
  6. […] Question is from here. […]

  7. javabloggi said

    My java solution here.

  8. coldgrnd said

    efficient O(n) solution in haskell:

    {-# LANGUAGE BangPatterns #-}
    import qualified Data.ByteString as B

    loop !_ _ _ _ _ (kmax,imax) !s | B.null s = (kmax,imax)
    loop i n k a b (kmax,imax) s
    | B.head s == a = cont (n+1) (k+1) a b
    | B.head s == b = cont 1 (k+1) b a
    | otherwise = cont 1 (n+1) (B.head s) a
    where
    cont !n’ !k’ !a’ b’ =
    if k’ >= kmax
    then loop (i+1) n’ k’ a’ b’ (k’,i) (B.tail s)
    else loop (i+1) n’ k’ a’ b’ (kmax,imax) (B.tail s)

    findLongest :: B.ByteString -> B.ByteString
    findLongest b =
    case loop 0 0 0 0 0 (0,0) b of
    (k,i) -> B.take k . B.drop (i-k+1) $ b

  9. wlingke said

    In Javascript:

    function substring(str) {
    var myString = str;
    var count = 0;
    var returnStr = “”;
    for (var i = 0; i = count) {
    count = tempCount;
    returnStr = tempReturnStr.join(“”);
    }
    }
    }
    return returnStr;
    }

    alert(substring(“abcabcabcbcbc”));

  10. wlingke said

    Sorry… post fail. Clearly time to go to bed.

    http://jsfiddle.net/z92aY/

  11. Mike said

    Here’s an O(n) solution:

    import itertools as it
    import re
    
    from collections import deque
    
    def sotuc(s):
    	"""yields subsequences that consist of two unique characters"""
    
    	grouper = (mo.group() for mo in re.finditer(r'((.)\2*)', s, re.S))
    	groups = list(it.islice(grouper,2))
    	tuc = deque((g[0] for g in groups), maxlen=2)
    	for group in grouper:
    		if group[0] != tuc[0]:
    			yield ''.join(groups)
    			del groups[:-1]
    		tuc.append(group[0])
    		groups.append(group)
    	yield ''.join(groups)
    
    def lsotuc(s):
    	return max(sotuc(s), key=len)
    
    
  12. George Leake said

    A long-winded solution in Python:

    import itertools

    def sub448(s):
    ln = len(s)
    if ln % 2 == 1:
    s += ‘x’

    st = list(set(list(s)))
    p = list(itertools.combinations(st, 2))
    coms = []
    for b in p:
        coms.append(''.join(sorted(b)))
    
    
    ch =[]
    for a in coms:
        ch.append([s.count(a),a])
    pass
    
    k = sorted(ch)
    k = k[::-1]
    tgt = k[0][1]
    
    tg = ''
    while len(tg) < len(s):
       tg += tgt
    step = len(tgt)
    for j in range(ln,0,-step):
        if tg[:j] in s:
           break
    pass
    return j
    

Leave a comment