First Unrepeated Character In A String

April 30, 2013

There are several ways to do this, all of them a little bit messy because there are two things to keep track of, the character count and the position in the string. We will use a two-stage algorithm: first pass through the string keeping track of character counts and positions, then find the minimum position of those characters with a count of one:

(define (f str)
  (let ((len (string-length str)) (x (make-vector 256 -1)))
    (do ((i 0 (+ i 1))) ((= i len))
      (let ((c (char->integer (string-ref str i))))
        (cond ((= (vector-ref x c) -1) (vector-set! x c i))
              ((< -1 (vector-ref x c)) (vector-set! x c -2)))))
    (let loop ((c 0) (i (+ len 1)))
      (cond ((= c 256)
              (if (< i len)
                  (values i (string-ref str i))
                  (error 'f "no unrepeated character")))
            ((and (< -1 (vector-ref x c)) (< (vector-ref x c) i))
              (loop (+ c 1) (vector-ref x c)))
            (else (loop (+ c 1) i))))))

The key to the algorithm is the vector x, which is indexed by character ordinals from 0 to 255. Each element of x is initialized to -1, indicating it has not yet been seen. The first time the c‘th character is seen, x[c] is set to its index i in the string. If the c‘th character appears a second time, x[c] is set to -2.

Thus, each character in the string is examined, in order, in the do loop, and x[c] is set appropriately. Then, in the named-let loop, each x[c] is examined, and if it is zero or greater (indicating that the c‘th character appeared once in the input), then its index is compared to the current minimum. Here are some examples:

> (f "aaab")
3
#\b
> (f "aaabbbcddd")
6
#\c
> (f "aaaebbbcddd")
3
#\e
> (f "aabbcc")
error: no unrepeated character

You can run the program at http://programmingpraxis.codepad.org/yNXZMJsj.

About these ads

Pages: 1 2

28 Responses to “First Unrepeated Character In A String”

  1. #|
        Given a string, find the first character that appears only once
        in the string. For instance, given the string “aabbcddd”, the
        first character that appears only once is the “c” found at the
        4th character in the string, counting from 0. Be sure that your
        program properly handles the case that all characters appear more
        than once.
    |#

    (defun first-singleton (sequence &key (test (function eql)))
      (let ((histogram (make-hash-table :test test :size (length sequence))))
        (map nil (let ((index -1))
                   (lambda (element)
                     (let ((entry (gethash element histogram)))
                       (if entry
                           (incf (cdr entry))
                           (setf (gethash element histogram) (cons (incf index) 1))))))
             sequence)
        (let ((min-element nil)
              (min-index nil))
         (maphash (lambda (element entry)
                    (destructuring-bind (first-index . occurences) entry
                      (when (and (= 1 occurences)
                                 (or (null min-index)
                                     (< first-index min-index)))
                        (setf min-index first-index
                              min-element element))))
                  histogram)
         (values min-element min-index))))

    ;; (first-singleton “aabbcddd”)
    ;; #\c
    ;; 2
    ;;
    ;; (first-singleton “aabbcccddd”)
    ;; NIL
    ;; NIL
    ;;
    ;; (first-singleton ‘(t nil t t))
    ;; NIL
    ;; 1
    ;;
    ;; (first-singleton ‘(t nil nil t t))
    ;; NIL
    ;; NIL

  2. The solution you’re proposing doesn’t work: (f “∀x∈N, x+1≥x ∧ x+1∈N”) may return 2, or may signal an error!

  3. […] today’s Programming Praxis exercise, our goal is to find the find the first unrepeated character in a […]

  4. My Haskell solution (see http://bonsaicode.wordpress.com/2013/04/30/programming-praxis-first-unrepeated-character-in-a-string/ for a version with comments):

    import Data.Maybe
    import Data.List
    
    unrepeated :: String -> Maybe (Integer, Char)
    unrepeated = listToMaybe . snd . foldr f ([], []) . zip [0..] where
        f (i,c) (ds,us) = if elem c ds then (ds, us) else
            maybe (ds, (i,c):us) (\(fi,fc) -> (fc:ds, delete (fi,fc) us)) $
            find ((== c) . snd) us
    
  5. Jan Van lent said

    Python solution. The complexity is not optimal, but it is short and easy to understand.

    def first_unrepeated(s):
        for i, c in enumerate(s):
            if s.count(c) == 1:
                return i
    
    if __name__ ==  "__main__":
        examples = [ "aaab", "aaabbbcddd", "aaaebbbcddd", "aabbcc"]
        examples += [ "aabbcddd",  "aabbcccddd" ]
        examples += [ [True, False, True, True], [True, False, False, True, True] ]
        for s in  examples:
            print first_unrepeated(s)
    
  6. Maurits said

    Using the 8-bit value of the character as an index into a lookup table breaks when you want to deal with Unicode text.

    An O(n^2) approach which requires only O(1) additional memory would be to do something like this:

    for (i = 0 to len – 2) {
    duplicate = false;
    for (j = i + 1 to len – 1) {
    if (s[i] == s[j]) { duplicate = true; break; }
    }

    if (!duplicate) { return i; }
    }

    return -1;

    An O(n log n) approach which requires O(n) additional memory would be to:

    1. build an array of { character, index } elements
    2. sort it by .character
    3. walk the sorted array looking at all unique characters
    4. return the lowest-valued .index

  7. Mike said

    Python version using Counter and OrderedDict from the standard library to build an OrderedCounter.

    from collections import Counter, OrderedDict
    
    class OrderedCounter(Counter, OrderedDict):
    	pass
    
    def first_singleton(seq):
    	oc = OrderedCounter(seq)
    	fs = min(oc, key=oc.get)
    	return seq.index(fs) if oc[fs] == 1 else None
    
  8. Shail said

    #include
    #include
    #define NO_OF_CHARS 256
    /* Returns an array of size 256 containg count
    of characters in the passed char array */
    int *getCharCountArray(char *str)
    {
    int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
    int i;
    for (i = 0; *(str+i); i++)
    count[*(str+i)]++;
    return count;
    }

    /* The function returns index of first non-repeating
    character in a string. If all characters are repeating
    then reurns -1 */
    int firstNonRepeating(char *str)
    {
    int *count = getCharCountArray(str);
    int index = -1, i;

    for (i = 0; *(str+i); i++)
    {
    if(count[*(str+i)] == 1)
    {
    index = i;
    break;
    }
    }
    return index;
    }

    /* Driver program to test above function */
    int main()
    {
    char str[] = “geeksforgeeks”;
    int index = firstNonRepeating(str);
    if(index == -1)
    printf(“Either all characters are repeating or string is empty”);
    else
    printf(“First non-repeating character is %c”, str[index]);
    getchar();
    return 0;
    }

  9. margaruga said

    def unrepeatedcharacter(string):
    for i in range(0, len(string)):
    if string[i] not in string[:i] + string[i + 1:]:
    return string[i]
    return None

  10. margaruga said
    def unrepeatedcharacter(string):
        for i in range(0, len(string)):
            if string[i] not in string[:i] + string[i + 1:]:
                return string[i]
        return None
    
  11. Josef Svenningsson said

    There’s a slight variation on the algorithm you gave which I think is slightly simpler and could possibly be faster, at least if the first unrepeated element occurs relatively early in the string.

    Start by creating an array that just keeps a count of how many times each character occurs. Then let the second loop go through the string again and look up the occurs count for each character. The first one which occurs once is the answer. In order to return the index, the loop just have to keep track of the current index as it traverses the string.

    Haskell code:

    import Data.Array

    unrepeat :: String -> Maybe (Int,Char)
    unrepeat str = loop 0 str $ accumArray (+) 0 (‘a’,’z’) $ zip str (repeat 1)
    where loop i [] arr = Nothing
    loop i (a:as) arr
    | arr!a == 1 = Just (i,a)
    | otherwise = loop (i+1) as arr

  12. Josh said

    Not the very prettiest solution, but I’d say near optimal in time.

    def first_unrepeated(string):
        good, bad = {}, []
        for i, char in enumerate(string):
            if char in bad:
                continue
            if char in good:
                del good[s]
                bad.append(s)
                continue
            good[s] = i
        return good and min(good.items(), key=lambda x: x[1]) or None
    
  13. Josh said

    Obviously it should be:

    def first_unrepeated(string):
        good, bad = {}, []
        for i, char in enumerate(string):
            if char in bad:
                continue
            if char in good:
                del good[char]
                bad.append(char)
                continue
            good[char] = i
        return good and min(good.items(), key=lambda x: x[1]) or None
    
  14. Shail said

    well, thanks for your suggestion..
    i am just a beginner to C so acc to me my logic is good
    still will study yours script
    :)

  15. Alan S. said

    Most high-level languages have string searching and splitting operations, but using these kind of defeats the purpose of the exercise.

    The essential method, find_first_singleton is pasted below. The complete program, with tests, is available at codepad.org

    # String.each_char_with_index {|c| code block with #c }
    
    class String
    
      def each_char_with_index
        x = 0
        while x < self.length do
          char = self[x,1]
          yield [char, x]
          x += 1
        end
      end
    
      # String.find_first_singleton(char) => [char, index] or nil
    
      def find_first_singleton
        letters = {}                  # hash of letters & occurences
    
        self.each_char_with_index do |letter, x|
          letters[letter] ||= 0
          letters[letter] += 1
        end
    
        # all letters counted; find the first with count == 1
        self.each_char_with_index do |letter, x|
          return [letter, x] if letters[letter] == 1 
        end
        nil
      end
    end # String enhancement
    

    Here is the invocation of the method find_first_singleton:

      str = ARGV.shift        # get the argument
      puts "Input: #{str}"
      puts str.find_first_singleton.pretty
    
  16. aks said

    Another solution, using the very high-level language, J (see J Software).

    First, let’s see the solution, then show how it works.

    ffs =: monad define
    n =. 1 i.~ +/e. y
    if. (n>:#y) do. (0$0) else. ((n{y);n) end.
    )

    Examples of usage:

    ffs ‘foo’
    +-+-+
    |f|0|
    +-+-+
    ffs ‘oof’
    +-+-+
    |f|2|
    +-+-+
    ffs ‘foobar’
    +-+-+
    |f|0|
    +-+-+
    ffs ‘oofbar’
    +-+-+
    |f|2|
    +-+-+
    ffs ‘oofbarf’
    +-+-+
    |b|3|
    +-+-+
    ffs ‘boofbarf’
    +-+-+
    |a|5|
    +-+-+
    ffs ‘aabbcc’

    Notice that the last invocation produced no value.

    Okay. How did that work?

    The expression “e. y” means “member in” the monadic function argument “y”. In the examples above, the arguments given to the function are represented as “y” within the function definition.

    So, the expression below produces a boolean array indicating where each letter of the argument occurs

    e. ‘foobar’
    1 0 0 0 0 0
    0 1 1 0 0 0
    0 1 1 0 0 0
    0 0 0 1 0 0
    0 0 0 0 1 0
    0 0 0 0 0 1

    Then, “+/” means “reduce by addition”, yielding a frequency count of each letter in the argument:

    +/e. ‘foobar’
    1 2 2 1 1 1
    [sourcecode]

    Now, finding the first occurrence of a letter that occurs only once within the string is done by finding the first index of 1. This is done by the "index" operator "i.", the left argument of which is the data to be indexed, and the right argument is the value to index. However, since the value is already on the right side, we need to also use "reflex" (~) to switch the arguments:

    [sourcecode lang="j"]
    1 i.~ +/e. ‘foobar’
    0

    This means that the letter at index 0 (origin zero) is the first letter that has no other occurrences.

    Now, we need to use that index, and produce both the letter and the index as a result.

    Before we do that, however, we also need to consider the case where the index is out of range (where there are no singleton letters). This is done with a brute force “if-then-else” control structure. Many advanced J programmers try to avoid using these explicit control structures, but that’s not a goal for this exercise.

    The sentence below is the final result of the “ffs” function, producing the “boxed” pair: the letter and its index, or, an empty value (where there is no singleton letter in the string).

    if. (n>:#y) do. (0$0) else. ((n{y);n) end.

  17. anwesh said

    print “enter a string”
    string=raw_input(“str:”)
    b=list(string)
    print b
    for i in b:
    a=b.count(i)
    if a==1:
    print i
    break

  18. javabloggi said

    My java solution here.

  19. George said

    My solution here: https://github.com/gepoch/praxis/blob/master/2013/april/first_unrepeated.py

    It turned out to be very similar to Josh’s. but sets make it speedier.

  20. My none-to-ambitious Ruby solution:

    def first_unique(str)
      first_unique = str.each_char.select{|c| str.index(c) == str.rindex(c)}[0]
      return if first_unique.nil?
      [first_unique, str.index(first_unique)]
    end
    
  21. Alcriss said

    int _holder = 0;

    Console.Write(“\n Enter a string: “);
    string _input = Console.ReadLine().Replace(” “, “”);
    Console.Write(“\n”);

    for (int i = 0; i ” + i + ” index. \n”);
    }
    }

    Console.ReadLine();

    }
    //METHODS

    public static int RunCheck(char entry, string input)
    {
    int _count = 0;
    for (int i = 0; i < input.Length; i++)
    {
    if (entry == input[i])
    {
    _count = _count + 1;
    }
    }
    return _count;
    }

  22. jaysonsoi said

    c#:
    class Program
    {
    static void Main(string[] args)
    {
    Console.WriteLine(“Input string: “);
    string _xString = Console.ReadLine();
    for (int i = 0; i < _xString.Length; i++)
    {
    int _count = 0;

    for (int x = 0; x < _xString.Length; x++)
    {
    if (_xString[i] == _xString[x])
    {
    _count++;
    continue;
    }
    }
    if (_count == 1)
    {
    Console.WriteLine("Character with 1 count: " + _xString[i]);
    }
    }
    Console.ReadLine();
    }
    }
    }

  23. hakim said

    here my solution with groovy :

    def word=’aaggbbccffrrrrttttuuiiooppzzq’

    def chars=[]

    word.each{c->
    if (chars.contains(c)) chars.remove(c);
    else chars << c;
    }

    if (chars.size())
    println chars[0] + " position : " + word.indexOf(chars[0])
    else println "no result found"

    hope will help

  24. Abrar A. Shareef said

    My Java Solution:

    public static void firstUnrepeatedChar(String str) {
    for (int i = 1; i < str.length(); i++) {
    if (str.charAt(i) != str.charAt(str.length() – 1)) {
    if (str.charAt(i) != str.charAt(i – 1)
    && (str.charAt(i) != str.charAt(i + 1))) {
    System.out.println(str.charAt(i) + " " + i);

    }
    } else if (str.charAt(i) == str.charAt(str.length() – 1)
    && str.charAt(i) != str.charAt(i – 1)) {

    System.out.println(str.charAt(i) + " " + i);

    }
    else{
    System.out.println("none unrepeated");

    }

    }

    }

  25. Preethi said

    My C# solution:

    class FindFirstUnRepeatedCharInGivenString
    {
    public string MyString { get; set; }
    public FindFirstUnRepeatedCharInGivenString(string mystring)
    {
    MyString = mystring;

    }
    public char Find()
    {
    var IsRepeatingCharacter = true;
    SortedSet set = new SortedSet();
    for (int i = 0; i < MyString.Length; i++)
    {
    IsRepeatingCharacter = set.Add(MyString[i]);
    if (!IsRepeatingCharacter)
    {
    return MyString[i];
    }
    }
    return String.Empty.ToCharArray().FirstOrDefault();

    }
    }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 642 other followers

%d bloggers like this: