First Word

January 25, 2019

We have a simple exercise today, inspired by a co-worker. Where I work, we have a reporting tool that permits a “hook” to the underlying SQL in some places. My co-worker asked me how to write an SQL statement that extracts the first word (a maximal sequence of non-spaces) from the beginning of a string (assume there are no leading spaces). For instance, given the string “abcdefg hijklmnop qrs tuv wxyz” the first word is “abcdefg”. Here’s the SQL expression, wrapped in a select statement, with &&STR representing the string:

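-- a space is appended so that a string containing no space still returns its whole text
select substr('&&STR' || ' ', 1, instr('&&STR' || ' ', ' ') - 1) from dual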
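The SQL works in two steps: instr finds the position of the first space, and substr keeps the characters before it. Here is a minimal Python sketch of the same two steps. The name first_word is just for illustration, and str.find stands in for instr. It returns a 0-based index, so the index itself is the length of the word, and it returns -1 when there is no space. That no-space case is treated explicitly as a one-word string.

```python
def first_word(s: str) -> str:
    """Return the characters before the first space in s.

    Mirrors the instr/substr idea: find the first space, keep what
    precedes it. Like the exercise, it assumes no leading spaces.
    """
    i = s.find(" ")  # 0-based index of the first space, or -1 if none
    if i < 0:
        return s  # no space at all: the whole string is one word
    return s[:i]  # the i characters before the space


print(first_word("abcdefg hijklmnop qrs tuv wxyz"))  # abcdefg
print(first_word("abcdefg"))                         # abcdefg
```

Using -1 as a "not found" sentinel is the str.find convention; str.index would raise ValueError instead, which would need a try/except here.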

Your task is to write a program to extract the first word from a string. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.



6 Responses to “First Word”

  1. matthew said

    Extra points for implementing the Unicode default word breaking algorithm: https://unicode.org/reports/tr29/#Word_Boundaries

  2. V said

    Two solutions in golang.

    
    package main
    
    import (
    	"fmt"
    	"regexp"
    )
    
    func main() {
    	s1 := "abcdefg hijklmnop qrs tuv wxyz"
    	s2 := "~hola   caracola  de piazzolla"
    	fmt.Println(firstWordWithLoop(s1))
    	fmt.Println(firstWordWithRegexp(s1))
    	fmt.Println()
    	fmt.Println(firstWordWithLoop(s2))
    	fmt.Println(firstWordWithRegexp(s2))
    }
    
    func firstWordWithLoop(str string) string {
    	word := ""
    	for _, char := range str {
    		if char == ' ' {
    			break
    		}
    		word += string(char)
    	}
    	return word
    }
    
    // wordRe is compiled once at package level instead of on every call.
    var wordRe = regexp.MustCompile(`[^ ]+`)
    
    func firstWordWithRegexp(str string) string {
    	return wordRe.FindString(str)
    }
    
    
  3. Daniel said

    Here’s a solution in C.

    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char* argv[]) {
      if (argc != 2) {
        fprintf(stderr, "Usage: %s STR\n", argv[0]);
        return EXIT_FAILURE;
      }
      char* str = argv[1];
      while (1) {
        char c = *(str++);
        if (c == ' ' || c == '\0') break;
        printf("%c", c);
      }
      printf("\n");
      return EXIT_SUCCESS;
    }
    

    Example Usage:

    $ ./a.out "abcdefg hijklmnop qrs tuv wxyz"
    abcdefg
    
  4. Steve said

    AWK version

    $ echo "abcdefg hijklmnop qrs tuv wxyz" | awk '{ print $1 }'
    abcdefg

    Klong version

    ((a?" ")@0)#a::"abcdefg hijklmnop qrs tuv wxyz"
    "abcdefg"
    a
    "abcdefg hijklmnop qrs tuv wxyz"
    a?" "
    [7 17 21 25]

    MUMPS version

    YDB>w $p("abcdefg hijklmnop qrs tuv wxyz"," ")
    abcdefg

  5. matthew said

    That Unicode algorithm looks a bit complicated; here’s a simple Unicode-friendly solution using Python’s str.isspace():

    def firstword(s):
        start = -1
        for i, c in enumerate(s):
            if start < 0:
                if not c.isspace():
                    start = i
            elif c.isspace():
                return s[start:i]
        return None if start < 0 else s[start:]
    
    assert firstword("") is None
    assert firstword("  ") is None
    assert firstword("foo") == "foo"
    assert firstword(" foo") == "foo"
    assert firstword("foo ") == "foo"
    
  6. Python solution:

    firstWord = lambda x: x.lstrip(" ").split(" ")[0]
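The solutions above differ in how they treat leading spaces, tabs and empty input. For example, the C program and the Go loop stop at the first space character, so a string that begins with a space gives an empty word. matthew's str.isspace() loop skips any leading whitespace and returns None for blank input. Below is a compact sketch with that same contract, using Python's str.split() with no separator. That form treats each run of whitespace as a single break and ignores leading and trailing whitespace. The name firstword_split is only for illustration.

```python
def firstword_split(s: str):
    """Return the first whitespace-delimited word of s, or None.

    str.split() with no separator splits on runs of whitespace
    (spaces, tabs, newlines and other Unicode whitespace) and drops
    leading/trailing whitespace, so no empty strings are produced.
    maxsplit=1 stops after the first break; the rest of s stays whole.
    """
    parts = s.split(maxsplit=1)
    return parts[0] if parts else None  # [] means s was empty or blank


print(firstword_split("abcdefg hijklmnop qrs tuv wxyz"))  # abcdefg
print(firstword_split("   "))                             # None
```

Passing maxsplit=1 avoids splitting the whole string when only the first word is wanted.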
