Validating Telephone Numbers

December 13, 2011

The hard part of this exercise is deciding exactly what rules determine a valid telephone number; fortunately, that has already been done for us. Since the optional part of the telephone number is at the front of the number, we’ll work back-to-front instead of front-to-back:

(define (valid-phone str)
  (define (strip-white cs)
    (drop-while char-whitespace? cs))
  (define (strip-white-dot-dash cs)
    (let ((cs (strip-white cs)))
      (if (or (null? cs) (not (member (car cs) (list #\. #\-)))) cs
        (strip-white (cdr cs)))))
  (define (get-number cs)
    (let loop ((i 4) (cs (strip-white cs)) (ds (list)))
      (cond ((zero? i) (values ds cs))
            ((null? cs) (values #f cs))
            ((not (char-numeric? (car cs))) (values #f cs))
            (else (loop (- i 1) (cdr cs) (cons (car cs) ds))))))
  (define (get-exchange cs)
    (let loop ((i 3) (cs (strip-white-dot-dash cs)) (ds (list)))
      (cond ((zero? i) (values ds cs))
            ((null? cs) (values #f cs))
            ((not (char-numeric? (car cs))) (values #f cs))
            (else (loop (- i 1) (cdr cs) (cons (car cs) ds))))))
  (define (get-area cs)
    (let ((cs (strip-white cs)))
      (cond ((null? cs) (values (list) (list)))
            ((char=? (car cs) #\-) (get-area (cdr cs)))
            ((char=? (car cs) #\.) (get-area (cdr cs)))
            ((char=? (car cs) #\))
              (call-with-values
                (lambda () (get-area (cdr cs)))
                (lambda (area cs)
                  (if (not area) (values #f cs)
                    (if (or (null? cs) (not (char=? (car cs) #\()))
                        (values #f cs)
                        (values area (cdr cs)))))))
            ((char-numeric? (car cs))
              (let loop ((i 3) (cs cs) (ds (list)))
                (cond ((zero? i) (values ds cs))
                      ((null? cs) (values #f cs))
                      ((not (char-numeric? (car cs))) (values #f cs))
                      (else (loop (- i 1) (cdr cs) (cons (car cs) ds))))))
            (else (values #f cs)))))
  (call-with-values
    (lambda () (get-number (reverse (string->list str))))
    (lambda (number cs)
      (if (not number) #f
        (call-with-values
          (lambda () (get-exchange cs))
          (lambda (exchange cs)
            (if (not exchange) #f
              (call-with-values
                (lambda () (get-area cs))
                (lambda (area cs)
                  (if (and area (null? (strip-white cs)))
                      (list->string (append area exchange number))
                      #f))))))))))

Here cs holds the remaining unparsed characters of the input string, in reverse order because the parser consumes the string from its end, and area, exchange, and number are each either a list of output digits or #f, indicating an error. Note that area is #f only if there is some error; the null list indicates that no area code was given.
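For readers who don't read Scheme, the back-to-front strategy can be sketched in Python. This is a loose illustration rather than a translation of the function above: the names parse_phone_backwards, take_digits, and strip_trailing are made up for this sketch, and it is somewhat stricter than the Scheme version about where separators may appear.

```python
DIGITS = "0123456789"


def take_digits(chars, count):
    """Remove exactly `count` digits from the end of `chars`.

    Returns them as a string in their original order, or None when the
    last `count` characters are not all digits.
    """
    if len(chars) < count or any(c not in DIGITS for c in chars[-count:]):
        return None
    digits = "".join(chars[-count:])
    del chars[-count:]
    return digits


def strip_trailing(chars, allowed):
    """Drop trailing characters that appear in `allowed`."""
    while chars and chars[-1] in allowed:
        chars.pop()


def parse_phone_backwards(text):
    """Return the digits of a valid phone number, or None if invalid."""
    chars = list(text.strip())

    number = take_digits(chars, 4)        # the last four digits
    if number is None:
        return None

    strip_trailing(chars, " ")
    if chars and chars[-1] in "-.":       # at most one separator
        chars.pop()
    strip_trailing(chars, " ")

    exchange = take_digits(chars, 3)      # the three-digit exchange
    if exchange is None:
        return None

    strip_trailing(chars, " ")
    if not chars:                         # no area code was given
        return exchange + number

    if chars[-1] == ")":                  # parenthesized area code
        chars.pop()
        area = take_digits(chars, 3)
        if area is None or chars != ["("]:
            return None
        return area + exchange + number

    if chars[-1] in "-.":                 # optional separator
        chars.pop()
        strip_trailing(chars, " ")
    area = take_digits(chars, 3)
    if area is None or chars:             # leftover input is an error
        return None
    return area + exchange + number
```

Because the optional area code is handled last, every earlier step knows exactly how many digits it must find, which is the point of working from the end of the string.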

This function is appallingly tedious, but most parsers are. The only defense against getting some of the details wrong is testing. Here’s our test function, which expands on the lists of valid and invalid telephone numbers given in the exercise. Note that the simple function

(define (valid-phone str)
  (= (length (filter char-numeric? (string->list str))) 10))

is not suitable for validation, since it doesn’t validate the structural elements of the phone number, though it is suitable for confirming that a valid phone number is correctly accumulated:

(define (test-phone)
  (let ((valid (list "1234567890" "123-456-7890" "123.456.7890"
          "(123)456-7890" "(123) 456-7890" "456-7890"))
        (invalid (list "12-345-6789" "123-45-6789" "123-456-789"
          "123-45-67890" "123:4567890" "123/456-7890" "123456"
          "12345678" "123456789" "12345678901" "123-456-7890 x123")))
    (for-each
      (lambda (str)
        (assert (string->list (valid-phone str))
                (filter char-numeric? (string->list str))))
      valid)
    (for-each
      (lambda (str) (assert (valid-phone str) #f))
      invalid)))
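The same testing idea can be sketched in Python: use the digit filter as an oracle for what a valid number should normalize to, but never as the validator itself. Everything named here (digits_only, naive_valid, validate, PHONE_RE) is hypothetical, and the regular expression is only a stand-in validator for the sketch, not the parser from this post.

```python
import re

DIGITS = "0123456789"

# Stand-in validator for this sketch: optional area code (parenthesized,
# or bare with an optional separator), then exchange and number.
PHONE_RE = re.compile(
    r"(?:\(([0-9]{3})\) ?|([0-9]{3})[-.]?)?([0-9]{3})[-.]?([0-9]{4})")


def digits_only(text):
    """Keep only the ASCII digits of `text`."""
    return "".join(c for c in text if c in DIGITS)


def naive_valid(text):
    """The digit-counting check: true when `text` holds exactly ten digits."""
    return len(digits_only(text)) == 10


def validate(text):
    """Return the digits of a well-formed number, or None."""
    match = PHONE_RE.fullmatch(text)
    if match is None:
        return None
    return "".join(group for group in match.groups() if group)


VALID = ["1234567890", "123-456-7890", "123.456.7890",
         "(123)456-7890", "(123) 456-7890", "456-7890"]
INVALID = ["12-345-6789", "123-45-6789", "123-456-789", "123-45-67890",
           "123:4567890", "123/456-7890", "123456", "12345678",
           "123456789", "12345678901", "123-456-7890 x123"]


def run_tests():
    # A valid number must normalize to exactly its own digits.
    for text in VALID:
        assert validate(text) == digits_only(text), text
    # An invalid number must be rejected outright.
    for text in INVALID:
        assert validate(text) is None, text


run_tests()
```

Counting digits alone goes wrong in both directions: naive_valid("123:4567890") is true even though the colon makes the number invalid, and naive_valid("456-7890") is false even though seven-digit numbers are allowed.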

I will never admit how many errors this test suite found in the original version of my valid-phone function.

We used drop-while from the Standard Prelude, plus filter and assert for testing. You can run the program at http://programmingpraxis.codepad.org/qqpZYP36.

Telephone numbers aren’t the only things that need validating, of course. Let today’s exercise serve as a reminder that in every program you write, you should be careful to validate all input; your users will never thank you for doing that, because they will never notice what they can’t see, but you can be sure they will let you know if you don’t!

17 Responses to “Validating Telephone Numbers”

  1. Ajay said

    Your posts are extremely useful!
    But… why LISP

  2. programmingpraxis said

    @geirskjootskift: The task requires you to return the number if it is valid, not just a true or false.

    @Ajay: Thank you. It’s Scheme, not Lisp. And because I want to, and because Scheme gives me options not (easily) available in C or Java or Python; Scheme gives me things like garbage collection and big integers for free, and in what other language could I add list comprehensions and pattern matching to the standard library, and write generators and streams and other little goodies?

  3. kawas44 said

    Clojure solution using regexp like geirskjootskift,
    – use two regexp, deliberately permissive on numbers of seps
    – use regexp groups to extract a normalized phone number.

    (defn is-phone [phone]
      (let [phn-re [#"(?:(\d{3})[-. ]*)?(\d{3})[-. ]*(\d{4})"
                    #"\((\d{3})\)\s*(\d{3})[-. ]*(\d{4})"]
            mtchs (map #(re-matches %1 phone) phn-re)]
        (when-let [m (first (remove nil? mtchs))]
          (apply str (apply concat (next m))))))
    

    Some tests with valid and invalid entries

    (defn test-phone []
      (let [phones ["0123456789" "012 345 6789" "012.345.6789" "012-345-6789"
                    "3456789" "345 6789" "345.6789" "345-6789"
                    "(012)3456789" "(012) 3456789"
                    "(012)345 6789" "(012) 345 6789"
                    "(012)345.6789" "(012) 345.6789"
                    "(012)345-6789" "(012) 345-6789"
                    ;; invalid phones
                    "12-345-6789" "123-45-6789" "123-456-789"
                    "123-45-67890" "123:4567890" "123/456-7890" "123456"
                    "12345678" "123456789" "12345678901" "123-456-7890 x123"]]
        (doseq [phn phones]
          (if-let [m (is-phone phn)]
            (println phn "... ok \t" m)
            (println phn "... KO !!")))))
    
    (test-phone)
    
  4. tomjleo said

    Here is my solution (also using Regex…)

    import re
    import itertools

    def isTelephone(n):
        regex=re.compile("([0-9]{10})|([0-9]{3}[-]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[.]{1}[0-9]{3}[.]{1}[0-9]{4})|([\(]{1}[0-9]{3}[\)][ ]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[-]{1}[0-9]{4})")
        chain = itertools.chain(*regex.findall(n))
        if n in list(chain):
            return n
        else:
            return False

    def test():
        test_numbers=['1234567890',
                      '123-456-7890',
                      '123.456.7890',
                      '(123) 456-7890',
                      '456-7890',
                      '123-45-6789',
                      '123 4567890',
                      '123/456-7890']
        for num in test_numbers:
            print(isTelephone(num))

    test()

  5. Tom said

    Here is my solution using python and regex

    import re
    import itertools
    
    def isTelephone(n):
        regex=re.compile("([0-9]{10})|([0-9]{3}[-]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[.]{1}[0-9]{3}[.]{1}[0-9]{4})|([\(]{1}[0-9]{3}[\)][ ]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[-]{1}[0-9]{4})")
        chain = itertools.chain(*regex.findall(n)) 
        if n in list(chain):
            return n
        else:
            return False
        
    def test():
        test_numbers=['1234567890',
                      '123-456-7890',
                      '123.456.7890',
                      '(123) 456-7890',
                      '456-7890',
                      '123-45-6789',
                      '123 4567890',
                      '123/456-7890']
        for num in test_numbers:
            print(isTelephone(num))
            
    test()
    
    
  6. OK, the number is now returned in normalized form. I still don’t see any reason to implement a parser “by hand” :-) I also prefer not to cram all the patterns into as few regexps as possible. Readability for the win.

    from re import compile

    VALID_PHONE_PATTERNS = [
        r"^(\d{10})$",
        r"^(\d{3})\-(\d{3})\-(\d{4})$",
        r"^(\d{3})\.(\d{3})\.(\d{4})$",
        r"^\((\d{3})\)\s?(\d{3})\-(\d{4})$",
        r"^(\d{3})\-(\d{4})$"]

    def validate_phone_number(phn):
        # Try each pattern in turn; return the digits of the first match.
        for pattern in VALID_PHONE_PATTERNS:
            res = compile(pattern).findall(phn)
            if res:
                return "".join(res[0])
        return None

  7. markmain said

    Your rules state that 123/456-7890 is not valid; while it is the least common use, it is frequently used by people. If I were writing code I would use it.

  8. Mike said

    I would normally use a list of regexes like some of the previous solutions. This time, however, I used one big regular expression, because having two problems is better than having just one problem (see http://regex.info/blog/2006-09-15/247).

    normalize() returns a tuple with the areacode, exchange, and number, or raises a ValueError if the input is not a valid phone number. The areacode is None if one was not provided.

    import re
    
    pn_re = re.compile(r"""(?x)		# verbose mode
    \A					# start of string
    \s*
    (?:
       (?P<lparen>\()?			# optional left paren
       \s*
       (?P<areacode>\d{3})			# areacode is 3 digits
       \s*
       (?P<rparen>(?(lparen)		# if there was a lparen
                     \)			# then match an rparen
                    |(?P<sep1>[-.]?))       # else match an optional separator
       )
       \s*
    )? 					# areacode section is optional
    (?P<exchange>\d{3})			# exchange is three digits
    \s*
    (?P<sep2>(?(sep1)(?P=sep1)|[-.]?))	# if there was a first separator, the
    					# second one must match it
    \s*
    (?P<number>\d{4})			# number is 4 digits
    \s*
    \Z 					# end of string
    """)
    
    def normalize(phone_number):
        mo = pn_re.match(phone_number)
        if mo:
            return mo.group('areacode','exchange','number')
        
        else:
            raise ValueError('Improperly formatted phone number: ' + phone_number )
    
    
    def test():
        valid = ["1234567890", "123-456-7890", "123.456.7890",
                 "(123)456-7890", "(123) 456-7890", "987-6543",
                 "  123 - 456 - 7890  "]
        
        invalid = ["12-345-6789", "123-45-6789", "123-456-789",
              "123-45-67890", "123:4567890", "123/456-7890", "123456",
              "12345678", "123456789", "12345678901", "123-456-7890 x123",
              "123--456--7890", "(123)-456-7890", "12 3 -456- 789 0",
              "123 . 456 - 7890"]
    
        for number in valid + invalid:
            try:
                normal = normalize(number)
    
                assert number in valid, "{} not in valid set".format(number)
                if normal[0] is not None:
                    assert normal == ('123','456','7890'), (number, normal)
                else:
                    assert normal == (None, '987', '6543'), (number, normal)
    
            except ValueError:
                assert number in invalid
    
    
    

    @praxis, I don’t know Scheme, but was wondering how your code would handle the last 4 invalid test cases I added?

  9. Yuushi said
    import re
    
    no_bracket = re.compile('^([0-9]{3})?[-\. ]*[0-9]{3}[-\.]*[0-9]{4}$')
    bracket = re.compile('^\([0-9]{3}\)[-\. ]*[0-9]{3}[-\.]*[0-9]{4}$')
    
    def tel_no_match(number):
        if number[0] == '(':
            z = bracket.match(number)
        else: z = no_bracket.match(number)
    
        if z and z.start() == 0 and z.end() == len(number):
            return number
        return None
    

    I figured it was easier to break it into two options, as rejecting a case like "(123-456-7890" with a single pattern is a bit annoying. This also assumes you can mix and match "-" and ".", so 123-456.7890 is OK.

  11. OK, initially missed a couple of requirements, so this is iteration 3 of my solution – over time and over budget ;-) But at least you get the unit tests.

    import unittest
    import re
    
    tele_re = re.compile(r"((\([0-9]{3}\))|([0-9]{3}))?[- \.]?[0-9]{3}[- \.]?[0-9]{4}$")
    
    def is_tele(s):
        return tele_re.match(s)
    
    def tele(s):
        if is_tele(s):
            return "".join([c for c in s if '0' <= c <= '9' ])
    
    class TestTelephoneNumbers(unittest.TestCase):
        def setUp(self):
            self.good = [ "1234567890", "123-456-7890", "123.456.7890", "(123)456-7890", "(123) 456-7890", "456-7890" ]
            self.bad = [ "123-45-6789", "123:4567890", "123/456-7890", "(123 456 7890", "123)456 7890", "(456)7890" ]
        def test_good(self):
            for tele_num in self.good:
                self.assertTrue(is_tele(tele_num), tele_num)
        def test_bad(self):
            for tele_num in self.bad:
                self.assertFalse(is_tele(tele_num), tele_num)
        def test_extract(self):
            for tele_num in self.good[0:-1]:
                self.assertEqual(tele(tele_num), self.good[0], msg="%s extract failed" % tele_num)
            self.assertEqual(tele(self.good[-1]), "4567890", msg="%s extract failed" % self.good[-1])
           
    
    if __name__ == '__main__':
        unittest.main()
    
    
  12. One more thing – I’ve just read the suggested solution and liked the trick with working backwards.

  13. Mike said

    My regex-based version above was clearly a tongue-in-cheek solution. Here is a more useful solution.

    It uses a regular expression to group the input string into digits and non-digits. The non-digits are normalized to form a punctuation string, which is compared with a set of valid punctuation strings. Other valid punctuation, e.g., 123/456-7890, can easily be added by adding the corresponding punctuation string, e.g., " /-", to validpunctuation.

    import re
    
    pattern = re.compile(r"""(?x)     # verbose mode
        \A                            # match beginning of string 
        (\D*)                         #     leading whitespace and punctuation
        (\d{3})?                      # optional areacode
        (\D*)                         #     whitespace and punctuation
        (\d{3})                       # 3-digit exchange
        (\D*)                         #     whitespace and punctuation
        (\d{4})                       # 4-digit number
        \s*                           #     trailing whitespace
        \Z                            # match end of string
        """
        )
    
    validpunctuation = {
        "   ", "()-", "().", " --", " ..",  # with areacode
        "  ", " -", " ."                    # and without
        }
    
    def validate(phonenumber):
        mo = pattern.match(phonenumber)
        
        if mo:
            punctuation = mo.group(1,3,5)
            number = mo.group(2,4,6)
    
            if not mo.group(2):
                punctuation = punctuation[1:]
                number = number[1:]
    
            if ''.join(s.strip() or ' ' for s in punctuation) in validpunctuation:
                return '.'.join(number)
    
        return None
    
    
  14. Diego Giuliani said

    Here is my solution using python:
    Note that I’m still a n00b in Python, so I know there are some semantic bugs, e.g. phones like "(123-456 7890" or "123)456 7890" will match

    import re
    
    def match(str):
        pattern = r"""
            ^           # Beginning of the string
            [\(]?       # Possible open parenthesis
            (\d{3})?    # First 3 digits - optional
            [\)]?       # Possible close parenthesis
            [-. ]?      # Possible separator
            (\d{3})     # Second group of 3 digits
            [-. ]?      # Possible separator
            (\d{4})     # Last group of 4 digits
            $
        """
        return re.search(pattern,str,re.VERBOSE)
    
    def test():
        phones = [
        "1234567890",
        "123-456-7890",
        "123.456.7890",
        "(123)456-7890",
        "(123) 456-7890",
        "456-7890",
        "123-45-6789",
        "123:4567890",
        "123/456-7890"
        ]
        for phone in phones:
            print(match(phone) is not None)
    
    
  15. Tom said

    @Diego Giuliani: you should avoid using str as a variable name, because str is a Python built-in; see http://docs.python.org/library/functions.html#str
