Validating Telephone Numbers

December 13, 2011

When I was a kid, telephones had rotary dials, not push buttons, and exchanges had names; my grandmother was in the Underhill 8 exchange. If you were calling someone in the same exchange as you were, you only had to dial the last four digits of the number. Long distance calling generally involved a human operator.

Modern American telephone numbers have ten digits, segmented as a three-digit area code, a three-digit exchange code, and a four-digit number. Within an area code, you need only dial (the verb hasn’t changed, even though telephones no longer have a dial) the seven-digit exchange code and number; otherwise, you must dial the complete ten-digit number, often with a prefix.

Our exercise today asks you to validate a telephone number, as if written on an input form. Telephone numbers can be written as ten digits, or with dashes, spaces, or dots between the three segments, or with the area code parenthesized; both the area code and any white space between segments are optional. Thus, all of the following are valid telephone numbers: 1234567890, 123-456-7890, 123.456.7890, (123)456-7890, (123) 456-7890 (note the white space following the area code), and 456-7890. The following are not valid telephone numbers: 123-45-6789, 123:4567890, and 123/456-7890.

Your task is to write a phone number validator that follows the rules given above; your function should either return a valid telephone number or an indication that the input is invalid. Be sure to write a proper test suite. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
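For readers who would like a starting point in Python rather than Scheme, here is a minimal sketch of one reading of the rules above. It is not the suggested solution. The names PHONE_RE, validate_phone, VALID and INVALID are hypothetical, and the sketch assumes that at most one separator appears between segments and that mixed separators (such as 123-456.7890) are acceptable.

```python
import re

# Hypothetical pattern: an optional area code (parenthesized or bare),
# followed by the exchange and the number, with at most one separator
# between segments.
PHONE_RE = re.compile(r"""
    (?: \( (?P<area1>\d{3}) \) \s*    # (123) with optional white space
      | (?P<area2>\d{3}) [-. ]?       # 123 with an optional separator
    )?                                # the area code itself is optional
    (?P<exchange>\d{3}) [-. ]?        # three-digit exchange
    (?P<number>\d{4})                 # four-digit number
""", re.VERBOSE)

def validate_phone(text):
    """Return the digits of a valid phone number, or None if invalid."""
    m = PHONE_RE.fullmatch(text)
    if m is None:
        return None
    area = m.group("area1") or m.group("area2") or ""
    return area + m.group("exchange") + m.group("number")

# The valid and invalid examples given in the exercise text.
VALID = ["1234567890", "123-456-7890", "123.456.7890",
         "(123)456-7890", "(123) 456-7890", "456-7890"]
INVALID = ["123-45-6789", "123:4567890", "123/456-7890"]

def run_tests():
    for s in VALID:
        assert validate_phone(s) is not None, s
    for s in INVALID:
        assert validate_phone(s) is None, s

run_tests()
```

Returning the bare digits is only one possible normalized form; a tuple of segments would serve equally well.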

Posted by programmingpraxis

Filed in Exercises

17 Responses to “Validating Telephone Numbers”

geirskjootskift said
December 13, 2011 at 9:41 AM
http://pastebin.com/iNVSP0zu

the easy way out :-)
Ajay said
December 13, 2011 at 11:37 AM
Your posts are extremely useful!
But… why LISP
programmingpraxis said
December 13, 2011 at 1:54 PM
@geirskjootskift: The task requires you to return the number if it is valid, not just a true or false.

@Ajay: Thank you. It’s Scheme, not Lisp. And because I want to, and because Scheme gives me options not (easily) available in C or Java or Python; Scheme gives me things like garbage collection and big integers for free, and in what other language could I add list comprehensions and pattern matching to the standard library, and write generators and streams and other little goodies?

kawas44 said

December 13, 2011 at 2:28 PM

A Clojure solution using regexps, like geirskjootskift’s:
- uses two regexps, deliberately permissive about the number of separators
- uses regexp groups to extract a normalized phone number.

(defn is-phone [phone]
  (let [phn-re [#"(?:(\d{3})[-. ]*)?(\d{3})[-. ]*(\d{4})"
                #"\((\d{3})\)\s*(\d{3})[-. ]*(\d{4})"]
        mtchs (map #(re-matches %1 phone) phn-re)]
    (when-let [m (first (remove nil? mtchs))]
      (apply str (apply concat (next m))))))

Some tests with valid and invalid entries

(defn test-phone []
  (let [phones ["0123456789" "012 345 6789" "012.345.6789" "012-345-6789"
                "3456789" "345 6789" "345.6789" "345-6789"
                "(012)3456789" "(012) 3456789"
                "(012)345 6789" "(012) 345 6789"
                "(012)345.6789" "(012) 345.6789"
                "(012)345-6789" "(012) 345-6789"
                ;; invalid phones
                "12-345-6789" "123-45-6789" "123-456-789"
                "123-45-67890" "123:4567890" "123/456-7890" "123456"
                "12345678" "123456789" "12345678901" "123-456-7890 x123"]]
    (doseq [phn phones]
      (if-let [m (is-phone phn)]
        (println phn "... ok \t" m)
        (println phn "... KO !!")))))

(test-phone)

tomjleo said
December 13, 2011 at 5:19 PM
Here is my solution (also using Regex…)

import re
import itertools

def isTelephone(n):
    regex=re.compile("([0-9]{10})|([0-9]{3}[-]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[.]{1}[0-9]{3}[.]{1}[0-9]{4})|([\(]{1}[0-9]{3}[\)][ ]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[-]{1}[0-9]{4})")
    chain = itertools.chain(*regex.findall(n))
    if n in list(chain):
        return n
    else:
        return False

def test():
    test_numbers=['1234567890',
                  '123-456-7890',
                  '123.456.7890',
                  '(123) 456-7890',
                  '456-7890',
                  '123-45-6789',
                  '123 4567890',
                  '123/456-7890']
    for num in test_numbers:
        print(isTelephone(num))

test()

Tom said

December 13, 2011 at 5:24 PM

Here is my solution using python and regex

import re
import itertools

def isTelephone(n):
    regex=re.compile("([0-9]{10})|([0-9]{3}[-]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[.]{1}[0-9]{3}[.]{1}[0-9]{4})|([\(]{1}[0-9]{3}[\)][ ]{1}[0-9]{3}[-]{1}[0-9]{4})|([0-9]{3}[-]{1}[0-9]{4})")
    chain = itertools.chain(*regex.findall(n)) 
    if n in list(chain):
        return n
    else:
        return False
    
def test():
    test_numbers=['1234567890',
                  '123-456-7890',
                  '123.456.7890',
                  '(123) 456-7890',
                  '456-7890',
                  '123-45-6789',
                  '123 4567890',
                  '123/456-7890']
    for num in test_numbers:
        print(isTelephone(num))
        
test()

geirskjootskift said
December 13, 2011 at 8:14 PM
Ok.. returned on normalized form. Still don’t see no reason to implement a parser “by hand” :-) I also prefer not putting all patterns in as few regexps as possible. Readability for the win.

import re

VALID_PHONE_PATTERNS = [
    r"^(\d{10})$",
    r"^(\d{3})\-(\d{3})\-(\d{4})$",
    r"^(\d{3})\.(\d{3})\.(\d{4})$",
    r"^\((\d{3})\)\s?(\d{3})\-(\d{4})$",
    r"^(\d{3})\-(\d{4})$"]

def validate_phone_number(phn):
    # Try each pattern in turn; on a match, join the captured digit groups.
    for pattern in VALID_PHONE_PATTERNS:
        res = re.compile(pattern).findall(phn)
        if res:
            return "".join(res[0])
    return None
geirskjootskift said
December 13, 2011 at 8:16 PM
I see that reading : https://programmingpraxis.com/contents/howto-posting-source-code/ didn’t do me no good :-D

http://pastebin.com/7JJKhd05

.. readability again
markmain said
December 13, 2011 at 10:10 PM
Your rules state that 123/456-7890 is not valid; while it is the least common form, people do use it. If I were writing the code, I would allow it.

Mike said

December 16, 2011 at 2:34 AM

I would normally use a list of regexes, like some of the previous solutions. This time, however, I used one big regular expression, because having two problems is better than having just one problem (see http://regex.info/blog/2006-09-15/247).

normalize() returns a tuple with the areacode, exchange, and number, or raises a ValueError if the input is not a valid phone number. The areacode is None if one was not provided.

import re

pn_re = re.compile(r"""(?x)		# verbose mode
\A					# start of string
\s*
(?:
   (?P<lparen>\()?			# optional left paren
   \s*
   (?P<areacode>\d{3})			# areacode is 3 digits
   \s*
   (?P<rparen>(?(lparen)		# if there was a lparen
                 \)			# then match an rparen
                |(?P<sep1>[-.]?))       # else match an optional separator
   )
   \s*
)? 					# areacode section is optional
(?P<exchange>\d{3})			# exchange is three digits
\s*
(?P<sep2>(?(sep1)(?P=sep1)|[-.]?))	# if there was a first separator, the
					# second one must match it
\s*
(?P<number>\d{4})			# number is 4 digits
\s*
\Z 					# end of string
""")

def normalize(phone_number):
    mo = pn_re.match(phone_number)
    if mo:
        return mo.group('areacode','exchange','number')
    
    else:
        raise ValueError('Improperly formatted phone number: ' + phone_number )


def test():
    valid = ["1234567890", "123-456-7890", "123.456.7890",
             "(123)456-7890", "(123) 456-7890", "987-6543",
             "  123 - 456 - 7890  "]
    
    invalid = ["12-345-6789", "123-45-6789", "123-456-789",
          "123-45-67890", "123:4567890", "123/456-7890", "123456",
          "12345678", "123456789", "12345678901", "123-456-7890 x123",
          "123--456--7890", "(123)-456-7890", "12 3 -456- 789 0",
          "123 . 456 - 7890"]

    for number in valid + invalid:
        try:
            normal = normalize(number)

            assert number in valid, "{} not in valid set".format(number)
            if normal[0] is not None:
                assert normal == ('123','456','7890'), (number, normal)
            else:
                assert normal == (None, '987', '6543'), (number, normal)

        except ValueError:
            assert number in invalid

@praxis, I don’t know Scheme, but was wondering how your code would handle the last 4 invalid test cases I added?

Yuushi said

December 16, 2011 at 6:23 AM

import re

no_bracket = re.compile(r'^([0-9]{3})?[-\. ]*[0-9]{3}[-\.]*[0-9]{4}$')
bracket = re.compile(r'^\([0-9]{3}\)[-\. ]*[0-9]{3}[-\.]*[0-9]{4}$')

def tel_no_match(number):
    # startswith() is safe on an empty string, unlike number[0]
    if number.startswith('('):
        z = bracket.match(number)
    else:
        z = no_bracket.match(number)

    if z and z.start() == 0 and z.end() == len(number):
        return number
    return None

I figured it was easier to break it into two options, as flagging a case like (123-456-7890 is a bit annoying. It also assumes you can mix and match - and ., so 123-456.7890 is OK.
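If mixed separators should instead be rejected, a backreference can require the second separator to equal the first. This is only a sketch for the ten-digit, unparenthesized form; the names SAME_SEP_RE and same_separators are hypothetical.

```python
import re

# Group "sep" captures the first separator (possibly empty), and (?P=sep)
# requires the second separator to be exactly the same text.
SAME_SEP_RE = re.compile(r"\d{3}(?P<sep>[-. ]?)\d{3}(?P=sep)\d{4}")

def same_separators(number):
    """Return True if a ten-digit number uses one separator consistently."""
    return SAME_SEP_RE.fullmatch(number) is not None
```

With this pattern 123-456-7890 and 1234567890 are accepted, while 123-456.7890 is not.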

Validating telephone numbers – Python « From C/C++ to Python said
December 17, 2011 at 9:54 PM
[…] is a solution to a programmingpraxis exercise -validating telephone […]

Tomasz Kwiatkowski said

December 17, 2011 at 10:31 PM

OK, initially missed a couple of requirements, so this is iteration 3 of my solution – over time and over budget ;-) But at least you get the unit tests.

import unittest
import re

# The trailing $ rejects input with extra characters after the number.
tele_re = re.compile(r"((\([0-9]{3}\))|([0-9]{3}))?[- \.]?[0-9]{3}[- \.]?[0-9]{4}$")

def is_tele(s):
    return tele_re.match(s)

def tele(s):
    if is_tele(s):
        return "".join([c for c in s if '0' <= c <= '9' ])

class TestTelephoneNumbers(unittest.TestCase):
    def setUp(self):
        self.good = [ "1234567890", "123-456-7890", "123.456.7890", "(123)456-7890", "(123) 456-7890", "456-7890" ]
        self.bad = [ "123-45-6789", "123:4567890", "123/456-7890", "(123 456 7890", "123)456 7890", "(456)7890" ]
    def test_good(self):
        for tele_num in self.good:
            self.assertTrue(is_tele(tele_num), tele_num)
    def test_bad(self):
        for tele_num in self.bad:
            self.assertFalse(is_tele(tele_num), tele_num)
    def test_extract(self):
        for tele_num in self.good[0:-1]:
            self.assertEqual(tele(tele_num), self.good[0], msg="%s extract failed" % tele_num)
        self.assertEqual(tele(self.good[-1]), "4567890", msg="%s extract failed" % self.good[-1])
       

if __name__ == '__main__':
    unittest.main()

Tomasz Kwiatkowski said
December 17, 2011 at 10:47 PM
One more thing – I’ve just read the suggested solution and liked the trick with working backwards.

Mike said

December 22, 2011 at 1:03 AM

My regex-based version above was clearly a tongue-in-cheek solution. Here is a more useful solution.

It uses a regular expression to group the input string into digits and non-digits. The non-digits are normalized to form a punctuation string, which is compared with a set of valid punctuation strings. Other valid punctuation, e.g., 123/456-7890, can easily be supported by adding the corresponding punctuation string, e.g., " /-", to validpunctuation.

import re

pattern = re.compile(r"""(?x)     # verbose mode
    \A                            # match beginning of string 
    (\D*)                         #     leading whitespace and punctuation
    (\d{3})?                      # optional areacode
    (\D*)                         #     whitespace and punctuation
    (\d{3})                       # 3-digit exchange
    (\D*)                         #     whitespace and punctuation
    (\d{4})                       # 4-digit number
    \s*                           #     trailing whitespace
    \Z                            # match end of string
    """
    )

validpunctuation = {
    "   ", "()-", "().", " --", " ..",  # with areacode
    "  ", " -", " ."                    # and without
    }

def validate(phonenumber):
    mo = pattern.match(phonenumber)
    
    if mo:
        punctuation = mo.group(1,3,5)
        number = mo.group(2,4,6)

        if not mo.group(2):
            punctuation = punctuation[1:]
            number = number[1:]

        if ''.join(s.strip() or ' ' for s in punctuation) in validpunctuation:
            return '.'.join(number)

    return None

Diego Giuliani said

December 27, 2011 at 1:12 PM

Here is my solution using python:
Note that I’m still a n00b in Python, so I know there are some semantic bugs; e.g., phones like “(123-456 7890” or “123)456 7890” will match.

import re

def match(str):
    pattern = r"""
        ^           # Beginning of the string
        [\(]?       # Possible open parenthesis
        (\d{3})?    # First 3 digits - optional
        [\)]?       # Possible close parenthesis
        [-. ]?      # Possible separator
        (\d{3})     # Second group of 3 digits
        [-. ]?      # Possible separator
        (\d{4})     # Last group of 4 digits
        $           # End of the string
    """
    return re.search(pattern,str,re.VERBOSE)

def test():
    phones = [
    "1234567890",
    "123-456-7890",
    "123.456.7890",
    "(123)456-7890",
    "(123) 456-7890",
    "456-7890",
    "123-45-6789",
    "123:4567890",
    "123/456-7890"
    ]
    for phone in phones:
        print(match(phone) is not None)
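
One possible way to remove the unbalanced-parenthesis matches mentioned above is to make the area code an alternation: either fully parenthesized or bare. The following is a sketch of that idea, not a tested replacement; the names BALANCED and match_balanced are hypothetical.

```python
import re

# Variant of the pattern above: the area code is either wrapped in a
# matching pair of parentheses or written bare, so a lone "(" or ")"
# can no longer match.
BALANCED = re.compile(r"""
    ^
    (?: \( (\d{3}) \)     # area code in matching parentheses
      | (\d{3})           # or a bare area code
    )?                    # the area code is optional
    [-. ]?                # possible separator
    (\d{3})               # exchange
    [-. ]?                # possible separator
    (\d{4})               # number
    $
""", re.VERBOSE)

def match_balanced(phone):
    """Return a match object for a well-formed phone number, else None."""
    return BALANCED.search(phone)
```

Only one of the first two groups is set on a successful match, so code that reads the area code has to check both.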

Tom said
December 27, 2011 at 3:32 PM
@Diego Giuliani: you should avoid using str as a variable name, because str is a Python built-in and your variable hides it; see http://docs.python.org/library/functions.html#str
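
A small illustration of why the name matters; the functions shadowed and renamed are hypothetical and exist only for this example.

```python
def shadowed(str):
    # Inside this function "str" is the argument, not the built-in,
    # so str(42) tries to call the argument itself.
    try:
        return str(42)
    except TypeError:
        return "TypeError: the argument is not callable"

def renamed(text):
    # With a different parameter name the built-in str is still available.
    return str(42) + text
```

Calling shadowed with a string argument ends up in the except branch, while renamed converts the integer as expected.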
