How can you use the Python Regex library to check if a string represents a phone number?
To check if a string matches a specific pattern use the Regex library’s match or exec methods.
Before writing your Regex pattern inspect the variants for the phone number field to see whether you’re Regex pattern will match.
For example, if upon your inspection you find the following variants:
+610412345678
0412 345 678 - Mum
0412-345-678
61412.345.678
(02) 345 6789
023456789 Work
123 4567 Home
By noting all the different variants you should be able to write your Regex pattern to capture all these types that are valid phone numbers.
During your inspection you will want to look for mobile/cell phone numbers, international phone numbers, interstate phone numbers and local phone numbers as each type may have its own unique set of variants.
Capture Mobile/Cell Phone Numbers
I’ll start by focussing on the mobile or cell numbers first.
Mobile numbers have 10 digits, but some mobile numbers are prefixed with their international location.
Here are some examples of valid mobile phone numbers in Australia:
0412 345 678 +61412345678 +61 0412-345-678 0412345678
To represent this in a regular expression you want to break up all the common elements in the phone number strings.
Match A Number With Certain Number Of Digits
When dealing with phone numbers the most frequent regex flag
\d
is used to help capture digits in a string.
Dealing with the above list of phone numbers you can represent this as a regex expression using the digit flag
\d
and non-digit flag
\D
along with the number of characters expected. For example, with the mobile phone numbers listed above we have 4 zones: Australian international number, first zone of 3 or 4 numbers, second zone of 3 numbers, and third zone of 3 numbers.
To represent this using a regular expression it would look something like this:
(?:\+\d{2})?\d{3,4}\D?\d{3}\D?\d{3}
Breaking this expression up it reads as follows:
Expression | Detail |
---|---|
(?:
|
Start non-capture group |
\+
|
Find exact character
"+"
. The
"+"
is a special character and therefore needs to be escaped
\
|
\d{2}
|
Find two digits |
)?
|
End non-capture group and set as optional by appending the character
?
|
\d{3,4}
|
Find three to four digits |
\D?
|
Find non-digit character and mark as optional |
\d{3}
|
Find three digits |
\D?
|
Find non-digit character and mark as optional |
\d{3}
|
Find three digits |
Running this regex through each of the phone numbers above produces the following results:
>>> import re
>>> rgx_phone = re.compile(r"(?:\+\d{2})?\d{3,4}\D?\d{3}\D?\d{3}")
>>> phone_list = ["0412 345 678", "+61412345678", "+61 0412-345-678", "0412345678"]
>>> [x for x in phone_list if re.findall(rgx_phone, x)]
['0412 345 678', '+61412345678', '+61 0412-345-678', '0412345678']
As you can see from the above Python REPL code each of our sample phone numbers successfully meets my regex mobile phone number pattern .
Capture Land Line Phone Numbers
In the same way mobile phone numbers were captured above the process for creating your regex pattern capturing normal land line phone numbers should be applied.
Grab a list of valid phone numbers and look at how they may have been inserted, here’s a sample:
(02) 1234 5678
+612.1234.5678
0212345678
1234-5678
Mapping this to a regex pattern could be captured by something like this:
(?:\+?\(?\d{2,3}?\)?\D?)?\d{4}\D?\d{4}
Here’s what this pattern means when broken up:
Expression | Detail |
---|---|
(?:
|
Start non-capture group |
\+
?
|
Find exact character
"+"
and mark as optional by appending the character
?
|
\(?
|
Find character
(
as this character is special it is escaped with
\
and as it is optional has the character
?
appended.
|
\d{2,3}?
|
Find two or three digits and mark as optional |
\)?
|
Find character
)
and as this character is special escape it with
\
and mark as optional with
?
|
\D?
|
Find non-digit character and mark as optional. |
)?
|
End non-capture group and set as optional by appending the character
?
|
\d{4}
|
Find four digits |
\D?
|
Find non-digit character and mark as optional |
\d{4}
|
Find four digits |
And here’s how the pattern is used with our sample phone numbers above:
>>> import re
>>> rgx_phone = re.compile("(?:\+?\(?\d{2,3}?\)?\D?)?\d{4}\D?\d{4}")
>>> phone_list = ["(02) 1234 5678", "+612.1234.5678", "0212345678", "1234-5678"]
>>> [x for x in phone_list if re.findall(rgx_phone, x)]
['(02) 1234 5678', '+612.1234.5678', '0212345678', '1234-5678']
As you can see from the above code the valid phone numbers do match the regex pattern above.
Regex and Phone Numbers in Python: Summary
To create a regular expression capturing phone numbers look through a sample set of phone numbers in your data set and match as best as possible the majority of phone numbers by using the
\d{range}
flag.
Check out our other post on how you can clean and format numbers using Google Sheets from the concepts taught here.