Skip to Content

How To Clean And Format Phone Numbers In Google Sheets

How To Clean And Format Phone Numbers In Google Sheets

If you’re working with data in Google Sheets you’ll soon come across a time when you will need to clean and format phone number data entries.

To clean phone numbers in Google Sheets using the REGEXEXTRACT() function extract all the different fields according to the phone number entries and then combine them all into the desired format.

Here’s an example walking step by step through the process of cleaning the data set first, before then moving on to creating the desired phone number format.

Survey Current Phone Number Formats

Suppose I had the following USA phone number data set:

A
1+1 (555) 123 4567
2(555) 123 4567
3555-123-4567
4123.4567
A series of US phone numbers all of different formats

As you can see from the example data set the data isn’t clean. Anyone and everyone who was entering phone number data into this field was entering the data according to their own personal taste.

To clean this data set up I’m going to use the REGEXEXTRACT() formula to help extract the phone groups that should have been in each cell, the groups are as follows:

  • Plus symbol and a one to three-digit country code (1 being USA)
  • Area code of three digits, usually in brackets
  • Three digits representing the prefix number
  • Four digits representing the subscriber number

With this format in mind, you can represent the above as a regular expression like so:

\+?\d{1,3}?\D?\(?\d{3}?\)?\D?\d{3}\D?\d{4}

Let’s break the regular expression apart so that you can understand how each expression corresponds to the phone number format:

Regular ExpressionDefinition
\+?The initial + sign denoting a country code. As the + sign is a special character in regular expressions it is escaped \ and as some phone numbers might not have this character I’ve appended the optional character ?
\d{1,3}?This expression looks for a digit between 1 and 3 digits long and as all cells may not have the country code digits this expression has also been appended with the optional character ?
\D?This expression looks for a non-digit character which can be anything else besides a number. Again, as some fields will not have the country code you will need to make it optional by appending the ?
\(?Next, is another expression which is escaped \ and therefore seeks the exact character ( and is appended with the optional character ? as some fields will have the area code but are not wrapped in parentheses.
\d{3}?The area code is exactly three digits long and is optional. Some area codes in other countries, like Australia, would have area codes of only two digits, therefore, you would amend this to be \d{2}?
\)?Closing the area code with the exact parentheses and making it optional.
\D?Optional non-digit character.
\d{3}Searching for the prefix three-digit number and this expression isn’t optional. This is the bare minimum needed with a phone number.
\D?Optional non-digit character between the prefix and subscriber number
\d{4}Finishing with the four-digit subscriber number which is a needed expression.
Explaining the regular expression phone number format

Depending upon your phone number data set you may need to remove sections to match your structure. For example, if your phone numbers don’t contain the country code then you will not need the expression \+?\d{1,3}?\D? or if you don’t have parentheses around your area code then you can remove the \(? and \)? sections from the regular expression.

Without a country code or parentheses around an area code your regular expression could look something as simple as this instead:

\d{3}?\D?\d{3}\D?\d{4}

Capture Country, Area, Prefix And Subscriber Sections From Phone Number

The final aspect needed before inserting the above regular expression into the REGEXEXTRACT() formula is to set what you want to capture from the expression.

In my example, I want to capture all four sections from the phone number fields: the country code, area code, prefix and subscriber numbers. To do this I need to set the captures in the regular expression and the way this is done is by wrapping the expressions in parentheses.

Using my original regular expression here’s how capturing the country code, area code, prefix and subscriber sections would look:

\+?(\d{1,3})?\D?\(?(\d{3})?\)?\D?(\d{3})\D?(\d{4})

Notice the newly inserted parentheses (highlighted in bold).

When this regular expression is inserted into your Google Sheet phone number data, in an adjacent column, with the REGEXEXTRACT(A1, "\+?(\d{1,3})?\D?(?(\d{3})?)?\D?(\d{3})\D?(\d{4})) formula it produces the following result:

ABCDE
1+1 (555) 123 456715551234567
2(555) 123 45675551234567
3555-123-45675551234567
4123.45671234567
=REGEXEXTRACT(A1,"\+?(\d{1,3})?\D?(?(\d{3})?)?\D?(\d{3})\D?(\d{4})") inserted into cell B1 and copied down

As you can see the regular expression extracts each section from the phone numbers in column A and inserts their section into the 4 columns alongside each column representing one of the captured sections.

Concatenate Phone Number Into Custom String

To translate this into your custom phone number format it would now be a case of adding another column and where a value doesn’t exist in a column to replace it with what should have been inserted.

For example, if these numbers are all USA phone numbers then for column B we could insert the string "+1" as a substitute (as 1 represents the country code for USA); if column B is meant to be a local area code of 555 then this would be the substituted value should that column be blank.

Therefore, wrapping this all up into a formula and applying spacing between each section you could insert the following formula in column F:

=CONCATENATE("+",IF(LEN(B1),B1,"1")," (",IF(LEN(C1),C1,"555"),") ",D1," ",E1)

Here is the result in our working spreadsheet:

ABCDEF
1+1 (555) 123 4567+15551234567+1 (555) 123 4567
2(555) 123 45675551234567+1 (555) 123 4567
3555-123-45675551234567+1 (555) 123 4567
4123.45671234567+1 (555) 123 4567
Using CONCATENATE to construct your own phone number format

As you can see the CONCATENATE() function has allowed you to design your own custom phone number format and the result is a neatly designed string from the “dirty data” of phone numbers in column A.

Phone Number As Custom Number Format

An alternative to constructing your own custom phone number string using the CONCATENATE() formula from the result of the REGEXEXTRACT() function is to create a phone number as a number and to apply a custom number format for that column.

This may prove handy should the data be issued to others and they prefer different formats to phone numbers.

First step with this approach is to apply the same concatenation technique but to convert it to a number using the VALUE() formula. This would look something like this in column F instead:

=VALUE(CONCATENATE(IF(LEN(B1),B1,"1"),IF(LEN(C1),C1,"555"),D1,E1))

Notice the formula has a slight difference from above: the removal of the "+" country code prefix, spacing and parentheses. Removing these elements will help make it easier for the VALUE() function to translate the string to a number, if you do get an error check these elements have been removed from your CONCATENATE() function.

The result of the above formula would now look something like this in your spreadsheet:

ABCDEF
1+1 (555) 123 4567+1555123456715551234567
2(555) 123 4567555123456715551234567
3555-123-4567555123456715551234567
4123.4567123456715551234567
Changing the final phone number format into a number using VALUE() and CONCATENATE()

As you can see the result of this combination of formulas produces one large number.

Change Phone Number Format Using Format Menu

To change a number into a phone number format you can use the Format menu item and navigate to Number and then Custom Number Format.

The modal window that pops up will then need a format for how you would like to display the number. If you stick with the common format of +# (###) ### ### then you would type into the input box next to the Apply button that format, such that it looks like this:

Set your own custom phone number format

The # symbol in the Custom Number Format field represents a number.

Once you apply this to your column with phone numbers you will see column F change to this:

ABCDEF
1+1 (555) 123 4567+15551234567+1 (555) 123 4567
2(555) 123 45675551234567+1 (555) 123 4567
3555-123-45675551234567+1 (555) 123 4567
4123.45671234567+1 (555) 123 4567
Applying the custom number format to the numbers in column F to make them phone number format friendly

Then should a user prefer a different format they can apply their own Custom Number Format field accordingly.

For example, if a user prefers square brackets and a hyphen between the prefix and subscriber numbers then their format will look something like this:

+# [###] ###-####

Summary

Cleaning phone numbers in a spreadsheet, like Google Sheets, is made easier with the handy REGEXEXTRACT() function. This function enables users to apply one regular expression to capture all the phone number entries and then by using CONCATENATE() can then change these captures into a constant phone number format.

Check our other post on this topic where I explore how you can use Python and regular expressions to check phone numbers too.