How can you extract a date from a string in a cell in Google Sheets using the powerful
REGEXEXTRACT()
function?
The
REGEXEXTRACT(text, regular_expression)
function has two parameters with the first labelled as
text
being the string operated on and the second labelled as
regular_expression
being the regular expression (
using RE2 syntax
) to extract data from.
Here are a few popular examples of how you can use this powerful function to extract dates from a string in Google Sheets.
Extract D/M/YYYY, DD/MM/YYYY, M/D/YYYY or MM/DD/YYYY
Recently I had a case where a cell contained a long string that had dates in the format
D/M/YYYY
or
DD/MM/YYYY
, for example,
1/6/2022
or
12/10/2022
.
By using the
REGEXEXTRACT()
function I was able to extract the numbers from the string according to what they represented and then inserted them into the
DATE()
function to obtain the desired result of a date cell.
The resulting formula was as follows:
A | |
---|---|
1 | This is a long string with a date at the end 1/10/2022 |
2 |
01/10/2022
=DATE(REGEXEXTRACT(A1,"/(\d{4})"), REGEXEXTRACT(A1, "\s(\d{1,2})/"), REGEXEXTRACT(A1, "/(\d{1,2})/"))
|
REGEXEXTRACT
to create a valid
DATE
value from string
From the above example there are three distinct
REGEXEXTRACT
functions that are extracting from the original string in
A1
the needed value according to the three parameters of the
DATE()
formula.
The first parameter of the
DATE
formula is the year which in the original string is exactly 4 digits long and in this example is found after a forward slash character
/
. Therefore to capture this using the regular expression
"/(\d{4})"
means finding a forward slash
/
start capture
(
where four digits follow
/d{4}
and then close capture
)
.
The second parameter of the
DATE
formula is the
month
which in the original string is found as the first number that has a trailing forward slash
/
. To capture this number from the string the regular expression
"\s(\d{1,2})/"
is used which reading it from left to right means: find a space character
\s
then start capture
(
of a digit containing 1 or 2 numbers
\d{1,2}
then close capture
)
and number must have a trailing slash
/
.
The third parameter of the
DATE
formula is the
day
which in the original string is found between the two forward slashes
/
. Therefore to capture this using the regular expression
"/(\d{1,2})/"
means to find a forward slash
/
and to start capture
(
where there are digits of 1 or 2 in length
\d{1,2}
then close capture
)
and number must have a trailing forward slash after it
/
.
If your date is in the format
DD/MM/YYYY
or
D/M/YYYY
then you could swap the second and third parameters to obtain the same result.
Here is the formula to extract dates in the format
M/D/YYYY
or
MM/DD/YYYY
:
=DATE(REGEXEXTRACT(A1,"/(\d{4})"), REGEXEXTRACT(A1, "\s(\d{1,2})/"), REGEXEXTRACT(A1, "/(\d{1,2})/"))
Here is the formula to extract dates in the format
D/M/YYYY
or
DD/MM/YYYY
:
=DATE(REGEXEXTRACT(A1,"/(\d{4})"), REGEXEXTRACT(A1, "/(\d{1,2})/"), REGEXEXTRACT(A1, "\s(\d{1,2})/"))
Extract YYYYMMDD
How do you extract from a string the date value in the format
YYYYMMDD
?
In the same way
REGEXEXTRACT
was used above to extract the digits from a string the same function can be used to extract the required values from a date in the format
YYYYMMDD
.
Here’s a demonstration of how this works in a Google Sheet:
A | |
---|---|
1 | This is a long string with a date of 20220110 |
2 |
01/10/2022
=DATE(REGEXEXTRACT(A1,"(\d{4})\d{4}"), REGEXEXTRACT(A1, "\d{4}(\d{2})\d{2}"), REGEXEXTRACT(A1, "\d{6}(\d{2})"))
|
REGEXEXTRACT
to create a valid
DATE
value from string where format of the date is
YYYYMMDD
As you can see the same
DATE
and
REGEXEXTRACT
functions are used to grab the necessary numbers from the string.
The first
REGEXEXTRACT
contains the regular expression
(\d{4})\d{4}
which means capture the first four digits where another four digits follow – which means I’m capturing the year part of the
YYYYMMDD
formatted date.
The second
REGEXEXTRACT
contains the regular expression
\d{4}(\d{2})\d{2}
which means find four digits then start capture of two digits that are then followed by two digits. This is how to extract the month from the
YYYYMMDD
format.
The third and last
REGEXEXTRACT
contains the regular expression
\d{6}(\d{2})
which means find 6 digits then capture the last two digits. This is how to extract the day of the month component from the string that is formatted
YYYYMMDD
.
Here is the formula again:
=DATE(REGEXEXTRACT(A1,"(\d{4})\d{4}"), REGEXEXTRACT(A1, "\d{4}(\d{2})\d{2}"), REGEXEXTRACT(A1, "\d{6}(\d{2})"))
Summary
The
REGEXEXTRACT
is a powerful function in Google Sheets that can easily enable you to capture dates from strings.
Whether the format of the date in the string is
YYYYMMDD
or
MM/DD/YYYY
or
D/M/YYYY
as demonstrated above each date format can easily be captured using the
REGEXEXTRACT
formula.