Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Databricks Runtime 11.3 LTS and above
Returns the first substring in str that matches regexp.
Syntax
regexp_substr( str, regexp )
Arguments
str: ASTRINGexpression to be matched.regexp: ASTRINGexpression with a matching pattern.
Returns
A STRING.
regexp_substr returns the entire substring that regexp matches, not an individual capture group.
To return a specific capture group, use regexp_extract.
Only the first match is considered. To return every match, use regexp_extract_all.
The regexp string must be a regular expression. See Regular expressions for the supported syntax.
When using literals, use raw-literal (r prefix) to avoid escape character pre-processing.
If str or regexp is NULL, or the pattern is not found, the result is NULL.
Common error conditions
Examples
Match one of several alternatives
The pattern Ste(v\|ph)en matches either Steven or Stephen. The function returns the first match.
> SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', r'Ste(v|ph)en');
Steven
Extract the first number from a string
Use a raw literal so that \d is passed to the regex engine without doubling the backslash.
> SELECT regexp_substr('item 42 costs $19', r'\d+');
42
Match case-insensitively
Use the (?i) inline flag to ignore case.
> SELECT regexp_substr('Error: disk full', r'(?i)error');
Error
Extract an email address
> SELECT regexp_substr('Contact alice@example.com for details', r'\w+@\w+\.\w+');
alice@example.com
Extract the first hashtag
regexp_substr returns only the first match. To return every match, use regexp_extract_all.
> SELECT regexp_substr('Posts tagged #data #ai #sql', r'#\w+');
#data
No match returns NULL
Unlike regexp_extract, which returns an empty string, regexp_substr returns NULL when the pattern is not found.
> SELECT regexp_substr('Mary had a little lamb', r'Ste(v|ph)en');
NULL
NULL input returns NULL
> SELECT regexp_substr(NULL, r'Ste(v|ph)en');
NULL
> SELECT regexp_substr('Mary had a little lamb', NULL);
NULL
Invalid regex pattern
> SELECT regexp_substr('abc', '[invalid');
Error: INVALID_PARAMETER_VALUE.PATTERN