regexp_substr function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 11.3 LTS and above

Returns the first substring in str that matches regexp.

Syntax

regexp_substr( str, regexp )

Arguments

  • str: A STRING expression to be matched.
  • regexp: A STRING expression with a matching pattern.

Returns

A STRING.

regexp_substr returns the entire substring that regexp matches, not an individual capture group. To return a specific capture group, use regexp_extract. Only the first match is considered. To return every match, use regexp_extract_all.

The regexp string must be a regular expression. See Regular expressions for the supported syntax.

When using literals, use raw-literal (r prefix) to avoid escape character pre-processing.

If str or regexp is NULL, or the pattern is not found, the result is NULL.

Common error conditions

Examples

Match one of several alternatives

The pattern Ste(v\|ph)en matches either Steven or Stephen. The function returns the first match.

> SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', r'Ste(v|ph)en');
 Steven

Extract the first number from a string

Use a raw literal so that \d is passed to the regex engine without doubling the backslash.

> SELECT regexp_substr('item 42 costs $19', r'\d+');
 42

Match case-insensitively

Use the (?i) inline flag to ignore case.

> SELECT regexp_substr('Error: disk full', r'(?i)error');
 Error

Extract an email address

> SELECT regexp_substr('Contact alice@example.com for details', r'\w+@\w+\.\w+');
 alice@example.com

Extract the first hashtag

regexp_substr returns only the first match. To return every match, use regexp_extract_all.

> SELECT regexp_substr('Posts tagged #data #ai #sql', r'#\w+');
 #data

No match returns NULL

Unlike regexp_extract, which returns an empty string, regexp_substr returns NULL when the pattern is not found.

> SELECT regexp_substr('Mary had a little lamb', r'Ste(v|ph)en');
 NULL

NULL input returns NULL

> SELECT regexp_substr(NULL, r'Ste(v|ph)en');
 NULL

> SELECT regexp_substr('Mary had a little lamb', NULL);
 NULL

Invalid regex pattern

> SELECT regexp_substr('abc', '[invalid');
  Error: INVALID_PARAMETER_VALUE.PATTERN