regex_extract function

The regex_extract function returns a list of strings that match a regular expression from the source string.

Syntax

regex_extract(
     str      lvarchar,
     re       lvarchar,
     limit    integer DEFAULT 0,
     copts   integer DEFAULT 1)   
returns lvarchar
regex_extract(
     str      clob,
     re       lvarchar,
     limit    integer DEFAULT 0,
     copts_string    lvarchar)
returns lvarchar

Parameters

str
The string to search. Can be of type CHAR, NCHAR, VARCHAR, NVARCHAR, LVARCHAR, or CLOB. A null value is treated as an empty string.
re
The regular expression. Can be of type CHAR, NCHAR, VARCHAR, NVARCHAR, or LVARCHAR. Cannot be null.
limit (Optional)
0 = Default. All matches are returned.
Positive integer = The maximum number of matches to return.
copts (Optional)
The type of regex search:
  • 0 = Basic regex
  • 1 = Default. Extended POSIX regex
  • 2 = Basic regex and ignore case
  • 3 = Extended POSIX regex and ignore case
copts_string (Optional)
The type of regex search:
  • basic = Basic regex
  • extended = Default. Extended POSIX regex
  • basic,icase = Basic regex and ignore case
  • extended,icase = Extended POSIX regex and ignore case
  • basic,rtrim = Basic regex with rtrim
  • extended,rtrim = Extended POSIX regex with rtrim
  • basic,icase,rtrim = Basic regex and ignore case with rtrim
  • extended,icase,rtrim = Extended POSIX regex and ignore case with rtrim

Description

Use the regex_extract function iteratively return each substring that matches a regular expression. You can limit the number of substrings returned.

Returns

A set of text values that match the input regular expression pattern.

No rows found = A 0-length match, for example, a newline character.

An exception = An error occurred.

Example: Find a pattern within words

In this example, you want to find the patterns "would" and "wood":
How much wood could a
woodchuck chuck if a woodchuck could chuck wood?
A woodchuck could chuck as much wood as a woodchuck would chuck
if a woodchuck could chuck wood.
The following regular expression matches both the word "wood" and the word "would":
wo[ou]l?d
When you use this regular expression, the regex_extract function shows that the two words occur ten times in the string, but does not provide the entire word in which the pattern is found:
execute function
regex_extract(
     'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
     A woodchuck could chuck as much wood as a woodchuck would chuck
     if a woodchuck could chuck wood.',
     'wo[ou]l?d'
);
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) would
(expression) wood
(expression) wood
10 row(s) retrieved.

Example: Return the pattern plus the rest of the word

You can expand the regular expression to include the entire word. Start by specifying that more lowercase characters can follow the primary subexpression:
wo[ou]l?d[a-z]*
Next, add that the word ends with a space or a punctuation character. For completeness, you can specify more punctuation characters than the ones that occur in the text.
Note: If a hyphen appears as the first character in a character class, it means a literal hyphen, not a range of values.
wo[ou]l?d[a-z]*[- .?!:;]
When you run the regex_extract function with this expression, the function returns the whole words in which the pattern occurs:
execute function
regex_extract(
     'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
     A woodchuck could chuck as much wood as a woodchuck would chuck
     if a woodchuck could chuck wood.',
     'wo[ou]l?d[a-z]*[- .?!:;]'
);
(expression) wood
(expression) woodchuck
(expression) woodchuck
(expression) wood?
(expression) woodchuck
(expression) wood
(expression) woodchuck
(expression) would
(expression) woodchuck
(expression) wood.
10 row(s) retrieved.
In the following example, you limit the results to the first two:
execute function
regex_extract(
     'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
     A woodchuck could chuck as much wood as a woodchuck would chuck
     if a woodchuck could chuck wood.',
     'wo[ou]l?d[a-z]*[- .?!:;]',
     2
);
(expression) wood
(expression) woodchuck
2 row(s) retrieved.

Example: Return the pattern plus the beginning of the word

You can include the beginning of the word in the regular expression. The beginning of a word can be a space or the beginning of the line, ( |^), followed by upper or lowercase letters:
( |^)[A-Za-z]*
In this example, you want to find all instances of the string "tter" in the following text:
Betty bought some bitter
butter and it made her batter bitter,
so Betty bought some better
butter to make her bitter batter better.
The following statement returns the pattern plus the beginnings of the words:
execute function
regex_extract(
     'Betty bought some bitter butter and it made her batter bitter,
     so Betty bought some better butter to make her bitter batter better.',
     '( |^)[A-Za-z]*tter'
);
(expression) bitter
(expression) butter
(expression) batter
(expression) bitter
(expression) better
(expression) butter
(expression) bitter
(expression) batter
(expression) better
9 row(s) retrieved.

Example: A match with 0 length

If the regular expression results in a 0-length match, the query returns the message "No rows found." For example, although the regex_match function returns t for a match on a begin-line character ("^"), the regex_extract function returns no rows because a search of "^" matches a string that has a length of 0:
execute function regex_match('Hello world', '^');
(expression) t
 1 row(s) retrieved.
execute function regex_extract('Hello world', '^');
 No rows found.