REGEXP_SUBSTR() function is an advanced version of the
SUBSTR()function that allows you to search for substrings based on a regular expression. Instead of returning the position of the substring, it returns a portion of the source string that matches the regular expression.
The following illustrates the syntax of the Oracle
REGEXP_SUBSTR(source_string, pattern [, start_position [, occurrence [, match_parameter [, subexpr ] ] ] ] )
REGEXP_SUBSTR() function accepts 6 arguments:
is a string to be searched for.
is the regular expression pattern that is used to search for in the source string.
is positive integer that indicates the starting position in the source string where the search begins.
start_position argument is optional. Its default value is 1. Therefore, if you don’t specify it explicitly, the
REGEXP_SUBSTR() function will start searching at the beginning of the source string.
is a positive integer that specifies which occurrence of the search pattern that the
REGEXP_SUBSTR() function should search for.
occurrence argument is also optional and it defaults to 1, meaning that the
REGEXP_SUBSTR() function should search for the first occurrence of the pattern in the source string.
is a literal string that determines the default matching behavior for the
You can use one or more following values for the
- ‘i’ indicates case-insensitive matching
- ‘c’ indicates case-sensitive matching.
- ‘n’ allows the period (.) character to match the newline character. If you don’t explicitly specify this parameter, the
REGEXP_SUBSTR()function will not use the period to match the newline character.
- ‘m’ treats the source string as a multiline string.
match_parameter argument is optional, therefore, if you omit it, the
REGEXP_SUBSTR() function will behave as follows:
- Case sensitivity matching is determined by
- The period (.) does not match the newline character.
- The source string is treated as a single line.
is a positive integer whose value from 0 to 9 that indicates which sub-expression in the regular expression is the target.
REGEXP_SUBSTR() function returns a portion of source string that matches the regular expression.
The data type of the result string can be either VARCHAR2 or CLOB and its character set is the same as the source string.
Suppose, we have the following string:
'This is a regexp_substr demo'
If you want to get the fourth word of the above string, you use the
REGEXP_SUBSTR() function as follows:
SELECT regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, 4 ) the_4th_word FROM dual;
The result is
In this example:
'[[:alpha:]]+'is the regular expression pattern that matches any word.
- 1 instructs the function to start searching for the beginning of the string.
- 4 specifies the fourth occurrence.
See the following page more information on regular expression support on Oracle.
To return every word of the source string, you can use the
CONNECT BY LEVEL clause where the
LEVEL keyword is used as the fourth argument as follows:
SELECT regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, LEVEL ) regexp_substr FROM dual CONNECT BY LEVEL <= regexp_count( 'This is a regexp_substr demo', ' ' ) + 1;
Note that the total of match occurrences is calculated by counting the number of spaces in the source string plus 1 by using the
See the following
products table in the sample database:
The following statement selects all MotherBoards from the
SELECT product_id, product_name, description FROM products WHERE category_id = 4 ORDER BY product_name ;
Suppose, you want to get the maximum RAM that a motherboard can support, you can extract this information from the description column using the following regular expression:
This regular expression means match one or more numbers (\d+) followed by either GB or TB string (GB|TB).
The following query illustrates the idea:
SELECT product_id, product_name, description, REGEXP_SUBSTR( description, '\d+(GB|TB)' ) max_ram FROM products WHERE category_id = 4;
And here is the output:
In this tutorial, you have learned how to use the Oracle
REGEXP_SUBSTR() to extract a portion of string based on a regular expression.