Oracle REGEXP_SUBSTR

The Oracle REGEXP_SUBSTR() function is an advanced version of the SUBSTR()function that allows you to search for substrings based on a regular expression. Instead of returning the position of the substring, it returns a portion of the source string that matches the regular expression.

Syntax

The following illustrates the syntax of the Oracle REGEXP_SUBSTR() function:

REGEXP_SUBSTR(source_string, pattern
              [, start_position
                 [, occurrence
                    [, match_parameter
                       [, subexpr
                       ]
                    ]
                 ]
              ]
             )
Code language: SQL (Structured Query Language) (sql)

Arguments

The Oracle REGEXP_SUBSTR() function accepts 6 arguments:

1) source_string

is a string to be searched for.

2) pattern

is the regular expression pattern that is used to search for in the source string.

3) start_position

is positive integer that indicates the starting position in the source string where the search begins.

The start_position argument is optional. Its default value is 1. Therefore, if you don’t specify it explicitly, the REGEXP_SUBSTR() function will start searching at the beginning of the source string.

4) occurrence

is a positive integer that specifies which occurrence of the search pattern that the REGEXP_SUBSTR() function should search for.

The occurrence argument is also optional and it defaults to 1, meaning that the REGEXP_SUBSTR() function should search for the first occurrence of the pattern in the source string.

5) match_parameter

is a literal string that determines the default matching behavior for the REGEXP_SUBSTR() function.

You can use one or more following values for the match_parameter argument:

  • ‘i’ indicates case-insensitive matching
  • ‘c’ indicates case-sensitive matching.
  • ‘n’ allows the period (.) character to match the newline character. If you don’t explicitly specify this parameter, the REGEXP_SUBSTR() function will not use the period to match the newline character.
  • ‘m’ treats the source string as a multiline string.

Because the match_parameter argument is optional, therefore, if you omit it, the REGEXP_SUBSTR() function will behave as follows:

  • Case sensitivity matching is determined by NLS_SORT parameter.
  • The period (.) does not match the newline character.
  • The source string is treated as a single line.

6) subexpr

is a positive integer whose value from 0 to 9 that indicates which sub-expression in the regular expression is the target.

Return Value

The REGEXP_SUBSTR() function returns a portion of source string that matches the regular expression.

The data type of the result string can be either VARCHAR2 or CLOB and its character set is the same as the source string.

Examples

Suppose, we have the following string:

'This is a regexp_substr demo'
Code language: SQL (Structured Query Language) (sql)

If you want to get the fourth word of the above string, you use the REGEXP_SUBSTR() function as follows:

SELECT
  regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, 4
  ) the_4th_word
FROM
  dual;
Code language: SQL (Structured Query Language) (sql)

The result is

regexp_substr
Code language: SQL (Structured Query Language) (sql)

In this example:

  • '[[:alpha:]]+' is the regular expression pattern that matches any word.
  • 1 instructs the function to start searching for the beginning of the string.
  • 4 specifies the fourth occurrence.

See the following page more information on regular expression support on Oracle.

To return every word of the source string, you can use the CONNECT BY LEVEL clause where the LEVEL keyword is used as the fourth argument as follows:

SELECT
  regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1,  LEVEL ) regexp_substr
FROM
  dual
 CONNECT BY LEVEL <= regexp_count( 'This is a regexp_substr demo',  ' ' ) + 1;
Code language: SQL (Structured Query Language) (sql)

Note that the total of match occurrences is calculated by counting the number of spaces in the source string plus 1 by using the REGEXP_COUNT() function.

See the following products table in the sample database:

products table

The following statement selects all MotherBoards from the products table:

SELECT
  product_id, 
  product_name, 
  description
FROM
  products
WHERE
  category_id = 4
ORDER BY
  product_name ;Code language: SQL (Structured Query Language) (sql)
Oracle REGEXP_SUBSTR Function Example

Suppose, you want to get the maximum RAM that a motherboard can support, you can extract this information from the description column using the following regular expression:

'\d+(GB|TB)'
Code language: SQL (Structured Query Language) (sql)

This regular expression means to match one or more numbers (\d+) followed by either GB or TB string (GB|TB).

The following query illustrates the idea:

SELECT
  product_id, 
  product_name, 
  description, 
  REGEXP_SUBSTR( description,  '\d+(GB|TB)' ) max_ram
FROM
  products
WHERE
  category_id = 4;
Code language: SQL (Structured Query Language) (sql)

Here is the output:

Oracle REGEXP_SUBSTR Function - extract RAM example

In this tutorial, you have learned how to use the Oracle REGEXP_SUBSTR() to extract a portion of string based on a regular expression.

Was this tutorial helpful?