Oracle REGEXP_SUBSTR() Function

Oracle REGEXP_SUBSTR() is a built-in function that searches a given source string and returns a string that matches the given regular expression.

You can think of this function as an enhanced version of SUBSTR() with regular expressions.

Syntax of Oracle REGEXP_SUBSTR()

Here is the syntax for the Oracle REGEXP_SUBSTR() function:

REGEXP_SUBSTR(source_char, pattern)
REGEXP_SUBSTR(source_char, pattern, position)
REGEXP_SUBSTR(source_char, pattern, position, occurrence)
REGEXP_SUBSTR(source_char, pattern, position, occurrence, match_param)
REGEXP_SUBSTR(source_char, pattern, position, occurrence, match_param, subexpr)

Parameters

source_char

Required. The string to search. It can be any of the following data types: CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.

pattern

Required. The regular expression. It can be any of the following data types: CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes.

position

Optional. An integer that indicates the starting position for the search. The default is 1.

occurrence

Optional. An integer that indicates which occurrence to return. The default is 1.

match_param

Optional. The pattern to use for matching. It can be VARCHAR2 or CHAR. You can use this parameter to change the default search behavior. You can use one or more of the following characters:

  • 'i' specifies a case-insensitive match, even if the sort order of the collation determines case sensitivity.
  • 'c' specifies a case-sensitive and accent-sensitive match, even if the sort order of the collation is case-insensitive or accent-insensitive.
  • 'n' allows the dot (.) (which matches any character) to match newline characters. If this parameter is omitted, the dot does not match newline characters.
  • 'm' treats the source string as multiline. Oracle interprets the caret (^) and dollar sign ($) as the beginning and end, respectively, of any line in the source string, not just the beginning or end of the entire source string. If this parameter is omitted, Oracle treats the source string as single-line.
  • 'x' ignores whitespace. By default, whitespace matches itself.
subexpr

Optional. An integer from 0 to 9. 0 represents the entire regular expression, and 1-9 represents a group in the regular expression. The default is 0.

Return Value

Oracle REGEXP_SUBSTR() searches a given source string and returns a string that matches the given regular expression.

If there is no match in the source string for the given regular expression, REGEXP_SUBSTR() returns NULL.

If any argument is NULL, REGEXP_SUBSTR() returns NULL.

Examples of Oracle REGEXP_SUBSTR()

Here are some examples that demonstrate the usage of the Oracle REGEXP_SUBSTR() function.

Basic Usage

The following statement is used to find a continuous sequence of lowercase letters in the string '12AB34cd45ef'.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '[a-z]+') Result
FROM dual;

Output:

RESULT
_________
cd

In this example, the regular expression pattern [a-z]+ represents a sequence of lowercase letters with a minimum length of 1.

If the source string contains no content that matches the given regular expression, REGEXP_SUBSTR() returns NULL.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '[a-z]{3,}') Result
FROM dual;

Output:

RESULT
_________
NULL

In this example, the regular expression pattern [a-z]{3,} represents a sequence of at least 3 lowercase letters.

Starting position

Oracle REGEXP_SUBSTR() allows you to specify the starting position for the search.

The following statement finds a continuous sequence of lowercase letters in the string ‘12AB34cd45ef’, starting from the 9th character.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '[a-z]+', 9) Result
FROM dual;

Output:

RESULT
_________
ef

Occurrence number

Oracle REGEXP_SUBSTR() allows you to specify the occurrence number to search for.

The following statement finds the second occurrence of a continuous sequence of lowercase letters in the string ‘12AB34cd45ef’.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '[a-z]+', 1, 2) Result
FROM dual;

Output:

RESULT
_________
ef

Matching parameter

Oracle REGEXP_SUBSTR() allows you to specify matching options to change the search behavior. Oracle performs a case-sensitive search by default. If you want a case-insensitive search, use the i parameter.

The following statement finds a continuous sequence of letters in the string ‘12AB34cd45ef’, case-insensitive.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '[a-z]+', 1, 1, 'i') Result
FROM dual;

Output:

RESULT
_________
AB

Subexpression

The following statement shows how subexpressions work in the Oracle REGEXP_SUBSTR() function.

SELECT
    REGEXP_SUBSTR('12AB34cd45ef', '([a-z])([a-z])', 1, 1, 'i', 0) Result0,
    REGEXP_SUBSTR('12AB34cd45ef', '([a-z])([a-z])', 1, 1, 'i', 1) Result1,
    REGEXP_SUBSTR('12AB34cd45ef', '([a-z])([a-z])', 1, 1, 'i', 2) Result2
FROM dual;

Output:

RESULT0    RESULT1    RESULT2
__________ __________ __________
AB         A          B

NULL parameter

If any parameter is NULL, REGEXP_SUBSTR() returns NULL.

SET NULL 'NULL';
SELECT
    REGEXP_SUBSTR(NULL, 'a') "Result1",
    REGEXP_SUBSTR('a', NULL) "Result2",
    REGEXP_SUBSTR('a', 'b', NULL) "Result3",
    REGEXP_SUBSTR('a', 'b', 1, NULL) "Result4",
    REGEXP_SUBSTR('a', 'b', 1, 1, NULL) "Result5",
    REGEXP_SUBSTR('a', 'b', 1, 1, 'i', NULL) "Result6"
FROM dual;

Output:

Result1    Result2    Result3    Result4    Result5    Result6
__________ __________ __________ __________ __________ __________
NULL       NULL       NULL       NULL       NULL       NULL

In this example, we use the statement SET NULL 'NULL'; to display NULL values as the string 'NULL'.

Conclusion

Oracle REGEXP_SUBSTR() is a built-in function that searches a given source string and returns a string that matches the given regular expression.