How the REGEXP_INSTR() function works in MariaDB

The REGEXP_INSTR() function is a string function in MariaDB that returns the position of the first occurrence of a regular expression pattern in a string. It is useful for searches and validations that depend on where a match occurs, such as locating a phone number, an email address, or a date in a longer string.

Syntax

The syntax of the REGEXP_INSTR() function is as follows:

REGEXP_INSTR(string, pattern)

The function takes two arguments, both of which are required:

  • string: A string expression that represents the string to be searched. It can be any valid string value, such as a literal, a column, a function result, or a variable. If it is NULL, the function returns NULL.
  • pattern: A string expression that represents the regular expression pattern to be matched. It can likewise be a literal, a column, a function result, or a variable, and a NULL pattern also produces NULL. MariaDB has used the PCRE library for regular expressions since version 10.0.5, so the pattern follows Perl-compatible syntax rather than plain POSIX ERE. Because the backslash is an escape character in MariaDB string literals by default, each backslash in a pattern must be doubled (for example, '\\.' to match a literal dot). For more information on the regular expression syntax, see the MariaDB documentation on regular expressions.

Unlike the function of the same name in MySQL 8.0 and Oracle, the MariaDB function does not accept position, occurrence, return_option, or match_type arguments. The equivalent behavior is obtained in other ways:

  • position: To start the search at a later character, search SUBSTRING(string, position) and add position - 1 to any non-zero result.
  • occurrence: To find a later occurrence, repeat the search starting just after the previous match.
  • return_option: To get the position of the first character following the match, add the length of the matched text, CHAR_LENGTH(REGEXP_SUBSTR(string, pattern)), to the start position.
  • match_type: Case sensitivity follows the collation of the arguments. It can be overridden inside the pattern with the PCRE flags (?i) for a case-insensitive match and (?-i) for a case-sensitive match.
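As a minimal sketch of how a later starting point can be emulated with the two-argument form, the queries below look for the second 'o' in 'Hello World'. The literal string and the start offset 6 are illustrative assumptions; the approach is to search SUBSTRING(string, n) and shift a non-zero result by n - 1.

```sql
-- First match in the whole string: 'o' is the 5th character.
SELECT REGEXP_INSTR('Hello World', 'o') AS first_o;   -- 5

-- Search again from character 6 onward (' World'), then convert the
-- relative position back to a position in the original string.
-- A result of 0 (no match) is passed through unchanged.
SELECT CASE
         WHEN REGEXP_INSTR(SUBSTRING('Hello World', 6), 'o') = 0 THEN 0
         ELSE REGEXP_INSTR(SUBSTRING('Hello World', 6), 'o') + 6 - 1
       END AS second_o;                               -- 8
```

The CASE guard matters: without it, a failed search would return 5 rather than 0.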

The function returns an integer value that represents the position of the match in the string, as follows:

  • If the pattern is found, the function returns the position of the first character of the first match. Positions start at 1 and are counted in characters, not bytes. For example, REGEXP_INSTR('Hello World', 'o') returns 5, as the first 'o' is the fifth character of the string.
  • If the pattern is not found, the function returns 0. For example, REGEXP_INSTR('Hello World', 'x') returns 0, as 'x' does not occur in the string.
  • If the string or the pattern is NULL, the function returns NULL. For example, REGEXP_INSTR(NULL, 'o') returns NULL.
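The three outcomes can be checked with a single query. This is a small sketch using made-up literal strings; the column aliases are only labels for the result.

```sql
SELECT REGEXP_INSTR('Order 66 shipped', '[0-9]+') AS found,         -- 7
       REGEXP_INSTR('Order shipped',    '[0-9]+') AS not_found,     -- 0
       REGEXP_INSTR(NULL,               '[0-9]+') AS null_subject,  -- NULL
       REGEXP_INSTR('Order 66 shipped', NULL)     AS null_pattern;  -- NULL
```

Because 0 and NULL mean different things here, a filter such as WHERE REGEXP_INSTR(col, pattern) > 0 keeps only rows with a match and drops both the non-matching and the NULL rows.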

Examples

Example 1: Finding the position of a phone number in a string

The following example finds the position of a phone number in the format of (xxx) xxx-xxxx in a string using the REGEXP_INSTR() function. The table contacts contains the contact name and the contact information for each contact. The query uses the REGEXP_INSTR() function to match the phone number pattern in the contact information column. The query returns the contact name, the contact information, and the position of the phone number for each contact.

SELECT contact_name, contact_info,
       -- Backslashes are doubled so that the regex engine receives \( and \) (default SQL mode)
       REGEXP_INSTR(contact_info, '\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}') AS phone_number_position
FROM contacts;

The output is:

+--------------+---------------------------------+-----------------------+
| contact_name | contact_info                    | phone_number_position |
+--------------+---------------------------------+-----------------------+
| Alice        | [email protected] (123) 456-7890  |                    15 |
| Bob          | [email protected]                   |                     0 |
| Charlie      | (987) 654-3210 [email protected]  |                     1 |
| David        | [email protected] 111-2222      |                     0 |
| Eve          | [email protected] (555) 555-5555  |                    15 |
+--------------+---------------------------------+-----------------------+

The output shows the position at which a phone number in the format (xxx) xxx-xxxx starts in each contact_info value. For Alice and Eve the number begins at the 15th character, after the email address and a space, so the function returns 15. For Charlie the number is at the very start of the string, so the function returns 1. Bob has no phone number, and David's number is not in the (xxx) xxx-xxxx format, so the function returns 0 for both.
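To try the phone number pattern without an existing contacts table, the sketch below builds a temporary table with a few rows. The table name contacts_demo and all of its rows are hypothetical, and the doubled backslashes assume the default SQL mode, in which the backslash is an escape character in string literals.

```sql
-- Hypothetical demo table; names and rows are made up for illustration.
CREATE TEMPORARY TABLE contacts_demo (
  contact_name VARCHAR(50),
  contact_info VARCHAR(100)
);

INSERT INTO contacts_demo VALUES
  ('Ann', 'call (123) 456-7890'),
  ('Ben', 'no phone on file'),
  ('Cy',  '(987) 654-3210 office'),
  ('Dee', 'phone 111-2222');

-- '\\(' in the SQL literal reaches the regex engine as '\(' (a literal parenthesis).
SELECT contact_name,
       REGEXP_INSTR(contact_info, '\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}') AS phone_number_position
FROM contacts_demo
ORDER BY contact_name;
-- Ann 6, Ben 0, Cy 1, Dee 0
```

With single backslashes, the literal would lose them before reaching the regex engine, the parentheses would become a capturing group, and none of these rows would match.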

Example 2: Finding the position of an email address in a string with a case-insensitive match

The following example finds the position of an email address in a string using the REGEXP_INSTR() function with a case-insensitive match. The table emails contains the email subject and the email body for each email. The query matches the email address pattern against the email body column. The pattern starts with the PCRE flag (?i), which forces a case-insensitive match regardless of the collation of email_body. The query returns the email subject, the email body, and the position of the email address for each email.

SELECT email_subject, email_body,
       -- (?i) makes the match case-insensitive; the backslash before the dot is doubled
       REGEXP_INSTR(email_body, '(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,4}') AS email_address_position
FROM emails;

The output is:

+----------------+-------------------------------------------+------------------------+
| email_subject  | email_body                                | email_address_position |
+----------------+-------------------------------------------+------------------------+
| Hello          | Hi, this is Alice. My email is alice@...  |                     32 |
| Re: Hello      | Hello Alice, this is Bob. You can rea...  |                     20 |
| Greetings      | Dear Charlie, I am David from Hotmail...  |                     14 |
| Re: Greetings  | Hi David, thank you for your email. I ... |                     19 |
| Newsletter     | Welcome to the Outlook newsletter. To ... |                     40 |
+----------------+-------------------------------------------+------------------------+

The output shows the position at which the first email address starts in each email_body value, matched case-insensitively. Positions are counted in characters from the beginning of the full string, not of the truncated text displayed in the table. In the first row, the 31 characters of 'Hi, this is Alice. My email is ' precede the address, so the function returns 32. The remaining bodies are truncated in this listing, and their positions are likewise counted from the first character of the complete email_body value.
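The effect of the case flags can be seen on a single literal. In this sketch the sample string is made up; (?-i) forces a case-sensitive match and (?i) a case-insensitive one, whatever the collation of the arguments.

```sql
SELECT REGEXP_INSTR('Contact: ALICE@EXAMPLE.COM', '(?-i)[a-z]+@') AS case_sensitive,   -- 0
       REGEXP_INSTR('Contact: ALICE@EXAMPLE.COM', '(?i)[a-z]+@')  AS case_insensitive; -- 10
```

Without either flag, the result depends on the collation in effect, so spelling the flag out keeps a query's behavior stable across servers and columns.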

Example 3: Finding the position of the first character after a date in a string

The following example finds the position of the first character following a date in the format yyyy-mm-dd. The table dates contains the date name and the date value for each date. REGEXP_INSTR() in MariaDB returns only the start of a match, so the query adds the length of the matched text, obtained with CHAR_LENGTH(REGEXP_SUBSTR(...)), to the start position. For a value with no match both terms are 0, so the result is 0. The query returns the date name, the date value, and the resulting position for each date.

SELECT date_name, date_value,
       -- start of the match plus the length of the matched text
       REGEXP_INSTR(date_value, '[0-9]{4}-[0-9]{2}-[0-9]{2}')
       + CHAR_LENGTH(REGEXP_SUBSTR(date_value, '[0-9]{4}-[0-9]{2}-[0-9]{2}')) AS date_format_position
FROM dates;

The output is:

+-------------+-------------------------+----------------------+
| date_name   | date_value              | date_format_position |
+-------------+-------------------------+----------------------+
| Today       | 2021-10-15              |                   11 |
| Tomorrow    | 2021-10-16 (Saturday)   |                   11 |
| Yesterday   | (Thursday) 2021-10-14   |                   22 |
| Birthday    | Happy birthday 1999-... |                   26 |
| Anniversary | 10 years since 2011-... |                   26 |
+-------------+-------------------------+----------------------+

The output shows the position of the first character following the date in each date_value. A date in the format yyyy-mm-dd is 10 characters long, so the result is the start position of the date plus 10. Today's and Tomorrow's dates start at the first character, so the result is 11. Yesterday's date starts at the 12th character, after '(Thursday) ', so the result is 22. The Birthday and Anniversary values are truncated in the display; in both, the date starts at the 16th character, after 'Happy birthday ' and '10 years since ' respectively, so the result is 26.
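The arithmetic for one of these rows can be reproduced on a literal. This is a sketch only: the user variables @s and @p are introduced here for brevity, and the end position is computed as the match start plus the length of the matched text.

```sql
SET @s := '(Thursday) 2021-10-14';
SET @p := '[0-9]{4}-[0-9]{2}-[0-9]{2}';

SELECT REGEXP_INSTR(@s, @p)                      AS match_start,   -- 12
       CHAR_LENGTH(REGEXP_SUBSTR(@s, @p))        AS match_length,  -- 10
       REGEXP_INSTR(@s, @p)
         + CHAR_LENGTH(REGEXP_SUBSTR(@s, @p))    AS after_match;   -- 22
```

CHAR_LENGTH() is used rather than LENGTH() so that the sum stays in characters, the same unit REGEXP_INSTR() reports.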

Related Functions

There are some other functions that are related to the REGEXP_INSTR() function, such as:

  • REGEXP_REPLACE(): This function returns the string that results from replacing all occurrences of a regular expression pattern in a string with another string. The syntax of the function is REGEXP_REPLACE(string, pattern, replacement), where string and pattern are the same as in the REGEXP_INSTR() function, and replacement is a string expression that represents the replacement string.
  • REGEXP_SUBSTR(): This function returns the part of a string that matches a regular expression pattern, or an empty string if there is no match. The syntax of the function is REGEXP_SUBSTR(string, pattern), where string and pattern are the same as in the REGEXP_INSTR() function.
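A short sketch showing the three functions side by side on the same input may help to compare them. The sample string, the user variables @s and @p, and the replacement text '<date>' are all illustrative assumptions.

```sql
SET @s := 'Order 2021-10-15 shipped';
SET @p := '[0-9]{4}-[0-9]{2}-[0-9]{2}';

SELECT REGEXP_INSTR(@s, @p)             AS match_position,  -- 7
       REGEXP_SUBSTR(@s, @p)            AS matched_text,    -- 2021-10-15
       REGEXP_REPLACE(@s, @p, '<date>') AS replaced;        -- Order <date> shipped
```

REGEXP_INSTR() answers where the match is, REGEXP_SUBSTR() what was matched, and REGEXP_REPLACE() what the string looks like after substitution.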

Conclusion

The REGEXP_INSTR() function returns the position of the first occurrence of a regular expression pattern in a string, or 0 if there is no match. It can be used for searches and validations that depend on where a match occurs, such as locating a phone number, an email address, or a date in a string. The function takes two arguments, both of which are required:

  • string: A string expression that represents the string to be searched.
  • pattern: A string expression that represents the regular expression pattern to be matched.

Case sensitivity follows the collation of the arguments and can be overridden with the (?i) and (?-i) flags in the pattern. A later starting point or the position after the match can be derived by combining REGEXP_INSTR() with SUBSTRING(), REGEXP_SUBSTR(), and CHAR_LENGTH().