How the REGEXP_SUBSTR() function works in Mariadb?

The REGEXP_SUBSTR() function is a string function that returns the substring that matches a given regular expression pattern in a subject string.

Posted on

The REGEXP_SUBSTR() function is a string function that returns the substring that matches a given regular expression pattern in a subject string. It is useful for extracting specific parts of a string based on a certain pattern. In this article, we will introduce the syntax and usage of the REGEXP_SUBSTR() function in Mariadb, and provide some examples to demonstrate its functionality. We will also list some related functions that can be used in conjunction with the REGEXP_SUBSTR() function.

Syntax

The syntax of the REGEXP_SUBSTR() function is as follows:

REGEXP_SUBSTR(subject, pattern)

The function takes two arguments:

  • subject: The string to be searched.
  • pattern: The regular expression pattern to be matched.

The function returns the part of the subject string that matches the pattern, or an empty string if no match is found. The function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data. The collation case sensitivity can be overwritten using the (?i) and (?-i) PCRE flags¹.

Examples

In this section, we will show some examples of using the REGEXP_SUBSTR() function in Mariadb. We will use the following sample table for illustration:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  email VARCHAR(50),
  phone VARCHAR(20)
);

INSERT INTO employees VALUES
(1, 'Alice', '[email protected]', '+1-234-567-8901'),
(2, 'Bob', '[email protected]', '+1-345-678-9012'),
(3, 'Charlie', '[email protected]', '+1-456-789-0123'),
(4, 'David', '[email protected]', '+44-123-456-7890'),
(5, 'Eve', '[email protected]', '+86-123-4567-8901');

Example 1: Extracting the domain name from an email address

We can use the REGEXP_SUBSTR() function to extract the domain name from an email address. The domain name is the part of the email address after the @ symbol. We can use the following regular expression pattern to match the domain name:

'@(.*)'

This pattern means to match a literal @ character, followed by any number of any characters, and capture them in a group. The function will return the captured group as the result. We can use the following query to apply the function to the email column of the employees table:

SELECT name, email, REGEXP_SUBSTR(email, '@(.*)') AS domain
FROM employees;

The output is:

+---------+---------------------+--------------+
| name    | email               | domain       |
+---------+---------------------+--------------+
| Alice   | [email protected]   | @example.com |
| Bob     | [email protected]     | @example.com |
| Charlie | [email protected] | @example.com |
| David   | [email protected]   | @example.com |
| Eve     | [email protected]     | @example.com |
+---------+---------------------+--------------+

Example 2: Extracting the country code from a phone number

We can also use the REGEXP_SUBSTR() function to extract the country code from a phone number. The country code is the part of the phone number before the first - character. We can use the following regular expression pattern to match the country code:

'^(.*)-'

This pattern means to match any number of any characters at the beginning of the string, and capture them in a group, followed by a literal - character. The function will return the captured group as the result. We can use the following query to apply the function to the phone column of the employees table:

SELECT name, phone, REGEXP_SUBSTR(phone, '^(.*)-') AS country_code
FROM employees;

The output is:

+---------+-------------------+---------------+
| name    | phone             | country_code  |
+---------+-------------------+---------------+
| Alice   | +1-234-567-8901   | +1-234-567-   |
| Bob     | +1-345-678-9012   | +1-345-678-   |
| Charlie | +1-456-789-0123   | +1-456-789-   |
| David   | +44-123-456-7890  | +44-123-456-  |
| Eve     | +86-123-4567-8901 | +86-123-4567- |
+---------+-------------------+---------------+

Example 3: Extracting the first name from a full name

We can also use the REGEXP_SUBSTR() function to extract the first name from a full name. The first name is the part of the full name before the first space character. We can use the following regular expression pattern to match the first name:

'^([^ ]*)'

This pattern means to match any number of any characters except space at the beginning of the string, and capture them in a group. The function will return the captured group as the result. We can use the following query to apply the function to the name column of the employees table:

SELECT name, REGEXP_SUBSTR(name, '^([^ ]*)') AS first_name
FROM employees;

The output is:

+---------+------------+
| name    | first_name |
+---------+------------+
| Alice   | Alice      |
| Bob     | Bob        |
| Charlie | Charlie    |
| David   | David      |
| Eve     | Eve        |
+---------+------------+

Example 4: Extracting the last four digits of a phone number

We can also use the REGEXP_SUBSTR() function to extract the last four digits of a phone number. The last four digits are the part of the phone number after the last - character. We can use the following regular expression pattern to match the last four digits:

'-(.*)$'

This pattern means to match a literal - character, followed by any number of any characters at the end of the string, and capture them in a group. The function will return the captured group as the result. We can use the following query to apply the function to the phone column of the employees table:

SELECT name, phone, REGEXP_SUBSTR(phone, '(.{4})$') AS last_four
FROM employees;

The output is:

+---------+-------------------+-----------+
| name    | phone             | last_four |
+---------+-------------------+-----------+
| Alice   | +1-234-567-8901   | 8901      |
| Bob     | +1-345-678-9012   | 9012      |
| Charlie | +1-456-789-0123   | 0123      |
| David   | +44-123-456-7890  | 7890      |
| Eve     | +86-123-4567-8901 | 8901      |
+---------+-------------------+-----------+

Example 5: Extracting the user name from an email address

We can also use the REGEXP_SUBSTR() function to extract the user name from an email address. The user name is the part of the email address before the @ symbol. We can use the following regular expression pattern to match the user name:

'^(.*)@'

This pattern means to match any number of any characters at the beginning of the string, and capture them in a group, followed by a literal @ character. The function will return the captured group as the result. We can use the following query to apply the function to the email column of the employees table:

SELECT name, email, REGEXP_SUBSTR(email, '^([^@]*)') AS user_name
FROM employees;

The output is:

+---------+---------------------+-----------+
| name    | email               | user_name |
+---------+---------------------+-----------+
| Alice   | [email protected]   | alice     |
| Bob     | [email protected]     | bob       |
| Charlie | [email protected] | charlie   |
| David   | [email protected]   | david     |
| Eve     | [email protected]     | eve       |
+---------+---------------------+-----------+

There are some other functions that are related to the REGEXP_SUBSTR() function in Mariadb. They are:

  • REGEXP_INSTR(): This function returns the position of the first occurrence of the pattern in the subject string, or 0 if no match is found. It also accepts optional arguments to specify the start position, occurrence, return option, and match parameter.
  • REGEXP_REPLACE(): This function returns a new string where the pattern is replaced by the replacement in the subject string. It also accepts optional arguments to specify the position, occurrence, and match parameter.

Conclusion

In this article, we have learned how to use the REGEXP_SUBSTR() function in Mariadb to extract substrings that match a regular expression pattern from a subject string. We have also seen some examples of applying the function to different scenarios, and some related functions that can be used in conjunction with the REGEXP_SUBSTR() function. We hope this article has helped you understand the functionality and usage of the REGEXP_SUBSTR() function in Mariadb.