How the REGEXP_REPLACE() function works in MariaDB

The REGEXP_REPLACE() function is a string function that returns a copy of a subject string in which every match of a regular expression pattern is replaced by a specified replacement string. It is useful for modifying or transforming strings based on a pattern. This article introduces the syntax and usage of the REGEXP_REPLACE() function in MariaDB, works through several examples, and lists some related functions that are often used alongside it.

Syntax

The syntax of the REGEXP_REPLACE() function is as follows:

REGEXP_REPLACE(subject, pattern, replacement)

The function takes three arguments, all of which are required:

  • subject: The string to be searched.
  • pattern: The regular expression pattern to match. MariaDB uses the PCRE library, so PCRE syntax applies.
  • replacement: The string that replaces each match. It can contain backreferences of the form \N, where N is a number from 1 to 9, to refer to captured groups in the pattern.

Unlike the REGEXP_REPLACE() function in MySQL 8.0 or Oracle Database, the MariaDB function does not accept position, occurrence, or match_parameter arguments. It always searches from the beginning of the subject and replaces every occurrence. Matching behavior is adjusted with PCRE flags inside the pattern instead, for example:

  • (?i): Case insensitive matching.
  • (?-i): Case sensitive matching.
  • (?m): Multi-line mode, where ^ and $ match the beginning and end of each line, not the whole string.
  • (?s): Allows the period (.) to match the newline character.

The function returns the subject string with all occurrences of the pattern replaced by the replacement; if no occurrence is found, the subject is returned unchanged. The function follows the case sensitivity rules of the effective collation: matching is case insensitive for case insensitive collations, and case sensitive for case sensitive collations and for binary data. The collation case sensitivity can be overridden with the (?i) and (?-i) PCRE flags.
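As a small illustration of the case sensitivity rules described above, the following sketch compares the inline PCRE flags on a literal string. It assumes the connection uses a case insensitive collation; the sample string and the column aliases are made up for this illustration.

```sql
-- Sketch: overriding the collation's case sensitivity with inline PCRE flags.
-- Assumes a case insensitive connection collation.
SELECT
  REGEXP_REPLACE('Hello hello HELLO', 'hello', 'bye')      AS by_collation,      -- bye bye bye
  REGEXP_REPLACE('Hello hello HELLO', '(?-i)hello', 'bye') AS force_sensitive,   -- Hello bye HELLO
  REGEXP_REPLACE('Hello hello HELLO', '(?i)hello', 'bye')  AS force_insensitive; -- bye bye bye
```

Under a case sensitive or binary collation, the first column would behave like the second one instead.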

Examples

In this section, we will show some examples of using the REGEXP_REPLACE() function in MariaDB. We will use the following sample table for illustration:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  email VARCHAR(50),
  phone VARCHAR(20)
);

INSERT INTO employees VALUES
(1, 'Alice', 'alice@example.com', '+1-234-567-8901'),
(2, 'Bob', 'bob@example.com', '+1-345-678-9012'),
(3, 'Charlie', 'charlie@example.com', '+1-456-789-0123'),
(4, 'David', 'david@example.com', '+44-123-456-7890'),
(5, 'Eve', 'eve@example.com', '+86-123-4567-8901');

Example 1: Replacing the domain name in an email address

We can use the REGEXP_REPLACE() function to replace the domain name in an email address with a different one. The domain name is the part of the email address after the @ symbol. We can use the following regular expression pattern to match the domain name:

'@.*'

This pattern matches a literal @ character followed by any number of characters, that is, everything from the @ through the end of the string. We can use the following query to apply the function to the email column of the employees table and replace the domain name with newdomain.com:

SELECT name, email, REGEXP_REPLACE(email, '@.*', '@newdomain.com') AS new_email
FROM employees;

The output is:

+---------+---------------------+-----------------------+
| name    | email               | new_email             |
+---------+---------------------+-----------------------+
| Alice   | alice@example.com   | alice@newdomain.com   |
| Bob     | bob@example.com     | bob@newdomain.com     |
| Charlie | charlie@example.com | charlie@newdomain.com |
| David   | david@example.com   | david@newdomain.com   |
| Eve     | eve@example.com     | eve@newdomain.com     |
+---------+---------------------+-----------------------+

Example 2: Replacing the country code in a phone number

We can also use the REGEXP_REPLACE() function to replace the country code in a phone number with a different one. The country code is the part of the phone number before the first - character. We can use the following regular expression pattern to match the country code:

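'^.*?-'

This pattern matches, from the beginning of the string, as few characters as possible up to and including the first literal - character. The lazy quantifier *? matters here: the greedy form '^.*-' would match through the last - character and remove most of the number. We can use the following query to apply the function to the phone column of the employees table and replace the country code with +99: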

SELECT name, phone, REGEXP_REPLACE(phone, '^.*?-', '+99-') AS new_phone
FROM employees;

The output is:

+---------+-------------------+-------------------+
| name    | phone             | new_phone         |
+---------+-------------------+-------------------+
| Alice   | +1-234-567-8901   | +99-234-567-8901  |
| Bob     | +1-345-678-9012   | +99-345-678-9012  |
| Charlie | +1-456-789-0123   | +99-456-789-0123  |
| David   | +44-123-456-7890  | +99-123-456-7890  |
| Eve     | +86-123-4567-8901 | +99-123-4567-8901 |
+---------+-------------------+-------------------+
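To see why the query above uses the lazy quantifier, the following sketch runs three variants of the pattern against a single literal phone number. The column aliases are made up for this illustration, and the negated character class is an alternative not used in the query above.

```sql
-- Sketch: greedy vs. lazy matching up to a hyphen.
SELECT
  REGEXP_REPLACE('+1-234-567-8901', '^.*-', '+99-')    AS greedy,        -- +99-8901
  REGEXP_REPLACE('+1-234-567-8901', '^.*?-', '+99-')   AS lazy,          -- +99-234-567-8901
  REGEXP_REPLACE('+1-234-567-8901', '^[^-]*-', '+99-') AS negated_class; -- +99-234-567-8901
```

The negated character class [^-]* stops at the first hyphen by construction, so it gives the same result as the lazy form without relying on quantifier laziness.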

Example 3: Swapping the first name and the last name in a full name

We can also use the REGEXP_REPLACE() function to swap the first name and the last name in a full name. Here the first name is the part of the full name before the first space character, and the last name is everything after that space. We can use the following regular expression pattern to match both parts and capture them in two groups:

'^([^ ]*) (.*)$'

This pattern matches any number of non-space characters at the beginning of the string and captures them in the first group, then a single space character, then all remaining characters up to the end of the string, captured in the second group. We can use the following query to apply the function to the name column of the employees table and swap the two parts using the backreferences \2 and \1. Because the backslash is also the escape character in MariaDB string literals, each backreference is written with a doubled backslash:

SELECT name, REGEXP_REPLACE(name, '^([^ ]*) (.*)$', '\\2 \\1') AS new_name
FROM employees;

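The names in the sample table contain no space character, so the pattern does not match and each name is returned unchanged. The output is: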

+---------+----------+
| name    | new_name |
+---------+----------+
| Alice   | Alice    |
| Bob     | Bob      |
| Charlie | Charlie  |
| David   | David    |
| Eve     | Eve      |
+---------+----------+
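The sample table holds single-word names only, so the swap is easier to see on literal values. The following sketch applies the same pattern to two hypothetical full names that are not part of the sample data.

```sql
-- Sketch: swapping the part before the first space with the rest.
-- 'Alice Smith' and 'Mary Ann Lee' are hypothetical values for illustration.
SELECT
  REGEXP_REPLACE('Alice Smith', '^([^ ]*) (.*)$', '\\2 \\1')  AS two_words,   -- Smith Alice
  REGEXP_REPLACE('Mary Ann Lee', '^([^ ]*) (.*)$', '\\2 \\1') AS three_words; -- Ann Lee Mary
```

With more than two words, everything after the first space is treated as the second group, which may or may not be what a real name-handling rule needs.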

Example 4: Replacing the last four digits of a phone number with asterisks

We can also use the REGEXP_REPLACE() function to replace the last four digits of a phone number with asterisks, to mask the sensitive information. The last four digits are the part of the phone number after the last - character. We can use the following regular expression pattern to match the last four digits:

'([0-9]{4})$'

This pattern matches exactly four digits at the end of the string and captures them in a group. The capturing parentheses are optional here, because the replacement does not refer to the group. We can use the following query to apply the function to the phone column of the employees table and replace the last four digits with ****:

SELECT name, phone, REGEXP_REPLACE(phone, '([0-9]{4})$', '****') AS masked_phone
FROM employees;

The output is:

+---------+-------------------+-------------------+
| name    | phone             | masked_phone      |
+---------+-------------------+-------------------+
| Alice   | +1-234-567-8901   | +1-234-567-****   |
| Bob     | +1-345-678-9012   | +1-345-678-****   |
| Charlie | +1-456-789-0123   | +1-456-789-****   |
| David   | +44-123-456-7890  | +44-123-456-****  |
| Eve     | +86-123-4567-8901 | +86-123-4567-**** |
+---------+-------------------+-------------------+
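A related masking task is the reverse one: hiding every digit except the last four. The following sketch is one possible approach, not part of the example above; it relies on a PCRE lookahead and uses a literal phone number and a made-up column alias.

```sql
-- Sketch: mask each digit that is still followed by at least four more digits.
-- The lookahead (?=...) checks what follows without consuming it,
-- so the last four digits never match and stay visible.
SELECT REGEXP_REPLACE(
         '+1-234-567-8901',
         '[0-9](?=(?:[^0-9]*[0-9]){4})',
         '*'
       ) AS masked_prefix; -- +*-***-***-8901
```

Because the pattern counts digits rather than positions, the hyphens and the leading + are left in place.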

Example 5: Replacing the user name with the first letter in an email address

We can also use the REGEXP_REPLACE() function to replace the user name with the first letter in an email address, to shorten the email address. The user name is the part of the email address before the @ symbol. We can use the following regular expression pattern to match the user name, and capture the first letter in a group:

'^(.)(.*)@'

This pattern matches the first character of the string and captures it in the first group, then any number of further characters (the second group), then a literal @ character. We can use the following query to apply the function to the email column of the employees table and replace the user name with its first letter, using the backreference \1 (written '\\1' inside the SQL string literal) to refer to the first captured group:

SELECT name, email, REGEXP_REPLACE(email, '^(.)(.*)@', '\\1@') AS short_email
FROM employees;

The output is:

+---------+---------------------+---------------+
| name    | email               | short_email   |
+---------+---------------------+---------------+
| Alice   | alice@example.com   | a@example.com |
| Bob     | bob@example.com     | b@example.com |
| Charlie | charlie@example.com | c@example.com |
| David   | david@example.com   | d@example.com |
| Eve     | eve@example.com     | e@example.com |
+---------+---------------------+---------------+
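The doubled backslash in the replacement string is easy to get wrong. The following sketch contrasts the two spellings on a hypothetical literal address, assuming the default sql_mode, in which the backslash acts as an escape character inside string literals (that is, NO_BACKSLASH_ESCAPES is not set).

```sql
-- Sketch: why the backreference is written as '\\1'.
-- With a single backslash, the SQL parser reduces '\1' to the plain character '1'
-- before the regular expression engine ever sees it.
SELECT
  REGEXP_REPLACE('alice@example.com', '^(.)(.*)@', '\\1@') AS doubled, -- a@example.com
  REGEXP_REPLACE('alice@example.com', '^(.)(.*)@', '\1@')  AS single;  -- 1@example.com
```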

Related Functions

Some other functions in MariaDB are related to the REGEXP_REPLACE() function. They are:

  • REGEXP_INSTR(): This function returns the position of the first occurrence of the pattern in the subject string, or 0 if no match is found. In MariaDB it takes two arguments, the subject and the pattern.
  • REGEXP_SUBSTR(): This function returns the substring that matches the pattern in the subject string, or an empty string if no match is found. In MariaDB it also takes two arguments, the subject and the pattern.
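As a brief sketch of how these two functions behave, the following query calls each of them with a subject and a pattern on a hypothetical literal address. The column aliases are made up for this illustration.

```sql
-- Sketch: locating and extracting a match instead of replacing it.
SELECT
  REGEXP_INSTR('alice@example.com', '@')      AS at_position, -- 6
  REGEXP_SUBSTR('alice@example.com', '@.*')   AS domain_part, -- @example.com
  REGEXP_SUBSTR('alice@example.com', '[0-9]') AS no_match;    -- empty string
```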

Conclusion

In this article, we have learned how to use the REGEXP_REPLACE() function in MariaDB to replace occurrences of a regular expression pattern in a subject string with a specified replacement string. We have also applied the function to several scenarios and listed some related functions that can be used alongside it. We hope this article has helped you understand the functionality and usage of the REGEXP_REPLACE() function in MariaDB.