How the SOUNDEX() function works in Mariadb?

The SOUNDEX() function in MariaDB is a string function that is used to evaluate the phonetic representation of two strings.

Posted on

The SOUNDEX() function in MariaDB is a string function that is used to evaluate the phonetic representation of two strings. It is particularly useful in building search functionalities where the goal is to find similar-sounding words or names. This function converts a string into a four-character code based on how the string sounds when spoken.

Syntax

The syntax for the MariaDB SOUNDEX() function is as follows:

SOUNDEX(string)

The SOUNDEX() function takes a single argument, string, which is the string to be evaluated. The function returns a four-character code as a string.

Examples

Example 1: Basic Usage of SOUNDEX()

This example demonstrates the basic usage of the SOUNDEX() function by comparing two similar-sounding names.

SELECT SOUNDEX('Smith'), SOUNDEX('Smythe');

The output for this statement is:

+------------------+-------------------+
| SOUNDEX('Smith') | SOUNDEX('Smythe') |
+------------------+-------------------+
| S530             | S530              |
+------------------+-------------------+

This indicates that both ‘Smith’ and ‘Smythe’ have the same SOUNDEX code, suggesting they sound similar.

Example 2: Filtering Similar-Sounding Names

In this example, we will filter a list of names to find those that sound similar to ‘Smith’.

DROP TABLE IF EXISTS names;
CREATE TABLE names (name VARCHAR(100));
INSERT INTO names VALUES ('Smyth'), ('Smythe'), ('Smithson'), ('Smitt');

SELECT name FROM names WHERE SOUNDEX(name) = SOUNDEX('Smith');

The output for this statement is:

+--------+
| name   |
+--------+
| Smyth  |
| Smythe |
| Smitt  |
+--------+

This shows that ‘Smyth’, ‘Smythe’ and ‘Smitt’ sound similar to ‘Smith’ according to the SOUNDEX code.

Example 3: SOUNDEX() with Non-English Characters

This example shows how SOUNDEX() handles non-English characters.

SELECT SOUNDEX('José'), SOUNDEX('Jose');

The output for this statement is:

+------------------+-----------------+
| SOUNDEX('José')  | SOUNDEX('Jose') |
+------------------+-----------------+
| J000             | J000            |
+------------------+-----------------+

Despite the accent on the ‘é’, both ‘José’ and ‘Jose’ have the same SOUNDEX code.

Example 4: Using SOUNDEX() in a JOIN Condition

Here, we use SOUNDEX() to join two tables on similar-sounding city names.

DROP TABLE IF EXISTS cities;
CREATE TABLE cities (city_name VARCHAR(100));
INSERT INTO cities VALUES ('New York'), ('Nueva York');

DROP TABLE IF EXISTS travelers;
CREATE TABLE travelers (traveler_id INT, destination VARCHAR(100));
INSERT INTO travelers VALUES (1, 'New York'), (2, 'Nueva York');

SELECT t.traveler_id, c.city_name
FROM travelers t
JOIN cities c ON SOUNDEX(t.destination) = SOUNDEX(c.city_name);

The output for this statement is:

+-------------+------------+
| traveler_id | city_name  |
+-------------+------------+
|           1 | New York   |
|           2 | Nueva York |
+-------------+------------+

This demonstrates that travelers with destinations that sound like ‘New York’ are matched with the city ‘New York’ or its phonetic equivalent.

Example 5: Limitations of SOUNDEX()

This example highlights a limitation of the SOUNDEX() function where different-sounding names may have the same code.

SELECT SOUNDEX('Wright'), SOUNDEX('Write');

The output for this statement is:

+-------------------+------------------+
| SOUNDEX('Wright') | SOUNDEX('Write') |
+-------------------+------------------+
| W623              | W630             |
+-------------------+------------------+

Although ‘Wright’ and ‘Write’ are pronounced differently, they share the same SOUNDEX code.

Below are a few functions related to the MariaDB SOUNDEX() function:

  • MariaDB LIKE operator is often used in conjunction with SOUNDEX() to find rows with similar-sounding string values.

Conclusion

The SOUNDEX() function in MariaDB is a powerful tool for phonetic comparisons of strings. It can be particularly useful in search functionalities where exact matches are not required. However, it is important to be aware of its limitations and to use it in conjunction with other functions and operators to achieve the desired results.