Rostyslav Rava - Personal Website

How to Remove Leading, Trailing, and Extra Spaces in MySQL

During an in-depth review of large processed datasets, it is often possible to find that they contain multiple extra space characters. There may be thousands of them in data stored in a single text field. Multiplied by millions of rows, they create additional load on the system without adding any value to the processed data.

In general, there are three cases where blank spaces may appear. The most common is at the beginning of the string. These can usually be removed with the LTRIM function.

SELECT CONCAT('[', LTRIM('    John Smith'), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

And at the end of the string. Those can usually be removed with the RTRIM function.

SELECT CONCAT('[', RTRIM('John Smith    '), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

Those two simple cases can be processed together with the TRIM function.

SELECT CONCAT('[', TRIM('    John Smith    '), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

A more complicated case is when extra spaces appear in the middle of the text. To fix them, a simple REPLACE function can be used.

SELECT CONCAT('[', REPLACE('John  Smith', '  ', ' '), ']') AS name;

Even three consecutive spaces can be fixed.

SELECT CONCAT('[', REPLACE('John   Smith', '   ', ' '), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

But in reality, there may be dozens, hundreds, or even thousands of spaces. One approach is to use REPEAT to match a specific number of spaces.

SELECT CONCAT('[', REPLACE('John     Smith', REPEAT(' ', 5), ' '), ']') AS name;

Alternatively, execute the replacement of two spaces with one in a loop up to 65,534 times to clean up a single text field.

But realistically, with more than two spaces, such approaches are not an option. Here, replacement based on a regular expression is much more suitable. Runs of two or more spaces can be matched with the following regular expression: ' {2,}'.

SELECT CONCAT('[', REGEXP_REPLACE('John     Smith', ' {2,}', ' '), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

Finally, the combination of REGEXP_REPLACE and the TRIM function can perform the needed cleanup: removing leading and trailing spaces from text and extra spaces within it.

SELECT CONCAT('[', REGEXP_REPLACE(TRIM('    John    Smith    '), ' {2,}', ' '), ']') AS name;
+--------------+
| name         |
+--------------+
| [John Smith] |
+--------------+

To verify that there are no leading, trailing, or extra spaces left in the text, the following regular expression can be used: '^ | {2,}| $'. It returns 1 when the string contains such spaces, and 0 when it does not.

SELECT 'John Smith ' REGEXP '^ | {2,}| $';   -- 1
SELECT ' John Smith' REGEXP '^ | {2,}| $';   -- 1
SELECT 'John  Smith' REGEXP '^ | {2,}| $';   -- 1
SELECT ' John  Smith ' REGEXP '^ | {2,}| $'; -- 1
SELECT ' John Smith ' REGEXP '^ | {2,}| $';  -- 1
SELECT 'John Smith' REGEXP '^ | {2,}| $';    -- 0
Tags: