Regular expressions are good utilities to search & edit free form text data. According to Wikipedia, following is the definition of regular expression.
"In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. A regular expression (often shortened to regex or regexp) is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification"
This article specifically talks about usage of REGEXP_REPLACE "Regular Expression Replace" function for formatting some of the formatting needs of Oracle HR data. Below are some real life scenarios for which REGEXP_REPLACE can be put to good use.
Phone Number Formatting
In Oracle HR phone numbers fields are free form fields. That means HR user or emploee or manager can enter the data in any format they like, such as (123) 456 7890, 123-456-7890, 1234567890 ext:20 etc., When we want to report or send this data in an interface in a specific format, it really becomes challenging. Normally without regular expression, you can use traditional REPLACE function to tackle the problem as shown below:
select replace(replace(replace(phone_number,'-'),'('),'.') from per_phones;
This method works, if you have few characters to replace, but as the non-numeric characters increase this SQL could get ugly pretty fast.
Using REGEXP_REPLACE, you can solve the problem very easily. For formatting purpose all we want is extract all numeric values and ingnore all non-numeric values. Below is the simple SQL we can use to achieve this:
select REGEXP_REPLACE(phone_number,'[^[:digit:]]') from per_phones
Above SQL replaces all non-numeric characters with null and returns the value. Once you the number you can format it any way you want.
Address Line Formatting
Another common requirement i see on Oracle HR data is, address line formatting. Lot of time when we send the address data in interfaces, some vendors want us to takeout all punctuation and send the data. You can use REPLACE function to accomplish, but that gets complicated since you have to use REPLACE for each character. Below is the SQL using REGEXP_REPLACE to take out all punctuations from address lines.