Thursday, September 24, 2009

REGEXP_REPLACE for HR Data Formatting

Regular expressions are good utilities to search & edit free form text data. According to Wikipedia, following is the definition of regular expression.

"In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. A regular expression (often shortened to regex or regexp) is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification"

This article specifically talks about usage of REGEXP_REPLACE "Regular Expression Replace" function for formatting some of the formatting needs of Oracle HR data. Below are some real life scenarios for which REGEXP_REPLACE can be put to good use.

Phone Number Formatting

In Oracle HR phone numbers fields are free form fields. That means HR user or emploee or manager can enter the data in any format they like, such as (123) 456 7890, 123-456-7890, 1234567890 ext:20 etc., When we want to report or send this data in an interface in a specific format, it really becomes challenging. Normally without regular expression, you can use traditional REPLACE function to tackle the problem as shown below:

select replace(replace(replace(phone_number,'-'),'('),'.') from per_phones;

This method works, if you have few characters to replace, but as the non-numeric characters increase this SQL could get ugly pretty fast.

Using REGEXP_REPLACE, you can solve the problem very easily. For formatting purpose all we want is extract all numeric values and ingnore all non-numeric values. Below is the simple SQL we can use to achieve this:


select REGEXP_REPLACE(phone_number,'[^[:digit:]]') from per_phones

Above SQL replaces all non-numeric characters with null and returns the value. Once you the number you can format it any way you want.

Address Line Formatting

Another common requirement i see on Oracle HR data is, address line formatting. Lot of time when we send the address data in interfaces, some vendors want us to takeout all punctuation and send the data. You can use REPLACE function to accomplish, but that gets complicated since you have to use REPLACE for each character. Below is the SQL using REGEXP_REPLACE to take out all punctuations from address lines.


select REGEXP_REPLACE(address_line1,'[[:punct:]]')
from per_addresses;

2 comments:

  1. Anonymous3:20 PM

    Hi there, I found your website by means of Google whilst searching for a similar matter, your site got here up,
    it looks good. I have bookmarked it in my google bookmarks.

    Hello there, just become alert to your weblog thru Google, and located
    that it's really informative. I'm gonna be careful for brussels.
    I will appreciate in the event you proceed this in future.
    Many folks can be benefited out of your writing.
    Cheers!

    Also visit my web-site: Weight Loss Videos (facebax.com)

    ReplyDelete
  2. Anonymous9:20 AM

    Zeschuk minecraft pocket edition cheats explained that -in the future-
    your TV would be more simplistic. It minecraft pocket edition cheats has received
    a boost in marketing from Dogs Against Romney. Payment for mobile devices can support these games with
    multiple memories and personalities. Today, we can transfer that game of badminton just by a
    star.

    Feel free to surf to my page ... Minecraft pocket edition hack level

    ReplyDelete