Thursday, June 13, 2013

Salesforce: Apex regular expressions






Regular expressions (regex) are a powerful tool for string processing, and nearly every programming language provides them. With a regular expression you can perform complex string operations in one or two lines of code that would otherwise take many lines of hand-written logic. You may already have used a simpler relative of the idea: the wildcard patterns, such as *.txt or *.exe, that DOS and Unix use for file searches.

Apex supports regular expressions too. Let’s take the scenario below.

Scenario: Space scientists have received a message from aliens. After trying hard to decrypt it, they finally derived an algorithm for it.
Message => '7^T:9hi2s [i}s 4b=?3e:au3tif5u8l 3w*/*9o1r((((l0d'

Algorithm: Remove every non-alphabetic character from the message, keeping the spaces.

An SFDC Apex programmer came up with the following one-line solution, using the String replaceAll method with a regular expression:

// Execute the following code to see the message.
String message = '7^T:9hi2s [i}s 4b=?3e:au3tif5u8l 3w*/*9o1r((((l0d';

// [^A-Za-z ] matches every character that is neither a letter nor a space.
String decryptedMessage = message.replaceAll('[^A-Za-z ]', '');

System.debug(decryptedMessage);

// Run this code in the Developer Console or in Eclipse (Execute Anonymous).

See how simple this becomes in a single line with the String method replaceAll and a regex. Now try the other regex-based String methods, replaceFirst and split.
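
To get started with replaceFirst and split, here is a minimal sketch applied to the decrypted message. The variable names are illustrative, and the values in the comments assume exactly the input string shown.

```apex
String decrypted = 'This is beautiful world';

// replaceFirst changes only the first match of the regex.
// The first 'is' sits inside 'This', so only that word changes.
String changed = decrypted.replaceFirst('is', 'at');
System.debug(changed); // That is beautiful world

// split cuts the string at every match of the regex, here each run of spaces.
List<String> words = decrypted.split(' +');
System.debug(words.size()); // 4
```

Like replaceAll, both methods treat their first argument as a regular expression, not as plain text.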

Learn more about regular expressions here: http://www.cs.tut.fi/~jkorpela/perl/regexp.html (this is a Perl document, but the core regular expression syntax is largely the same in Apex, which bases its regular expressions on Java's).

Next, let's look at the Pattern and Matcher classes.


Pattern and Matcher Classes
A regular expression is a string that is used to match another string, using a specific syntax. Apex supports the use of regular
expressions through its Pattern and Matcher classes.

Note: Salesforce limits the number of times an input sequence for a regular expression can be accessed to 1,000,000
times. If you reach that limit, you receive a runtime error.


Let’s take a simple example:

// Compiled pattern: exactly five a's followed by one b.
// matches() tests the whole string, so only 'aaaaab' matches. After this example, try 'a*b'.
Pattern MyPattern = Pattern.compile('a{5}b');

Matcher MyMatcher = MyPattern.matcher('aaaaab');

System.assert(MyMatcher.matches()); 

// Exactly four a's, then two digits, then one b.
// The variables are reassigned here, because declaring them a second time in the same block would not compile.
MyPattern = Pattern.compile('a{4}[0-9]{2}b');

// After this example, try 'aaaa111b'.
MyMatcher = MyPattern.matcher('aaaa11b');

System.assert(MyMatcher.matches());
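
Note that matches() succeeds only when the entire input fits the pattern. To locate the pattern inside a longer string, Matcher also offers find() and group(). The following is a minimal sketch; the variable names and the input string are made up for illustration.

```apex
Pattern tokenPattern = Pattern.compile('a{4}[0-9]{2}b');

// matches() fails here, because the input has extra characters around the token.
System.assert(!tokenPattern.matcher('xxaaaa11byy').matches());

// find() scans the input for the next substring that fits the pattern.
Matcher tokenMatcher = tokenPattern.matcher('xxaaaa11byy');
System.assert(tokenMatcher.find());

// group() returns the text of the match that find() located.
System.debug(tokenMatcher.group()); // aaaa11b
```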


Another way, for a one-off check, is the static Pattern.matches method:
Boolean isOk = Pattern.matches('a*b', 'aaaaab');

Sample regular expressions:

// isOk is already declared above, so it is reassigned here.
isOk = Pattern.matches('[a-z]+', 'aaaab'); // only lowercase letters
isOk = Pattern.matches('[A-Z]+', 'AAW'); // only uppercase letters
isOk = Pattern.matches('[0-9]+', '4773'); // only digits
isOk = Pattern.matches('[a-zA-Z0-9]+', '4773'); // only alphanumeric characters
isOk = Pattern.matches('[0-9]+@[a-z]+', '477@a'); // one or more digits, then a single @, then one or more lowercase letters

Note: Pattern.matches compiles the expression on every call. If you compile a pattern once with Pattern.compile, you can reuse it many times, which performs better.
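
A minimal sketch of that reuse, assuming a hypothetical list of sample inputs to validate: the pattern is compiled once, outside the loop, and each iteration only creates a new matcher.

```apex
// Compile once, before the loop.
Pattern digitsOnly = Pattern.compile('[0-9]+');

// Sample values, chosen only for illustration.
List<String> candidates = new List<String>{ '4773', '47a3', '2013' };
List<Boolean> results = new List<Boolean>();

for (String candidate : candidates) {
    // Every iteration shares the same compiled pattern.
    results.add(digitsOnly.matcher(candidate).matches());
}

System.debug(results); // (true, false, true)
```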


You can also define a region for the matcher: the part of the input string in which the match is performed. By default, the region is the entire string.

// MyMatcher.region(start, MyMatcher.regionEnd()); // start is the index where the region begins



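// Run this as a separate snippet, because MyPattern and MyMatcher are declared again.
Pattern MyPattern = Pattern.compile('a{4}[0-9]{2}b');

Matcher MyMatcher = MyPattern.matcher('aaaa11b');

// Here the region spans the whole input, which is also the default.
MyMatcher.region(0, MyMatcher.regionEnd());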

System.assert(MyMatcher.matches());
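
A region becomes more useful when it is narrower than the input. The sketch below uses hypothetical variable names and assumes, as in the Java classes that Apex follows, that the start index is inclusive and the end index is exclusive.

```apex
Pattern digitPair = Pattern.compile('[0-9]{2}');
Matcher regionMatcher = digitPair.matcher('aaaa11b');

// Without a narrower region, matches() tests the whole string 'aaaa11b', so it fails.
System.assert(!regionMatcher.matches());

// Restrict the matcher to indexes 4 and 5, which hold the text '11'.
regionMatcher.region(4, 6);

// matches() now tests only the region.
System.assert(regionMatcher.matches());
```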
