Tutorial 36 - Regular Expressions in PPL
Regular Expressions (Regex):
From www.regular-expressions.info:
“A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt . But you can do much more with regular expressions. In a text editor like EditPad Pro or a specialized text processing tool like PowerGREP, you could use the regular expression \b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b to search for an email address. Any email address, to be exact. A very similar regular expression (replace the first \b with ^ and the last one with $) can be used by a programmer to check if the user entered a properly formatted email address. In just one line of code, whether that code is written in Perl, PHP, Java, a .NET language or a multitude of other languages.”
PPL supports a variety of expressions like:
\ Quote the next metacharacter
^ Match the beginning of the string
. Match any character
$ Match the end of the string
| Alternation
() Grouping (creates a capture)
[] Character class
==GREEDY CLOSURES==
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
==ESCAPE CHARACTERS==
\t tab (HT, TAB)
\n newline (LF, NL)
\r return (CR)
\f form feed (FF)
==PREDEFINED CLASSES==
\l lowercase next char
\u uppercase next char
\a letters
\A non letters
\w alphanumeric [0-9a-zA-Z]
\W non alphanumeric
\s space
\S non space
\d digits
\D non digits
\x hexadecimal digits
\X non hexadecimal digits
\c control characters
\C non control characters
\p punctuation
\P non punctuation
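The greedy closures are easier to see in action than in a table. The sketch below uses Python's re module rather than PPL, only because the closures *, +, ? and the counted forms behave the same way there; the helper name first_match is made up for this illustration and is not part of PPL.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy closures take as many characters as they can.
star = first_match(r"a*", "aaab")             # zero or more 'a'
plus = first_match(r"a+", "baaa")             # one or more 'a'
optional = first_match(r"colou?r", "color")   # the 'u' may be absent
at_least = first_match(r"a{2,}", "a aa aaaa") # at least two 'a' in a row
bounded = first_match(r"a{2,3}", "aaaa")      # two or three 'a', prefers three

print(star, plus, optional, at_least, bounded)
```

Note that a{2,} skips the lone "a" at the start and stops at the first run that is long enough, while a{2,3} takes three characters out of four because it is greedy.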
To search a string using a regular expression in PPL, use the Search() function. To check that the whole string is an exact match of the regular expression, use the Match() function.
Let's take the following example:
string$ = "Bill Clinton";
expr$ = "^(Bill|George|Renald) (Clinton|Bush|Reagan)$";
Search(expr$, string$, b$, e$);
ShowMessage(b$ + "," + e$);
The expr$ variable contains an expression that says: if the string begins with Bill, George or Renald, followed by a space, and ends with Clinton, Bush or Reagan, we have a match. After the search, b$ holds the string starting at the beginning of the match, here “Bill Clinton”, and e$ holds the string starting at the end of the match, here “” because the match runs to the end of the string.
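To try the same pattern outside PPL, here is a rough Python analog. It assumes, based on the description above, that Search() reports the text from the start of the match and the text from the end of the match; search_parts and is_exact_match are hypothetical helper names, not PPL functions.

```python
import re

EXPR = r"^(Bill|George|Renald) (Clinton|Bush|Reagan)$"

def search_parts(pattern, text):
    """Return (text from match start, text from match end), or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return text[m.start():], text[m.end():]

def is_exact_match(pattern, text):
    """True only when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(search_parts(EXPR, "Bill Clinton"))
print(search_parts(EXPR, "Bill Gates"))

# Without the ^ and $ anchors, a search can succeed inside a longer string
# even though the string as a whole is not an exact match.
LOOSE = r"(Bill|George) (Clinton|Bush)"
print(search_parts(LOOSE, "President Bill Clinton spoke"))
print(is_exact_match(LOOSE, "President Bill Clinton spoke"))
```

The last two calls show the difference between searching and exact matching: the search finds “Bill Clinton” in the middle of the sentence, while the exact-match check fails.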
i$ = 0;
while(i$ <= subexpcount - 1)
  subexp(string$, i$, begin$, len$);
  ShowMessage("SubExp " + i$ + " = " + begin$ + "," + len$);
  i$++;
end;
In the previous example, we loop over each sub expression to see what it matched in string$: where the match starts and how many characters it covers.
Sub expression 0 returns “Bill Clinton” with a length of 12 because it covers the whole expression. Sub expression 1 also starts at “Bill Clinton” but has a length of only 4 characters (“Bill”); it corresponds to the first group, “^(Bill|George|Renald)”. Sub expression 2 starts at “Clinton” with a length of 7 characters; it corresponds to the second group, “(Clinton|Bush|Reagan)$”.
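The start-and-length values can be cross-checked with a small Python sketch. This is an analog, not PPL code: subexpressions is a hypothetical helper that reports, for each group, the text from where the group starts and the number of characters it covers, which is how the begin$ and len$ values above read.

```python
import re

EXPR = r"^(Bill|George|Renald) (Clinton|Bush|Reagan)$"

def subexpressions(pattern, text):
    """List (index, text from group start, group length) for each group."""
    m = re.search(pattern, text)
    if m is None:
        return []
    result = []
    # Group 0 is the whole match; groups 1..n are the parenthesized captures.
    for i in range(m.re.groups + 1):
        start, end = m.span(i)
        if start < 0:
            continue  # this group did not take part in the match
        result.append((i, text[start:], end - start))
    return result

for index, begin, length in subexpressions(EXPR, "Bill Clinton"):
    print("SubExp", index, "=", begin + "," + str(length))
```

Group 1 starts at the same position as the whole match, so only the length tells the two apart.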