Regular expressions in Opportunity Detect
In Opportunity Detect, you use regular expressions in two situations.
- When you create a Real time file connector in the Server Groups page, you use a regular expression to match the pattern used in file names.
- When you create a Boolean expression in a component and you select the Like operator, you can use a regular expression to set the criteria for comparison.
Opportunity Detect uses the Streams standard toolkit for matching regular expressions. Opportunity Detect supports the POSIX extended regular expressions standard.
The regular expression must conform to the Streams Processing Language requirements, described here: https://www-01.ibm.com/support/knowledgecenter/SSCRJU_3.2.0/com.ibm.swg.im.infosphere.streams.spl-language-specification.doc/doc/primitivetypes.html
Take care that the pattern you specify matches exactly what you intend. Always test your patterns against sample input to verify that they match the required strings and reject everything else. A practical approach is to start with a simple pattern and refine it in small steps until it produces the required result. Pay particular attention to escaping backslashes.
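The following sketch illustrates why backslash escaping deserves attention. It uses Python's re module, which is not the Streams toolkit, and Python's string literal rules, which are an assumption here and differ in detail from SPL; the SPL rules are described at the link above. The general point carries over: when a pattern is written inside a string literal that itself treats the backslash as an escape character, each backslash intended for the regular expression must be doubled.

```python
import re

# In an ordinary Python string literal the backslash is an escape character,
# so a literal backslash for the regex engine must be written twice.
escaped_literal = "Detect\\.a"

# A raw string literal passes the backslash through unchanged.
raw_literal = r"Detect\.a"

# Both literals produce the same seven-character pattern text: Detect\.a
same_pattern = escaped_literal == raw_literal

# With the backslash, the period is literal; without it, the period
# matches any single character.
literal_dot_rejects = re.fullmatch(raw_literal, "DetectXa") is None
any_char_accepts = re.fullmatch("Detect.a", "DetectXa") is not None
```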
Special characters
Here is a summary of special character usage in POSIX regular expressions.
- Period (.) : Matches any single character.
- Anchors (^, $) : The (^) anchor matches at the start of the string being tested, and the ($) anchor matches at the end.
- Asterisk (*) : A quantifier that matches the preceding character or group zero or more times.
- Plus (+) : A quantifier that matches the preceding character or group one or more times.
- Question mark (?) : A quantifier that makes the preceding character or group optional (zero or one time).
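The special characters listed above can be tried out with a short script. This is a minimal sketch that uses Python's re module as a stand-in; it is not the POSIX engine that Opportunity Detect uses, but these particular constructs behave the same way for the inputs shown. The helper name whole_match and the sample strings are made up for illustration.

```python
import re

# Each entry: (pattern, input, whether the whole input should match).
cases = [
    (r"a.c", "abc", True),     # period matches any single character
    (r"a.c", "ac", False),     # ...but it must consume exactly one
    (r"ab*c", "ac", True),     # asterisk: zero or more of the preceding item
    (r"ab*c", "abbbc", True),
    (r"ab+c", "ac", False),    # plus: at least one of the preceding item
    (r"ab+c", "abc", True),
    (r"ab?c", "ac", True),     # question mark: preceding item is optional
    (r"ab?c", "abbc", False),
]

def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

results = [whole_match(p, t) for p, t, _ in cases]

# Anchors: search() looks anywhere in the input unless the pattern is anchored.
unanchored = re.search(r"trans", "my.trans.file") is not None
anchored = re.search(r"^trans$", "my.trans.file") is not None
```

Here unanchored is True because trans occurs inside the input, while anchored is False because the anchored pattern requires the input to be exactly trans.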
Bracket expressions
A bracket expression represents a class of characters, any one of which can match a single character. For example, [a-c] is a bracket expression that matches any one of the characters a, b, or c. The regex [a-c]+ matches aaa, abc, ca, and so on: any string made up of one or more characters from the set a, b, and c.
There are other forms of bracket expressions. For example, [a-c] can also be specified as [abc]. A bracket expression can also contain collating elements, which have the form [.col.]. A collating element is a character or group of characters that acts as a single character in a bracket expression. For example, if [.ae.] is a collating element, it can be used within the bracket expression [[.ae.]bc], which states: match any of "ae", b, or c. In other words, it forces ae to be treated as a single character. Bracket expressions can also contain equivalence classes, of the form [=a=], and character classes, of the form [:name:]. The following table lists the POSIX character classes with their ASCII equivalents.
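As a quick check of the range and list forms described above, the following sketch compares [a-c]+ with [abc]+ on a few sample strings. It uses Python's re module for convenience, which is an assumption: Python is not a POSIX engine and does not support collating elements such as [.ae.], so only the plain forms are shown.

```python
import re

range_form = re.compile(r"[a-c]+")
list_form = re.compile(r"[abc]+")

samples = ["aaa", "abc", "ca", "abd", ""]

# Whole-string matches: the two forms should agree on every sample.
range_results = [range_form.fullmatch(s) is not None for s in samples]
list_results = [list_form.fullmatch(s) is not None for s in samples]

# search() finds the first run of characters from the set inside a longer string.
first_run = range_form.search("xxbcaxx").group(0)
```

The strings abd and the empty string are rejected: d is outside the set, and the plus quantifier requires at least one character.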
POSIX | Description | ASCII |
---|---|---|
[[:alnum:]] | Alphanumeric characters | [a-zA-Z0-9] |
[[:alpha:]] | Alphabetic characters | [a-zA-Z] |
[[:blank:]] | Space and tab | [ \t] |
[[:cntrl:]] | Control characters | [\x00-\x1F\x7F] |
[[:digit:]] | Digits | [0-9] |
[[:graph:]] | Visible characters (that is, anything except spaces, control characters, etc.) | [\x21-\x7E] |
[[:lower:]] | Lowercase characters | [a-z] |
[[:print:]] | Visible characters and spaces (that is, anything except control characters, etc.) | [\x20-\x7E] |
[[:punct:]] | Punctuation and symbols | [\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E] |
[[:space:]] | All whitespace characters, including line breaks | [ \t\r\n\v\f] |
[[:upper:]] | Uppercase letters | [A-Z] |
[[:xdigit:]] | Hexadecimal digits | [A-Fa-f0-9] |
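The ASCII column can be exercised directly in an engine that lacks the [[:name:]] syntax. The sketch below uses Python's re module, which does not accept POSIX class names, so it spells out a subset of the ASCII equivalents from the table. The dictionary ASCII_EQUIVALENT and the helper all_in_class are hypothetical names introduced for illustration.

```python
import re

# ASCII equivalents copied from the table; Python's re module does not
# accept the [[:name:]] syntax, so the bracket expressions are spelled out.
ASCII_EQUIVALENT = {
    "alnum": r"[a-zA-Z0-9]",
    "alpha": r"[a-zA-Z]",
    "blank": r"[ \t]",
    "digit": r"[0-9]",
    "lower": r"[a-z]",
    "upper": r"[A-Z]",
    "space": r"[ \t\r\n\v\f]",
    "xdigit": r"[A-Fa-f0-9]",
}

def all_in_class(name, text):
    """Return True if text is non-empty and every character is in the class."""
    return re.fullmatch(ASCII_EQUIVALENT[name] + "+", text) is not None
```

For example, all_in_class("xdigit", "1aF9") is True, while all_in_class("digit", "12a") is False because a is not a digit.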
Quantification
The question mark makes the preceding token in the regular expression optional. For example, colou?r matches both colour and color.
The star (*) tells the engine to attempt to match the preceding token zero or more times. The plus sign (+) tells the engine to attempt to match the preceding token one or more times.
An additional quantifier allows you to specify how many times a token can be repeated. The syntax is {min,max}, where min is zero or a positive integer indicating the minimum number of matches, and max is an integer equal to or greater than min indicating the maximum number of matches. If the comma is present but max is omitted, there is no upper limit on the number of matches. If both the comma and max are omitted, as in {n}, the token must match exactly n times.
{0,1} is the same as ?
{0,} is the same as *
{1,} is the same as +
You could use \b[1-9][0-9]{3}\b to match a number between 1000 and 9999. \b[1-9][0-9]{2,4}\b matches a number between 100 and 99999. Notice the use of the word boundaries (\b), which prevent a match inside a longer run of digits.
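The quantifier equivalences and the two number patterns above can be verified with a few lines of code. This is a sketch using Python's re module as a stand-in for the engine that Opportunity Detect uses; the helper same_matches and the sample inputs are illustrative assumptions.

```python
import re

def same_matches(pattern_a, pattern_b, inputs):
    """Return True if both patterns accept and reject the same whole strings."""
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in inputs
    )

inputs = ["", "a", "aa", "aaa", "b"]
optional_equiv = same_matches(r"a{0,1}", r"a?", inputs)
star_equiv = same_matches(r"a{0,}", r"a*", inputs)
plus_equiv = same_matches(r"a{1,}", r"a+", inputs)

# Number ranges bounded by word boundaries.
four_digits = re.compile(r"\b[1-9][0-9]{3}\b")
three_to_five = re.compile(r"\b[1-9][0-9]{2,4}\b")

found_four = four_digits.findall("ids: 999 1000 9999 10000 0123")
found_range = three_to_five.findall("ids: 99 100 99999 100000")
```

In this run, found_four contains only 1000 and 9999, and found_range contains only 100 and 99999. The longer numbers are skipped because the trailing word boundary cannot fall between two digits, and 0123 is skipped because the first digit must be 1 through 9.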
Grouping
A regular expression enclosed in parentheses (round brackets) is treated as a single unit. That is, quantification and other rules apply to the group in the parentheses as a whole. For example, (ab)+ matches ab, abab, and so on.
Alternation
Two regular expressions separated by the special character vertical-line ( '|' ) match a string that is matched by either.
For example, the regular expression "a((bc)|d)" matches the string "abc" and the string "ad".
When alternatives separated by the vertical bar are enclosed in parentheses, the group is treated as a single unit, so a quantifier that follows the group applies to whichever alternative matches.
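The grouping and alternation rules can be checked against the a((bc)|d) example. The sketch below uses Python's re module, which is an assumption; POSIX engines choose the longest overall match while Python tries alternatives left to right, but with whole-string matching on these inputs the two give the same answers.

```python
import re

alternation = re.compile(r"a((bc)|d)")
samples = ["abc", "ad", "ab", "abcd", "a"]
accepted = [s for s in samples if alternation.fullmatch(s)]

# A quantifier after a group applies to the whole group...
grouped = re.compile(r"(ab)+")
group_repeats = grouped.fullmatch("ababab") is not None

# ...whereas without parentheses it applies only to the preceding character.
ungrouped = re.compile(r"ab+")
char_repeats = ungrouped.fullmatch("abbb") is not None
ungrouped_rejects = ungrouped.fullmatch("ababab") is None
```

Only abc and ad are accepted by the alternation pattern; ab is incomplete, and abcd has a trailing character that neither alternative accounts for.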
Example for file name matching
You might create the following regular expression to match timestamp-suffixed file names used with the Real time file connector.
Detect\.a\.trans\.[0-9]{8,14}
This expression matches file names that begin with the common prefix Detect.a.trans. and end with between 8 and 14 timestamp digits, inclusive. This range is used because file names can have 8 digits for the basic date (4 for the year, 2 for the month, 2 for the day) and up to 6 extra digits for a more granular timestamp (hhmmss). For example, the expression matches the following file names:
Detect.a.trans.20100901
Detect.a.trans.20100908
Detect.a.trans.20100922
Detect.a.trans.20101001
Detect.a.trans.20101008
Detect.a.trans.20101022
Detect.a.trans.20101201
Detect.a.trans.20101208
Detect.a.trans.20101222
Detect.a.trans.20101223040506
Detect.a.trans.20101223033240
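Before configuring a connector, a pattern like this can be tried against a list of candidate file names. The sketch below uses Python's re module and assumes that the whole file name must match the pattern; how the Real time file connector applies the pattern is not specified here, so treat the use of fullmatch as an assumption. The rejected sample names are invented for illustration.

```python
import re

pattern = re.compile(r"Detect\.a\.trans\.[0-9]{8,14}")

file_names = [
    "Detect.a.trans.20100901",         # 8-digit date
    "Detect.a.trans.20101223040506",   # 14-digit date and time
    "Detect.a.trans.2010",             # too few digits
    "Detect-a-trans-20100901",         # separators are not periods
    "Detect.b.trans.20100901",         # different prefix
]

accepted = [name for name in file_names if pattern.fullmatch(name)]

# Leaving the periods unescaped makes each one match any character,
# so the hyphenated name would slip through.
loose = re.compile(r"Detect.a.trans.[0-9]{8,14}")
loose_accepts_hyphens = loose.fullmatch("Detect-a-trans-20100901") is not None
```

Only the first two names end up in accepted. The final check shows why the periods in the prefix are escaped.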
Useful links for POSIX regular expressions
- OpenGroup POSIX regular expression specification:
http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html
- Regular-Expressions.info on POSIX bracket expressions:
http://www.regular-expressions.info/posix.html
- Wikipedia regular expressions:
https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended
- Wikibooks POSIX regular expressions:
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions