Monday, February 9, 2009

RegExp Options Explained

Now let's get to the various options you can give your regexp engine. Please keep in mind that the available options depend on the engine you are working with. I will explain the options that apply to most implementations of regular expressions.

You should use The Regex Coach and play around with the examples, checking and unchecking the options as you go.




(i) -> case-insensitive matching

/(cat.*?)(cat)(.*?)$/i

The Cat and the Cat?

With case-insensitive matching it does not matter whether "cat" is capitalized or not, so the pattern also matches "Cat".
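To try this outside The Regex Coach, here is a minimal sketch using Python's `re` module (my choice of engine for illustration; other engines spell the options differently). Python has no /pattern/i syntax, so the (i) option is passed as the `re.IGNORECASE` flag.

```python
import re

pattern = r"(cat.*?)(cat)(.*?)$"
text = "The Cat and the Cat?"

# Without the flag, the lowercase "cat" in the pattern never matches "Cat".
print(re.search(pattern, text))  # None

# re.IGNORECASE is Python's equivalent of the (i) option.
match = re.search(pattern, text, re.IGNORECASE)
print(match.groups())  # ('Cat and the ', 'Cat', '?')
```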

(s) -> single-line matching

/(cat.*?)(cat)(.*?)$/is

the cat and
the cat?

With (s) turned on, the dot wildcard also matches the newline character, so the engine can match across several lines without an explicit \n in the pattern. In other words, the string is treated as a single line even if it contains more than one line.
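A minimal sketch of the same example with Python's `re` module (assumed here as the engine), where the (s) option is called `re.DOTALL`. The (i) option is kept on as `re.IGNORECASE` to mirror the /is in the pattern above.

```python
import re

pattern = r"(cat.*?)(cat)(.*?)$"
text = "the cat and\nthe cat?"

# Without re.DOTALL the dot stops at the newline, so the two "cat"s cannot be joined.
print(re.search(pattern, text, re.IGNORECASE))  # None

# With re.DOTALL (the (s) option) the dot also matches "\n".
match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
print(match.groups())  # ('cat and\nthe ', 'cat', '?')
```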

(m) -> multi-line matching

(cat)(.*?)[\?]$

the cat and dog?
the cat?
cat?

The multi-line option changes the behavior of the anchors ^ and $: they match at the start and end of each line instead of only at the start and end of the whole string.
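A minimal sketch in Python's `re` module (assumed as the engine), where the (m) option is the `re.MULTILINE` flag. Comparing the first match with and without the flag shows how $ changes meaning.

```python
import re

pattern = r"(cat)(.*?)[\?]$"
text = "the cat and dog?\nthe cat?\ncat?"

# Without re.MULTILINE, $ matches only at the end of the whole string,
# so the first match the engine can find is the "cat?" on the last line.
print(re.search(pattern, text).group(0))  # cat?

# With re.MULTILINE (the (m) option), $ also matches before each "\n",
# so a match is already found on the first line.
print(re.search(pattern, text, re.MULTILINE).group(0))  # cat and dog?
```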

(x) -> exclude unescaped whitespace

(cat)\s(.*?) [\?]$

the cat and dog?

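The pattern contains the escape sequence \s, which matches any whitespace character, and an unescaped (literal) space after the second group (.*?). If you turn (x) on, the engine ignores that unescaped space in the pattern, so the text no longer needs a space before the ?.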
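A minimal sketch in Python's `re` module (assumed as the engine), where the (x) option is the `re.VERBOSE` flag. Note that the escape \s keeps working in verbose mode; only the unescaped space is dropped.

```python
import re

pattern = r"(cat)\s(.*?) [\?]$"  # note the unescaped space before [\?]
text = "the cat and dog?"

# Without re.VERBOSE the space is literal, and "dog?" has no space before "?".
print(re.search(pattern, text))  # None

# With re.VERBOSE (the (x) option) the unescaped space in the pattern is ignored.
match = re.search(pattern, text, re.VERBOSE)
print(match.groups())  # ('cat', 'and dog')
```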

(g) -> match globally

(cat)(.*?)

the cat and dog?
cats and dogs

If you want to apply the expression several times to a string, you need to make it global. Each new match attempt then continues in the string where the previous match ended. This is very useful if you process the matches in a while loop.
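Python's `re` module (assumed here as the engine) has no (g) flag. A rough equivalent of looping over a global match is to restart the search at the position where the previous match ended, which is what the while loop below does; `re.finditer` does the same bookkeeping for you. Because (.*?) is lazy, it matches the empty string here, so each match is just "cat".

```python
import re

pattern = re.compile(r"(cat)(.*?)")
text = "the cat and dog?\ncats and dogs"

# Restart the search where the previous match ended, like a global match
# in a while loop. (cat) always consumes 3 characters, so pos always advances.
pos = 0
matches = []
while (match := pattern.search(text, pos)) is not None:
    matches.append((match.start(), match.group(1)))
    pos = match.end()

print(matches)  # [(4, 'cat'), (17, 'cat')]

# re.finditer yields the same successive matches.
print([m.start() for m in pattern.finditer(text)])  # [4, 17]
```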