Monday, January 26, 2009

Escaping and special characters

The escape character in regexps is the backslash. Most functionality in regexps starts with a special character, so if you want to match strings that contain special characters, you need to escape them with a leading backslash. To match a backslash itself, escape it as well: \\ matches a single backslash.

199\s\$

199 $ bargain.

As we learned about anchoring, the $ sign anchors the regexp at the end of the string. To match a literal $ sign, you need to escape it with a backslash.
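A minimal sketch of the example above, using Python's `re` module (Python is our choice for illustration; the regexps themselves are not tied to a language). The raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.

```python
import re

text = "199 $ bargain."

# Escaped: \$ matches a literal dollar sign.
print(re.search(r"199\s\$", text) is not None)  # True

# Unescaped: $ is the end-of-string anchor, so "199 " would have to end the string.
print(re.search(r"199\s$", text) is not None)   # False

# A literal backslash must be escaped too: the pattern \\ matches one backslash.
print(re.search(r"C:\\temp", r"C:\temp") is not None)  # True

# re.escape() inserts the needed backslashes for you.
print(re.escape("$5.00"))  # \$5\.00
```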

Escape sequences can also match other characters, including non-printable ones such as tabs and newlines. Note that some of them work like a character class, matching any one character from a set.

Let me introduce the most important escape sequences.

\s – whitespace (not only the space 0x20, but also tab, carriage return and newline)
\n – newline (well known to C programmers)
\t – tab
\d – digit, like [0-9]
\w – word character: a letter, digit or underscore, like [A-Za-z0-9_]
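A small Python sketch (an illustration, not part of the original post) that runs each escape sequence from the list over one sample string. Note that in Python 3, \d and \w on ordinary strings also match non-ASCII digits and letters unless the `re.ASCII` flag is given.

```python
import re

# Sample text containing a tab, a newline, digits and an underscore.
sample = "Price:\t199 $\nItem: bargain_2"

print(re.findall(r"\d", sample))   # ['1', '9', '9', '2']
print(re.findall(r"\w+", sample))  # ['Price', '199', 'Item', 'bargain_2']
print(re.findall(r"\s", sample))   # ['\t', ' ', '\n', ' ']
print(re.findall(r"\t", sample))   # ['\t']
print(re.findall(r"\n", sample))   # ['\n']
```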

The following example will make things a little clearer.

\d\d\d\s\$\s\w*\.

199 $ bargain.

The three \d match any three digits in a row.
The first \s matches the whitespace after the digits.
The \$ matches a literal $, because an unescaped $ has a different meaning in regexps.
Then another \s matches the next whitespace.
\w* matches zero or more word characters.
\. matches a literal dot. The dot must be escaped because an unescaped dot is a wildcard that matches any character.
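The same walkthrough can be written into the pattern itself. This Python sketch uses `re.VERBOSE`, which ignores whitespace in the pattern and treats `#` as the start of a comment, so each token of \d\d\d\s\$\s\w*\. carries its own note. The name `price_pattern` is just a hypothetical label for illustration.

```python
import re

# Same tokens as \d\d\d\s\$\s\w*\. , one per line with a comment.
price_pattern = re.compile(r"""
    \d\d\d   # three digits
    \s       # a whitespace character
    \$       # a literal dollar sign
    \s       # another whitespace character
    \w*      # zero or more word characters
    \.       # a literal dot
""", re.VERBOSE)

match = price_pattern.search("199 $ bargain.")
print(match.group())  # 199 $ bargain.
```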

The above regexp will also match a string like

999 $ watch.
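To see why the final dot needs its backslash, this Python sketch compares the escaped pattern with a hypothetical variant whose last dot is left unescaped. The extra test strings ("999 $ watch!" and "99 $ watch.") are made up for illustration.

```python
import re

escaped = re.compile(r"\d\d\d\s\$\s\w*\.")   # \. is a literal dot
unescaped = re.compile(r"\d\d\d\s\$\s\w*.")  # . is a wildcard

for text in ["199 $ bargain.", "999 $ watch.", "999 $ watch!", "99 $ watch."]:
    print(text, bool(escaped.search(text)), bool(unescaped.search(text)))

# 199 $ bargain. True True
# 999 $ watch. True True
# 999 $ watch! False True
# 99 $ watch. False False
```

The wildcard version happily accepts "!" in place of the dot, while neither version matches "99 $ watch." because it has only two digits.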

Other characters can be written as a hexadecimal code.
\x30 matches the character with hex code 30, which is the digit 0.
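A short Python sketch of hex escapes. In Python's `re`, a hex escape always stands for a literal character, so \x24 (the dollar sign) behaves like \$ rather than like the $ anchor.

```python
import re

# 0x30 is decimal 48, the character "0".
print(chr(0x30))                    # 0
print(re.findall(r"\x30", "2009"))  # ['0', '0']

# \x24 is the dollar sign, so this is another way to write 199\s\$.
print(re.search(r"199\s\x24", "199 $ bargain.") is not None)  # True

# Non-printable characters work the same way: \x09 is the tab.
print(re.findall(r"\x09", "a\tb"))  # ['\t']
```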
