Regular Expressions in Programming

Many IT professionals understand the value of knowing what regular expressions are and how to use them effectively. Regular expressions can drastically increase your expressive power, both on the command line and from within executable scripts. For those who are not familiar with this topic area, or those who want a quick refresher, this is the article for you.
Regular expressions are used in textual pattern matching. They can be used to parse and match string inputs, such as particular characters or words, or to replace a string's contents based on matched patterns. Regular expressions are interpreted by a regular expression engine, and free engines are available on all major computing platforms. So how do you construct regular expressions? Let's see.
A regular expression is a pattern that matches a set of strings. The set of strings it can potentially match is built using only three concepts: boolean "or", grouping, and quantification. Boolean "or" is denoted by a vertical bar "|" and is used to separate a number of alternatives. For example, {gray|grey} can match "gray" or "grey". Grouping is denoted by the standard parenthesis characters "(" and ")" and is used to define precedence. Continuing from the example above, {gray|grey} is equivalent to {gr(a|e)y}, as the parentheses are evaluated first. Finally, the last concept is quantification. A quantifier denotes how often the item that precedes it may occur. The typical characters used are: "?" indicating zero or one; "*" indicating zero or more; and "+" indicating one or more.
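The three concepts can be tried out directly. The following is a minimal sketch in Python using its standard re module; the helper name matches is an assumption made for this illustration, and the curly braces used in this article to delimit patterns are not part of the patterns themselves.

```python
import re

def matches(pattern, text):
    """Return True when the whole of `text` fits `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Boolean "or" and grouping: both patterns accept the same two spellings.
for word in ["gray", "grey"]:
    assert matches(r"gray|grey", word)
    assert matches(r"gr(a|e)y", word)
assert not matches(r"gr(a|e)y", "groy")

# Quantification: each symbol applies to the "b" that precedes it.
samples = ["a", "ab", "abb"]
optional = [s for s in samples if matches(r"ab?", s)]      # zero or one "b"
any_count = [s for s in samples if matches(r"ab*", s)]     # zero or more "b"
at_least_one = [s for s in samples if matches(r"ab+", s)]  # one or more "b"

print(optional)      # ['a', 'ab']
print(any_count)     # ['a', 'ab', 'abb']
print(at_least_one)  # ['ab', 'abb']
```

re.fullmatch is used here so that a pattern must account for the entire input; re.search would instead succeed if the pattern is found anywhere inside the input.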
These concepts are best illustrated by example. I encourage you to read through them and attempt to explain these in plain English on your own before viewing the explanation.
{ab*c} means "a", followed by zero or more "b", followed by "c". Example inputs it matches are: "ac", "abc", "abbc", "abbbc", and so on.
{colou?r} means "colo", optionally followed by "u", followed by "r". The two inputs it matches are: "color" and "colour".
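These two examples can be checked with a short Python sketch. The helper full_matches and the candidate lists are assumptions chosen for illustration; each list includes a few strings that should be rejected.

```python
import re

def full_matches(pattern, candidates):
    """Keep only the candidates that the pattern matches in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# "a", then zero or more "b", then "c".
abc_hits = full_matches(r"ab*c", ["ac", "abc", "abbbc", "abcc", "bc"])
print(abc_hits)     # ['ac', 'abc', 'abbbc']

# "colo", then an optional "u", then "r".
colour_hits = full_matches(r"colou?r", ["color", "colour", "colouur", "colr"])
print(colour_hits)  # ['color', 'colour']
```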
Many implementations, such as Perl and Visual Basic, include extra symbols to denote useful and frequently occurring patterns. These can be used in conjunction with the above operators, and are as follows:
[abc] Matches any one of the characters a, b, or c. (Called a character class.)
[a-z] Matches any single character from a to z, inclusive.
"." The dot matches any single character.
"\n" Matches a newline character.
"\t" Matches a tab.
"\d" Matches a digit [0-9].
"\D" Matches a non-digit.
"\w" Matches an alphanumeric character (in most implementations, the underscore as well).
"\W" Matches a non-alphanumeric character.
"\s" Matches a whitespace character.
"\S" Matches a non-whitespace character.
"^" Matches at the beginning of the input string.
"$" Matches at the end of the input string.
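Here is a brief sketch of several of these symbols at work, written in Python, whose re module supports all of the symbols listed above. The sample string and variable names are assumptions made up for this example.

```python
import re

# Sample input (made up for illustration); it contains one tab character.
line = "Order 66\tshipped on 2024-01-15"

digits = re.findall(r"\d+", line)        # runs of digits
words = re.findall(r"\w+", line)         # runs of word characters
fields = re.split(r"\s+", line)          # split on spaces and tabs alike
vowels = re.findall(r"[aeiou]", "regex") # character class: any one listed letter
three_chars = re.findall(r"r.g", "regex rug")  # the dot stands for any one character

starts_with_order = re.search(r"^Order", line) is not None  # anchored at the start
ends_with_digits = re.search(r"\d\d$", line) is not None    # anchored at the end

print(digits)       # ['66', '2024', '01', '15']
print(words)        # ['Order', '66', 'shipped', 'on', '2024', '01', '15']
print(fields)       # ['Order', '66', 'shipped', 'on', '2024-01-15']
print(vowels)       # ['e', 'e']
print(three_chars)  # ['reg', 'rug']
print(starts_with_order, ends_with_digits)  # True True
```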
Therefore, as a rough check for an email address, one could write: {\w+@\w+}. This would accept inputs like "abc@abc.com", but it would also accept "abc@abc", since nothing in the pattern requires a domain extension. (Side exercise: How can you ensure it contains a "." and domain extension?)
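One possible answer to the side exercise is sketched below in Python. It relies on two things not covered above, so treat them as assumptions of this sketch: a backslash before the dot ("\.") makes it match a literal "." rather than any character, and a quantifier can follow a group. The names LOOSE and STRICT are made up for this example, and neither pattern comes close to full email validation (for instance, neither allows a "." or "-" before the "@").

```python
import re

# The pattern from the article: one or more word characters on each side of "@".
LOOSE = re.compile(r"\w+@\w+")

# A stricter sketch: the domain must be followed by at least one ".extension".
STRICT = re.compile(r"\w+@\w+(\.\w+)+")

for address in ["abc@abc.com", "abc@mail.example.com", "abc@abc", "abc@.com"]:
    loose_ok = LOOSE.search(address) is not None
    strict_ok = STRICT.fullmatch(address) is not None
    print(address, loose_ok, strict_ok)
```

Using fullmatch for the stricter pattern has the same effect as wrapping it in "^" and "$": the entire input must fit, not just a part of it.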
The theoretical underpinnings of regular expressions come from automata theory and formal language theory, both branches of theoretical computer science. However, their practical day-to-day significance cannot be overstated. Learn to write regular expressions, and you will be repaid in time savings for years to come.