
Regular expression examples

Regular Expression to validate email address

/^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/

  1. The email regular expression begins with a /, representing the leftmost delimiter.
  2. Then, we have a ^ symbol, representing the absolute beginning of the string.
  3. The following \w+ matches one or more word characters (letters, digits, or the underscore).
  4. The next chunk is where this gets interesting. I will try to break it down into manageable pieces, so please bear with me. The part we will look at is ((-\w+)|(\.\w+))*
    1. First, note that the whole thing is surrounded by ()*, which means that we want to match zero or more of them. Inside the parentheses, we have (-\w+)|(\.\w+), which means to match EITHER -\w+ OR \.\w+, so let's take a look at each of them in turn. The first one matches a hyphen followed immediately by one or more \w characters. The second one matches a period followed immediately by one or more \w characters. Remember that a period by itself is a special character, so we must escape it by placing a backslash in front of it. In essence, this inner part allows an email address whose part before the "at" sign is hyphenated or dot-separated.
  5. After this match comes an @ sign. It is escaped with a backslash to ensure that it isn't given a special meaning.
  6. Immediately following the "at" sign is [A-Za-z0-9]+, which matches one or more alphanumeric characters (excluding any _ characters, which we would have got if we had just used \w).
  7. After this, we have another interesting bit: ((\.|-)[A-Za-z0-9]+)*. Let's go through it.
  8. Again, note that we are matching zero or more occurrences using the * sign. Since parentheses are used, the * applies to the entire group. Let's look inside at the (\.|-)[A-Za-z0-9]+ pattern. Inside the parentheses, we have \.|-, which means that we will match either a period or a hyphen. Since this is followed by [A-Za-z0-9]+, the match only works if the period or hyphen is followed by one or more alphanumeric characters. This effectively represents a domain that contains a (possible) set of .word or -word sections. Because the * is used, the pattern works if they are present and also if they aren't.
  9. The last \.[A-Za-z0-9]+ pattern matches a period followed by one or more alphanumerics. Because it is the last part of the regular expression, it represents the final part of the email address, which is the top-level domain. Because [A-Za-z0-9]+ does not match non-alphanumerics, this pattern will not match email addresses that do not end in some sort of "real-looking" domain.
  10. The final $ symbol anchors the pattern to the end of the string.
  11. The final / is the rightmost delimiter for the regular expression.
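
The pieces described in the list above can be laid out one per line. The following is a minimal sketch in Python, which is an assumption on my part since the page does not name a language. Python has no / delimiters, so they are dropped; re.VERBOSE allows the comments, and re.ASCII is assumed so that \w stays limited to letters, digits, and the underscore. The name EMAIL_RE is made up for illustration.

```python
import re

# The pattern from the text, split into the pieces discussed in the list.
# Assumptions: Python's re module stands in for the original regex engine,
# and re.ASCII keeps \w limited to [A-Za-z0-9_].
EMAIL_RE = re.compile(
    r"""
    ^                        # start of the string
    \w+                      # one or more word characters
    ((-\w+)|(\.\w+))*        # zero or more -word or .word groups
    \@                       # the "at" sign (escaped as in the original)
    [A-Za-z0-9]+             # first part of the domain, no underscores
    ((\.|-)[A-Za-z0-9]+)*    # zero or more .word or -word groups
    \.[A-Za-z0-9]+           # the final part, e.g. .com
    $                        # end of the string
    """,
    re.VERBOSE | re.ASCII,
)

match = EMAIL_RE.match("some-one.somewhere@where-ever.com")
print(match is not None)  # True
```

Laying the pattern out this way does not change what it matches; it only makes each step of the walkthrough visible next to the piece it describes.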

With this particular regular expression, the bare minimum that a person could enter as an email address is x@x.x, where x is any alphanumeric character (the x before the "at" sign may also be an underscore, since \w allows it). The pattern allows for email addresses like the following:

someone@somewhere.com
someone.somebody@somewhere.com
someone.somebody@somewhere.where.com
some-one@somewhere.com
some-one.somewhere@wherever.com
some-one.somewhere@where-ever.com
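
To check the addresses above against the pattern, a small test along these lines could be used. This is only a sketch in Python (an assumption; the page does not say which language the regular expression was written for). The helper name is_valid_email and the list of rejected addresses are hypothetical and not part of the original page.

```python
import re

# Same pattern as in the text, without the / delimiters.
# Assumption: re.ASCII keeps \w limited to [A-Za-z0-9_].
EMAIL_RE = re.compile(
    r"^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$",
    re.ASCII,
)


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the whole address fits the pattern."""
    # fullmatch is used because Python's $ also matches just before a
    # trailing newline, which would otherwise let "a@b.c\n" through.
    return EMAIL_RE.fullmatch(address) is not None


# The example addresses listed in the text.
accepted = [
    "someone@somewhere.com",
    "someone.somebody@somewhere.com",
    "someone.somebody@somewhere.where.com",
    "some-one@somewhere.com",
    "some-one.somewhere@wherever.com",
    "some-one.somewhere@where-ever.com",
    "x@x.x",
]

# Illustrative addresses that the pattern should reject.
rejected = [
    "someone@somewhere",                # no final .word part
    "someone..somebody@somewhere.com",  # two periods in a row
    "-someone@somewhere.com",           # starts with a hyphen
    "someone@some_where.com",           # underscore after the "at" sign
]

for address in accepted:
    assert is_valid_email(address), address
for address in rejected:
    assert not is_valid_email(address), address
```
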
Last update: Wed, 2 Nov 2005 10:16:21 GMT