Regular Expression to validate email address
/^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/
- The email regular expression begins with a /, representing the leftmost delimiter.
- Then, we have a ^ symbol, representing the absolute beginning of the string.
- The following \w+ matches one or more "word" characters: letters, digits, or the underscore.
- The next chunk is where this gets interesting. I will try to break it down into manageable pieces, so please bear with me. The part we will look at is ((-\w+)|(\.\w+))*
- First, note that the whole thing is surrounded by ()* which means that we want to match zero or more of them. Inside the parentheses, we have (-\w+)|(\.\w+) which means to match EITHER -\w+ OR \.\w+ so let's take a look at each of them in turn. The first one matches a hyphen followed immediately by one or more word characters. The second one matches a period followed immediately by one or more word characters. Remember that a period by itself is a special character (it matches any character), so we must escape it by placing a backslash in front of it. In essence, this inner group allows the part of the email address before the "at" sign to contain hyphenated or dot-separated sections.
- After this match comes an @ sign. It is escaped with a backslash to ensure that it is taken literally. In most regular expression flavors @ has no special meaning, so the backslash is harmless even where it is not strictly needed.
- Immediately following the "at" sign is [A-Za-z0-9]+ which matches a set of alphanumeric characters (excluding any _ characters, which we would have got if we had just used \w).
- After this, we have another interesting bit: ((\.|-)[A-Za-z0-9]+)*. Let's go through it.
- Again, note that the * sign means we are matching zero or more occurrences. Since parentheses are used, the * applies to the entire group. Let's look inside at the (\.|-)[A-Za-z0-9]+ pattern. Inside the inner parentheses, we have \.|- which means that we will match either a period or a hyphen. Since this is followed by [A-Za-z0-9]+, the match only works if the period or hyphen is followed by one or more alphanumeric characters. This effectively represents a domain that contains a (possible) set of .word or -word sections. Because the * is used, the pattern works if they are present and also if they aren't.
- The last \.[A-Za-z0-9]+ pattern matches a period followed by one or more alphanumerics. Because it is the last part of the regular expression, it represents the final part of the email address, which is the top-level domain. Because this part is required, the pattern will not match an address unless the part after the "at" sign contains at least one period followed by alphanumerics, that is, some sort of "real-looking" domain.
- The final $ symbol anchors the match to the end of the string.
- The final / is the rightmost delimiter for the regular expression.
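The breakdown above can be sketched in code by assembling the pattern from its pieces. This is a minimal illustration in Python, which is an assumption (the delimiters in the original suggest a Perl-style syntax); the constant names and the is_valid_email helper are hypothetical and not part of the original expression. The re.ASCII flag is used so that \w means only letters, digits, and underscore, as described above.

```python
import re

# Pieces of the pattern, in the order the walkthrough covers them.
# All names here are hypothetical labels chosen for this sketch.
LOCAL_START = r"\w+"                     # one or more letters, digits, or underscores
LOCAL_REST = r"((-\w+)|(\.\w+))*"        # zero or more "-word" or ".word" sections
AT_SIGN = r"\@"                          # a literal @ sign
DOMAIN_START = r"[A-Za-z0-9]+"           # alphanumerics only, no underscore
DOMAIN_REST = r"((\.|-)[A-Za-z0-9]+)*"   # zero or more ".word" or "-word" sections
TOP_LEVEL = r"\.[A-Za-z0-9]+"            # a period followed by alphanumerics

# ^ and $ anchor the pattern to the start and end of the string.
EMAIL_PATTERN = (
    "^" + LOCAL_START + LOCAL_REST + AT_SIGN
    + DOMAIN_START + DOMAIN_REST + TOP_LEVEL + "$"
)

# re.ASCII keeps \w limited to [A-Za-z0-9_].
EMAIL_RE = re.compile(EMAIL_PATTERN, re.ASCII)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the pattern from this page."""
    # fullmatch also rejects a trailing newline, which $ alone allows in Python.
    return EMAIL_RE.fullmatch(address) is not None
```

Note that an underscore is accepted before the "at" sign but not after it, which follows from using \w in one half and [A-Za-z0-9] in the other.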
With this particular regular expression, the bare minimum that a person could enter as an email address is x@x.x, where x is any alphanumeric character. The pattern allows for email addresses like the following:
someone@somewhere.com
someone.somebody@somewhere.com
someone.somebody@somewhere.where.com
some-one@somewhere.com
some-one.somewhere@wherever.com
some-one.somewhere@where-ever.com
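As a quick check, the sample addresses can be run against the pattern. The sketch below uses Python as an assumed language. The REJECTED list and the matches helper are hypothetical additions for illustration, chosen to show a few inputs the pattern turns away.

```python
import re

# The pattern from this page, without the / delimiters.
EMAIL_RE = re.compile(
    r"^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$",
    re.ASCII,  # keep \w limited to letters, digits, and underscore
)

# Addresses listed on this page, plus the minimal x@x.x form.
ACCEPTED = [
    "x@x.x",
    "someone@somewhere.com",
    "someone.somebody@somewhere.com",
    "someone.somebody@somewhere.where.com",
    "some-one@somewhere.com",
    "some-one.somewhere@wherever.com",
    "some-one.somewhere@where-ever.com",
]

# Hypothetical counterexamples, not from the original page.
REJECTED = [
    "someone@somewhere",        # no period after the "at" sign
    "some..one@somewhere.com",  # a period must be followed by word characters
    "someone@-somewhere.com",   # the domain must start with an alphanumeric
    "someone@somewhere.com.",   # nothing follows the final period
    "some+one@somewhere.com",   # + is not allowed by \w
]


def matches(address: str) -> bool:
    """Return True if the whole address matches the pattern."""
    return EMAIL_RE.fullmatch(address) is not None


for address in ACCEPTED + REJECTED:
    print(address, "->", "match" if matches(address) else "no match")
```

The last rejected entry shows that the pattern is stricter than real-world mail systems in some respects, since addresses containing a + sign are in common use.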
Last update: Wed, 2 Nov 2005 10:16:21 GMT