Deciphering JS Regular Expressions

Noah Eakin
6 min read · Feb 8, 2021


If you have ever been tasked with cleaning up capitalization or filtering out special characters in strings, chances are you have encountered regular expressions. If you dove into the documentation, it probably wasn't long before you were greeted with mind-melting expressions that looked like this:

/[\w.%+]+@[\w.-]+\.[a-zA-Z]{2,10}/

Because regular expressions allow for so much customization and specificity, they can quickly turn into complex, convoluted-looking syntax. This guide aims to untangle some of these concepts and illuminate the basic principles behind more involved expressions like the one above.

What is a regular expression?

A regular expression is a pattern used to identify specific character sequences in text; in JavaScript, it is represented by a RegExp object. Regular expressions let us search text in very precise ways and, through methods such as test(), match(), and replace(), find and replace text, validate input, and more. I will not be covering these methods in depth here but will instead focus on the syntax used in regular expressions so that you can read and understand what is being targeted.
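Although this post focuses on syntax, a tiny sketch of a few built-in methods makes the later examples easy to try in a browser console or Node.js. It relies only on standard JavaScript (RegExp.prototype.test(), String.prototype.match(), and String.prototype.replace()); the variable names and sample text are illustrative.

```javascript
// Illustrative pattern and sample text
const pattern = /cow/;
const text = "how now brown cow";

// test() answers yes or no
console.log(pattern.test(text)); // → true

// match() returns details about the first match, or null if there is none
console.log(text.match(pattern)[0]); // → cow
console.log(text.match(/pig/)); // → null

// replace() swaps the first match for new text
console.log(text.replace(pattern, "pig")); // → how now brown pig
```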

I want to quickly point out that there are two formats used to denote regular expressions. One is with a regular expression literal:

let example = /abc/

The other uses the constructor function of the RegExp object in JavaScript:

let example = new RegExp('abc')

I will only use regular expression literals in this post for simplicity’s sake and because it is the format you are more likely to encounter.
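A minimal sketch of how the two formats compare, assuming any modern JavaScript runtime. Both produce RegExp objects; because the constructor takes a string, backslashes have to be doubled, and flags move to a second argument. The names literal, constructed, digits, and loose are illustrative.

```javascript
// Two ways to build the same pattern
const literal = /abc/;
const constructed = new RegExp("abc");

// Both are RegExp objects with identical source text
console.log(literal instanceof RegExp, constructed instanceof RegExp); // → true true
console.log(literal.source === constructed.source); // → true

// Inside a string, "\\d+" spells the three characters \d+
const digits = new RegExp("\\d+");
console.log(digits.source); // → \d+
console.log("room 101".match(digits)[0]); // → 101

// Flags are passed as a second argument instead of after a closing slash
const loose = new RegExp("ABC", "i");
console.log(loose.test("xabcx")); // → true
```

The constructor is mainly worth reaching for when a pattern is assembled from a variable at runtime; for fixed patterns the literal is shorter and avoids the double escaping.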

Between Two Slashes


The meat and potatoes of the regular expression lies between two forward slashes. At its most basic, the expression will do a search for the exact pattern of characters expressed between these slashes, so /own/ would comb through a string like "how now brown cow" and find own contained in the word brown. Simple enough.
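To see this exact-sequence search in action, here is a small sketch using match() without any flags. Besides the matched text, the result records the position where it was found; the sample string is the one from the paragraph above.

```javascript
const text = "how now brown cow";

// No flags: match() returns the first hit plus extra details
const result = text.match(/own/);

console.log(result[0]);    // → own
console.log(result.index); // → 10 (the "own" inside "brown")
```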

Flags

After the closing / in a regular expression you can add flags, which further modify your search. I am only going to cover the two most common flags, although JavaScript supports several more (m, s, u, and y, plus the newer d and v).

  • g: the global flag specifies that we want all matches to our regular expression. Without it, a regular expression only returns the first match, even if the string contains several
  • i: the case-insensitive flag specifies that we want to disregard capitalization when looking for matches. For instance, /bRoWn/i applied to "how now brown cow" would still match brown, whereas without this flag you would get null
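A quick sketch of both flags on the same sample string, assuming a modern JavaScript runtime. JSON.stringify is used only so the arrays print compactly, and the last string is illustrative.

```javascript
const text = "how now brown cow";

// Without g, match() stops at the first hit
const first = text.match(/ow/);
console.log(first[0], first.index); // → ow 1

// With g, match() returns every hit as an array of strings
console.log(JSON.stringify(text.match(/ow/g))); // → ["ow","ow","ow","ow"]

// Capitalization matters by default...
console.log(text.match(/bRoWn/)); // → null
// ...unless the i flag is set
console.log(text.match(/bRoWn/i)[0]); // → brown

// Flags can be combined
console.log(JSON.stringify("Ow! OW! ow!".match(/ow/gi))); // → ["Ow","OW","ow"]
```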

Special Characters and Escaping

Just as JavaScript has reserved words like class and return, regular expressions attach special functionality to certain characters. Here are a few of the most common:

  • +: match one or more of the preceding character. So, /me+/g applied to "meet me in the middle" would match mee from meet as well as the word me that follows. It finds every place where at least one e immediately follows an m, and if more e's follow, it includes them too
  • ?: the preceding character is optional. /me?/g applied to "meet me in the middle" matches every m, and if an m is followed by an e it includes that e as well, but the e is not a dealbreaker. The result is me (from meet), me, and the m of middle
  • *: match zero or more of the preceding character. This splits the difference between + and ?: like ?, it is not a dealbreaker if the preceding character is missing, but like +, if it is there it will include as many as appear in a row. So, /lo*/g applied to "helloooooo world" would match the first l in helloooooo, then the second l and all the o's that follow (loooooo), and finally the l of world
  • .: matches any single character except line breaks, so spaces and tabs count. /.ow/gi applied to "how now brown cow" would match how, now, row, and cow
  • |: matches either everything before the pipe or everything after it. So, /o|w/g would match all of the individual o's and w's in "how now brown cow", while /brown|cow/g would match the whole words brown and cow
  • {}: takes either a single number and matches the preceding character exactly that many times, or two numbers separated by a comma to set a minimum and maximum number of repetitions. So, /lo{2}/ applied to "helloooooo" matches loo, while /lo{2,4}/ matches loooo
  • \: a backslash before a special character "escapes" it. This is useful when you want to search for the literal character, such as all of the periods in a block of text: /\./g strips . of its special meaning. If you want to search for a literal backslash, you escape it with another backslash: /\\/g
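The examples from the list above can be checked with a short sketch. Each line applies one special character with the g flag and prints every match; the brown|cow, lo{2}, and lo{2,4} patterns and the last two sample strings are chosen for illustration, and JSON.stringify is only there for compact output.

```javascript
// Sample strings from the list above
const meet = "meet me in the middle";
const hello = "helloooooo world";
const cow = "how now brown cow";

// + : one or more e's after an m
console.log(JSON.stringify(meet.match(/me+/g))); // → ["mee","me"]

// ? : the e is optional, so the lone m in "middle" also matches
console.log(JSON.stringify(meet.match(/me?/g))); // → ["me","me","m"]

// * : zero or more o's after an l
console.log(JSON.stringify(hello.match(/lo*/g))); // → ["l","loooooo","l"]

// . : any single character except a line break
console.log(JSON.stringify(cow.match(/.ow/g))); // → ["how","now","row","cow"]

// | : the whole left side or the whole right side
console.log(JSON.stringify(cow.match(/o|w/g)));      // → ["o","w","o","w","o","w","o","w"]
console.log(JSON.stringify(cow.match(/brown|cow/g))); // → ["brown","cow"]

// {} : an exact count, or a minimum and maximum
console.log(JSON.stringify(hello.match(/lo{2}/g)));   // → ["loo"]
console.log(JSON.stringify(hello.match(/lo{2,4}/g))); // → ["loooo"]

// \ : escape a special character to match it literally
console.log(JSON.stringify("a.b.c".match(/\./g))); // → [".","."]
// The JS string "C:\\temp" contains a single backslash
console.log("C:\\temp".match(/\\/g).length); // → 1
```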

Groupings and Ranges


The special characters [] are getting their own section because of the additional options they give us.

First, brackets define a character set (sometimes called a character class), which acts as a catch-all bucket: it matches any single character listed inside. One simple application would be /[hn]ow/g applied to "how now brown cow", which would give us how and now. It's as if everything between the brackets were separated by the | special character, so any h or n followed by ow will match our regular expression.

Another thing we can do with brackets is include a range of characters such as /[a-z]/g which will match any and all letters a through z. We could be more selective with /[d-f]/. We could target only capital letters with /[A-Z]/. We could search for numerals with /[0-9]/. Or we could incorporate multiple ranges like /[a-zA-Z0-9]/g. Powerful indeed.
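A small sketch of character sets and ranges. The first line reuses the /[hn]ow/g example; the other sample strings are made up for illustration.

```javascript
// [hn] matches one h or one n, so [hn]ow needs h or n right before "ow"
console.log(JSON.stringify("how now brown cow".match(/[hn]ow/g))); // → ["how","now"]

// A range: any letter from d through f
console.log(JSON.stringify("abcdefgh".match(/[d-f]/g))); // → ["d","e","f"]

// Capital letters only
console.log(JSON.stringify("Hello World".match(/[A-Z]/g))); // → ["H","W"]

// Several ranges in one set, with + to grab whole runs
console.log(JSON.stringify("Order #42 ships May-3".match(/[a-zA-Z0-9]+/g)));
// → ["Order","42","ships","May","3"]
```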

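One last note on bracketed character sets: most special characters are treated as literals inside the brackets. So, /[+%]/g is looking for literal instances of + and %. A few characters keep a meaning, though: a - between two characters forms a range (put it first or last to mean a literal hyphen), a ^ right after the opening bracket negates the set, and \ still escapes or introduces shorthands like \w. Special characters outside of the brackets can still operate on the character set as a whole, as in /[io]+/g.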
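Here is a sketch of how brackets change the meaning of special characters, using illustrative sample strings. The last line shows the hyphen caveat: placed at the end of a set, - is matched literally instead of forming a range.

```javascript
// Inside brackets, + and % are just characters to look for
console.log(JSON.stringify("50% + 50% = 100%".match(/[+%]/g))); // → ["%","+","%","%"]

// Outside brackets, + is still a quantifier and applies to the whole set
console.log(JSON.stringify("violin radio".match(/[io]+/g))); // → ["io","i","io"]

// A - at the end of a set is a literal hyphen, not a range
console.log(JSON.stringify("x-y z".match(/[xy-]/g))); // → ["x","-","y"]
```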

Word Characters

The last concept I want to talk about briefly is the word character, which is any ASCII letter, digit, or underscore (_). We can search for word characters with /\w/. This is the same as writing /[a-zA-Z0-9_]/. You can see how much more concise \w is, which helps when dealing with longer, more complex expressions.
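A sketch comparing \w with its longhand bracket form on an illustrative string. It also shows one limitation worth knowing: \w only covers ASCII letters, so an accented character such as é is not a word character.

```javascript
// Illustrative sample string
const sample = "snake_case, camelCase & 42!";

const shorthand = sample.match(/\w+/g);
const longhand = sample.match(/[a-zA-Z0-9_]+/g);

console.log(JSON.stringify(shorthand)); // → ["snake_case","camelCase","42"]
console.log(JSON.stringify(shorthand) === JSON.stringify(longhand)); // → true

// \w is ASCII-only, so the é ends the run
console.log(JSON.stringify("café".match(/\w+/g))); // → ["caf"]
```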

Putting It All Together

Now that we are comfortable with the fundamentals, let’s return to the first regular expression of this blog post and see if we can decipher its intent:

/[\w.%+]+@[\w.-]+\.[a-zA-Z]{2,10}/

First, we don't see any flags, so we know that we will only match the first instance of this pattern even if more exist. If there are no matches at all, we would get null.

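Next, we examine what is between our forward slashes. Going left to right, we see a bracketed set followed by a +, so we are searching for one or more characters, each of which is a word character, a period, a percent sign, or a plus sign. Note that . and + are literals here because they sit inside the brackets, while \w keeps its meaning.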

Continuing right, we want the above to precede an @. Then we see another bracketed character set which, like the first one, is followed by a +. This time we are looking for one or more word characters, periods, or hyphens; the - is literal because it sits at the end of the set rather than between two characters.

Next, we have an interesting wrinkle: a backslash followed by a period. We're using \ to escape the special character ., so everything we've matched so far must be followed by a literal period.

Lastly, we have a bracketed character set that includes two ranges, a-z and A-Z, so we are looking for any letter, uppercase or lowercase. This is followed by {2,10}, which requires between 2 and 10 of those letters in a row.

Now, looking at all of the pieces we’ve assembled can you guess what this regular expression is meant to find?
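If you would like to check your guess, the sketch below runs the pattern, unchanged, against a few sample strings. The strings and addresses are invented for illustration.

```javascript
// The pattern from the start of the post, unchanged
const pattern = /[\w.%+]+@[\w.-]+\.[a-zA-Z]{2,10}/;

console.log(pattern.test("jane.doe+news@mail.example.com")); // → true
console.log(pattern.test("not an address")); // → false

// No g flag, so match() returns only the first hit in a longer string
const found = "Contact bob_99@site.org or amy@site.net".match(pattern);
console.log(found[0]); // → bob_99@site.org
```

Adding the g flag to the pattern would make that last match() call return both addresses.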

Final Thoughts

If regular expressions still seem scary and bewildering, I highly recommend going hands-on to further experiment with these concepts. Hopefully, this guide at least shines some light on the basics and you are now able to understand what the more common expressions are communicating.
