As with most things, it’s pretty easy when you know how, so here’s my one-step-at-a-time approach to regex (stolen from my ZCE preparation tutorial slides). Let’s begin at the very beginning: regular expressions have delimiters, usually a slash character, and these contain a pattern that describes a string.
|/b[aeiou]t/|Matches “bat”, “bet”, “bit”, “bot” and “but”. Also matches “cricket bat”, “bitter lemon”|
Here, we’ve got an expression that describes something containing the letter “b”, followed by any one of the vowels (a, e, i, o and u), followed by the letter “t”. So long as those three characters appear next to each other, in that order, anywhere in a string, the string will match this expression.
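To make that concrete, here is a small sketch in Python. The pattern `b[aeiou]t` is reconstructed from the description above, and Python is my choice for illustration only; note that Python's `re` module takes the bare pattern, without the slash delimiters. The sample strings beyond those mentioned above (“boot”, “bt”) are added to show what does not match.

```python
import re

# Assumed pattern, taken from the description: "b", exactly one vowel, then "t".
# Python's re module takes the bare pattern, with no slash delimiters.
pattern = re.compile(r"b[aeiou]t")

samples = ["bat", "bet", "bit", "bot", "but",
           "cricket bat", "bitter lemon", "boot", "bt"]

# re.search looks for the pattern anywhere in the string, not only at the start.
matches = {text: bool(pattern.search(text)) for text in samples}

for text, found in matches.items():
    print(f"{text!r}: {found}")
```

“boot” and “bt” fail because the character class stands for exactly one character: two vowels or none is not a match.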
Use a hyphen to denote ranges of characters:
|/[0-9a-f]/|Will match hex|
|/[a-zA-Z0-9]/|Upper and lower case are distinct; this matches alphanumeric strings|
|/./|The dot ‘.’ matches any character|
|/\./|If you actually want to match a dot, escape it|
These examples show matching a selection of characters, including character ranges. Look out for the wildcard, which is a dot: if you want to match a literal dot (for example in a domain name), you need to escape it with a backslash.
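A quick sketch of ranges and the dot, again in Python for illustration. The patterns and the `example.com` domain are my own assumptions, chosen to match the descriptions above; they are not necessarily the exact ones from the original slides.

```python
import re

hex_digit = re.compile(r"[0-9a-f]")         # one lowercase hex digit
alnum = re.compile(r"[a-zA-Z0-9]")          # ranges are case-sensitive, so list both cases
any_char = re.compile(r"b.t")               # unescaped dot: any single character
literal_dot = re.compile(r"example\.com")   # escaped dot: only a real "."
loose_dot = re.compile(r"example.com")      # forgot to escape: the dot is a wildcard

results = {
    "hex_f": bool(hex_digit.fullmatch("f")),
    "hex_g": bool(hex_digit.fullmatch("g")),
    "hex_upper_F": bool(hex_digit.fullmatch("F")),
    "alnum_upper_F": bool(alnum.fullmatch("F")),
    "dot_b9t": bool(any_char.search("b9t")),
    "escaped_real_domain": bool(literal_dot.search("www.example.com")),
    "escaped_fake_domain": bool(literal_dot.search("examplexcom")),
    "unescaped_fake_domain": bool(loose_dot.search("examplexcom")),
}

for name, value in results.items():
    print(name, value)
```

The last two entries are the ones to watch: without the backslash, “examplexcom” slips through because the dot happily matches the “x”.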
The quantifier goes after the character (or character range) to say how many of something there should be. You can give precise numbers of occurrences, ranges, or use the ? (0 or 1), + (1 or more) or * (0 or more) characters:
|/b[aeiou]+t/|Matches “bat” and “bit” etc, but also “boot” and “boat”|
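Here is a rough comparison of the quantifiers side by side, sketched in Python. I am assuming the usual curly-brace syntax (`{2}`, `{1,2}`) for precise counts and ranges, and the word list is my own; `fullmatch` is used so that each word has to match the pattern from start to finish.

```python
import re

patterns = {
    "one_or_more": re.compile(r"b[aeiou]+t"),     # + : at least one vowel
    "zero_or_more": re.compile(r"b[aeiou]*t"),    # * : any number of vowels, including none
    "optional": re.compile(r"b[aeiou]?t"),        # ? : no vowel, or exactly one
    "exactly_two": re.compile(r"b[aeiou]{2}t"),   # {2} : precisely two vowels
    "one_to_two": re.compile(r"b[aeiou]{1,2}t"),  # {1,2} : one or two vowels
}

words = ["bt", "bat", "boot", "boat", "beaut"]

# For each quantifier, collect the words that match in their entirety.
matched = {
    name: [word for word in words if pattern.fullmatch(word)]
    for name, pattern in patterns.items()
}

for name, hits in matched.items():
    print(f"{name}: {hits}")
```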
We can also anchor our patterns to the beginning and end of lines using ^ and $ respectively (or \A and \Z for multiline strings). This means that:
|/^b[aeiou]t/|Will match “battering ram” but not “cricket bat”|
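The anchors can be tried out with a sketch like the one below, once more in Python as an illustration. The patterns and the two-line sample string are assumptions of mine. One caveat: Python's `\Z` matches only at the very end of the string, whereas in PCRE `\Z` also matches just before a trailing newline, so check your own engine's documentation before relying on it.

```python
import re

starts_with = re.compile(r"^b[aeiou]t")   # anchored to the beginning
ends_with = re.compile(r"b[aeiou]t$")     # anchored to the end

two_lines = "cricket\nbat"  # hypothetical multiline input

results = {
    "start_battering_ram": bool(starts_with.search("battering ram")),
    "start_cricket_bat": bool(starts_with.search("cricket bat")),
    "end_cricket_bat": bool(ends_with.search("cricket bat")),
    "end_battering_ram": bool(ends_with.search("battering ram")),
    # Without the multiline flag, ^ only matches at the start of the whole string.
    "caret_default": bool(re.search(r"^bat", two_lines)),
    # With it, ^ also matches at the start of each line.
    "caret_multiline": bool(re.search(r"^bat", two_lines, re.MULTILINE)),
    # \A always means the start of the whole string, whatever the flags.
    "A_multiline": bool(re.search(r"\Abat", two_lines, re.MULTILINE)),
    # $ in multiline mode matches at the end of each line; \Z only at the end of the string.
    "dollar_multiline": bool(re.search(r"cricket$", two_lines, re.MULTILINE)),
    "Z_multiline": bool(re.search(r"cricket\Z", two_lines, re.MULTILINE)),
}

for name, value in results.items():
    print(name, value)
```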
To remember all the various characters, I use this excellent regex cheat sheet from addedbytes.com, which has been printed out and pinned over my desk pretty much everywhere I’ve worked in recent memory.