Simple Regular Expressions by Example

Whenever I ask a group of developers whether they are familiar with regular expressions, at least half the responses are along the lines of “I’ve used them, but I don’t like them”. Call me a geek if you like, but I quite like regex. I think it often seems unfriendly because it’s used inappropriately, or just thrown into code with “here be dragons” comments rather than documentation about what should match and what shouldn’t!

As with most things, it’s pretty easy when you know how, so here’s my one-step-at-a-time approach to regex (stolen from my ZCE preparation tutorial slides). Let’s begin at the very beginning: regular expressions have delimiters, usually a slash character, and these contain a pattern that describes a string.

| pattern | notes |
|---|---|
| `/b[aeiou]t/` | Matches “bat”, “bet”, “bit”, “bot” and “but”. Also matches “cricket bat” and “bitter lemon” |

Here, we’ve got an expression that describes the letter “b”, followed by any one of the vowels (a, e, i, o or u), followed by the letter “t”. So long as those three characters appear next to each other, in that order, somewhere in a string, the string matches this expression.
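If you want to try this out, here is a minimal sketch in Python (chosen only for illustration; the post isn’t tied to one language). Python’s `re` module takes just the pattern between the slashes, and `re.search()` looks for it anywhere in the string, which is the “contains” behaviour described above. The helper name `contains_b_vowel_t` is made up for this example.

```python
import re

# Pattern body of /b[aeiou]t/; Python's re module takes no slash delimiters.
PATTERN = r"b[aeiou]t"


def contains_b_vowel_t(text: str) -> bool:
    """True if "b", one vowel, then "t" appear consecutively anywhere in text."""
    return re.search(PATTERN, text) is not None


for word in ["bat", "bet", "bit", "bot", "but",
             "cricket bat", "bitter lemon", "boot", "bt"]:
    print(f"{word!r}: {contains_b_vowel_t(word)}")
```

Everything in the list matches except “boot” (two vowels) and “bt” (no vowel). Matching is case-sensitive, so “Bat” would only match with something like the `re.IGNORECASE` flag.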

Use a hyphen inside the square brackets to denote a range of characters:

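| pattern | notes |
|---|---|
| `/[0-9a-f]*/` | Matches a run of lowercase hex digits; because `*` (covered below) also allows zero characters, this unanchored pattern will match any string |
| `/[0-9a-zA-Z]/` | Upper and lower case are distinct, so both ranges are needed; this matches a single alphanumeric character |
| `/./` | The dot ‘.’ matches any character (except a newline, by default) |
| `/\./` | If you actually want to match a dot, escape it |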

These examples show how to match one of a set of characters, including character ranges. Look out for the wildcard, which is a dot: if you want to match a literal dot (for example in a domain name), you need to escape it with a backslash.
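A quick Python sketch of the character-class rows (Python’s `re` takes the pattern without slash delimiters; `is_hex` and `has_alnum` are illustrative names, not anything from the post). It also shows the catch with the hex row: an unanchored `[0-9a-f]*` “matches” anything, because zero hex digits is a valid match, so the sketch uses `+` with `re.fullmatch()` to check a whole string instead.

```python
import re


def is_hex(text: str) -> bool:
    # fullmatch anchors the pattern to the whole string; + needs at least one digit
    return re.fullmatch(r"[0-9a-f]+", text) is not None


def has_alnum(text: str) -> bool:
    # [0-9a-zA-Z] is a single character, so search() succeeds if any one is present
    return re.search(r"[0-9a-zA-Z]", text) is not None


# Unanchored [0-9a-f]* succeeds even on "xyz": it matches the empty string.
star_match = re.search(r"[0-9a-f]*", "xyz")
print(repr(star_match.group(0)))  # ''

print(is_hex("ff8800"), is_hex("ff88zz"))    # True False
print(has_alnum("--a--"), has_alnum("---"))  # True False

# An unescaped dot matches any character; an escaped dot matches only "."
print(bool(re.search(r"example.com", "exampleXcom")))   # True
print(bool(re.search(r"example\.com", "exampleXcom")))  # False
print(bool(re.search(r"example\.com", "example.com")))  # True
```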

The quantifier goes after the character (or character class) to say how many of it there should be. You can give an exact number of occurrences such as {3}, a range such as {2,4}, or use ? (0 or 1), + (1 or more) or * (0 or more):

| pattern | notes |
|---|---|
| `/b[aeiou]+t/` | Matches “bat” and “bit” etc., but also “boot” and “boat” |
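To see the quantifiers side by side, this sketch runs variations of the same pattern over a small, made-up word list with Python’s `re.search()` (patterns written without the slash delimiters; `matches` and `WORDS` are illustrative names).

```python
import re


def matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words in which the pattern is found anywhere."""
    return [w for w in words if re.search(pattern, w)]


WORDS = ["bt", "bat", "boot", "boat", "beaut", "baaaat"]

print(matches(r"b[aeiou]t", WORDS))       # ['bat']  exactly one vowel
print(matches(r"b[aeiou]+t", WORDS))      # one or more: everything except 'bt'
print(matches(r"b[aeiou]?t", WORDS))      # ['bt', 'bat']  zero or one
print(matches(r"b[aeiou]*t", WORDS))      # zero or more: all six words
print(matches(r"b[aeiou]{2}t", WORDS))    # ['boot', 'boat']  exactly two
print(matches(r"b[aeiou]{1,2}t", WORDS))  # ['bat', 'boot', 'boat']  one or two
```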

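We can also anchor our patterns to the beginning and end of the string using ^ and $ respectively. With the m (multiline) modifier, ^ and $ match at the start and end of every line instead, while \A and \Z always anchor to the start and end of the whole string. This means that: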

| pattern | notes |
|---|---|
| `/^b[aeiou]t/` | Will match “battering ram” but not “cricket bat” |
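Anchors in a minimal Python sketch. Python’s `re.MULTILINE` flag plays the role of the `m` modifier in PHP/PCRE; the two-line test string is invented for the example.

```python
import re

print(bool(re.search(r"^b[aeiou]t", "battering ram")))  # True
print(bool(re.search(r"^b[aeiou]t", "cricket bat")))    # False
print(bool(re.search(r"b[aeiou]t$", "cricket bat")))    # True: anchored to the end

text = "cricket\nbat"
# Without MULTILINE, ^ only matches at the start of the whole string
print(bool(re.search(r"^bat", text)))                 # False
# With MULTILINE, ^ also matches straight after each newline
print(bool(re.search(r"^bat", text, re.MULTILINE)))   # True
# \A always means the start of the whole string, even in MULTILINE mode
print(bool(re.search(r"\Abat", text, re.MULTILINE)))  # False
```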

To remember all the various characters, I use this excellent regex cheat sheet from addedbytes.com, which has been printed out and pinned over my desk pretty much everywhere I’ve worked in recent memory.

14 thoughts on “Simple Regular Expressions by Example”

Holy Cow!

This is the first time I’ve read something about regex that makes any sense…

Thank you so much!

I’ve always used http://regex.powertoy.org/ as a cheat sheet. It’s great for writing and testing regex, and includes a cheat sheet of every regex command.

I just checked out the regex cheat sheet you posted as well, and now I know why so many websites tell me my e-mail address isn’t valid. It doesn’t allow a hyphen in a domain name, even though that’s a valid character.
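The hyphen problem is easy to reproduce. This is a deliberately simplified, hypothetical domain check in Python (not the cheat sheet’s actual pattern, and nowhere near a full validator); the only difference between the two patterns is the `-` added at the end of the character class, where it is treated as a literal hyphen rather than a range.

```python
import re

# Hypothetical, simplified checks: lowercase labels separated by dots.
NO_HYPHEN = r"[a-z0-9]+(\.[a-z0-9]+)+"
WITH_HYPHEN = r"[a-z0-9-]+(\.[a-z0-9-]+)+"  # '-' last in the class is literal


def check(domain: str) -> tuple[bool, bool]:
    """Return (matches NO_HYPHEN, matches WITH_HYPHEN) for the whole string."""
    return (re.fullmatch(NO_HYPHEN, domain) is not None,
            re.fullmatch(WITH_HYPHEN, domain) is not None)


for domain in ["example.com", "my-site.co.uk"]:
    print(domain, check(domain))
# example.com (True, True)
# my-site.co.uk (False, True)
```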

Dennis on December 30, 2017 at 15:29 said:

http://regex.powertoy.org/ seems to have dropped off the web.
a. has it changed identity?
b. do I need my glasses checked?
c. Other

Regular expressions are probably the single most useful thing a developer can know.

Not just for coding things, but also for doing otherwise-impossible search and replace or reformatting operations in text editors. Want to build some SQL from data in a CSV or text file? No problem, run it through “sed” with some appropriate regex. Got a mixture of hard and soft tabs? No problem, “s/ {4}/\t/g” will fix them.
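For comparison, the same substitution in Python with `re.sub()`; the function name is illustrative. If you run the original in sed, note that `{4}` is only treated as a quantifier in extended regex mode (for example `sed -E`); basic regex syntax writes it as `\{4\}`.

```python
import re


def spaces_to_tabs(text: str) -> str:
    # Same idea as s/ {4}/\t/g: every run of four spaces becomes one tab
    return re.sub(r" {4}", "\t", text)


print(repr(spaces_to_tabs("    one\n        two")))  # '\tone\n\t\ttwo'
```

This converts every run of four spaces, including ones in the middle of a line; with a run of five, the leftover space stays as a space.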

lornajane on December 9, 2011 at 17:07 said:

Love the tabs/spaces example, I certainly use that one a lot!

Awesome post Lorna :)
If anyone is interested in getting to grips with regular expressions the book ‘Sams Teach Yourself Regular Expressions in 10 Minutes’ is well worth a read. It fits into your back pocket and is nice and cheap too!
Here is the Amazon link:
http://www.amazon.co.uk/Teach-Yourself-Regular-Expressions-Minutes/dp/0672325667

lornajane on December 9, 2011 at 17:06 said:

Thanks for the kind words and the very helpful book recommendation, that’s excellent :)

Something worth noting for those PHP developers out there; remember that you need to escape backslashes in PHP.

So if you see:

`preg_match('/^([a-z]+)\\.html/', $subject, $matches);`

The regular expression that this actually uses is:
`/^([a-z]+)\.html/`

As the first \ is a PHP escape character.

Always gets me…
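Python has exactly the same trap, as this sketch shows: in an ordinary string literal the backslash has to be doubled, whereas a raw string (`r"..."`) passes it through untouched, which is why Python regexes are usually written as raw strings.

```python
import re

plain = "^([a-z]+)\\.html"  # backslash doubled for the ordinary string literal
raw = r"^([a-z]+)\.html"    # raw string: the backslash is kept as written

print(plain == raw)  # True: both strings hold ^([a-z]+)\.html

m = re.search(raw, "index.html")
print(m.group(1))                      # index  (the captured sub-pattern)
print(re.search(raw, "indexXhtml"))    # None: \. only matches a real dot
```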

Reg X. on December 10, 2011 at 23:00 said:

A less obvious (but sometimes useful) way to escape regexp metacharacters is using the character class construct. For the example above, you could instead do this: /^([a-z]+)[.]html/
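A small Python check that the escaped dot and the `[.]` character class behave the same (test strings made up for illustration). As a side benefit, `[.]` contains no backslash, so there is nothing to double up in a string literal.

```python
import re

escaped = r"^([a-z]+)\.html"
char_class = r"^([a-z]+)[.]html"  # inside [...], '.' is just a literal dot


def both(name: str) -> tuple[bool, bool]:
    """Return whether each version of the pattern matches name."""
    return (bool(re.search(escaped, name)), bool(re.search(char_class, name)))


for name in ["index.html", "indexXhtml"]:
    print(name, both(name))
# index.html (True, True)
# indexXhtml (False, False)
```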

I didn’t use regular expressions much until I found out:
1. how to use them inside Bash
2. how to use sub-patterns (the bits with parentheses around them).
The latter are invaluable for writing simple fuzzy parsers, renaming lots of files that don’t sort properly …
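As an illustration of sub-patterns for the file-renaming case, here is a hypothetical Python sketch that captures a name, a number and an extension in three groups and zero-pads the number so the names sort properly. The `track1.mp3`-style names and the `zero_pad` helper are invented for the example.

```python
import re


def zero_pad(filename: str, width: int = 3) -> str:
    # Three sub-patterns: letters, digits, and a dot plus extension
    m = re.fullmatch(r"([a-z]+)(\d+)(\.[a-z0-9]+)", filename)
    if m is None:
        return filename  # leave names that don't fit the pattern untouched
    prefix, number, ext = m.groups()
    return f"{prefix}{int(number):0{width}d}{ext}"


names = ["track1.mp3", "track10.mp3", "track2.mp3"]
print(sorted(names))                       # ['track1.mp3', 'track10.mp3', 'track2.mp3']
print(sorted(zero_pad(n) for n in names))  # ['track001.mp3', 'track002.mp3', 'track010.mp3']
```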

Am I reading this right, or does /b[aeiou]+t/ just not work as intended till you remove the forward slashes?
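The slashes are delimiters used by PHP’s `preg_*` functions (and Perl), not part of the pattern itself. Tools that take a bare pattern, such as Python’s `re` module, treat the slashes as literal characters to match, as this small sketch shows.

```python
import re

with_slashes = "/b[aeiou]+t/"  # delimiters included by mistake
without = "b[aeiou]+t"         # just the pattern

print(bool(re.search(with_slashes, "boat")))      # False: wants literal slashes
print(bool(re.search(with_slashes, "a /boat/")))  # True: the slashes are present
print(bool(re.search(without, "boat")))           # True
```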

Looks like that great regex cheat sheet you mentioned has been moved from addedbytes.com to cheatography.com and can now be found here: https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

`/b[aeiou]t/` As far as I can see, this does not match “cricket bat”, only “bat”, unless other regex is added to extend it. I am only learning, but I see this as incorrect. I expected from the description that, because the match was in the text, the whole phrase would be matched. It will work for a search, of course, but it won’t pull up the whole thing.
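The difference is between whether a string matches and which part of it matched. A minimal Python sketch: the pattern only describes “bat”, so that is all the match contains; to get the whole line back, the pattern has to consume the text around it, for example with `.*`.

```python
import re

text = "cricket bat"

m = re.search(r"b[aeiou]t", text)
print(m is not None)  # True: the string matches the pattern
print(m.group(0))     # bat: only the part the pattern describes

# Let the pattern consume the rest of the line to capture all of it
whole = re.search(r".*b[aeiou]t.*", text)
print(whole.group(0))  # cricket bat
```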

LornaJane Blog
