Regular Expressions (Introducing)

https://xkcd.com/208/ - a comic showing regular expressions in action

Regular Expressions. I originally tried to grasp them whilst playing around with grep last November, and they are extremely powerful, if a little complicated. As I’m “upping my game” and expanding my knowledge I need to learn more about regular expressions, so I thought a blog post introducing them was long overdue.

What are Regular Expressions?

Regular Expressions (often shortened to regex) are sequences of characters that form a search pattern. Those characters can be alphabetical (upper and/or lower case), numerical or even special characters (e.g. ! ? , . ).

When are Regular Expressions Used?

If you have a large amount of data that you want to find certain strings in, as XKCD 208 suggests, then Regular Expressions can help look for character (alphabetical, numerical or special) patterns. Referring back to my blog post from last November, regular expressions can be used with grep to search through files (e.g. log files) to find certain character patterns.

Regular Expressions can also be used in programming languages such as Python and Java.

How Do Regular Expressions Work?

If we take the lines:

http://www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.

We could use the regex /geektechstuff/, which would find that string of characters within the first line. I’ve highlighted this with orange text:

www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.

The regex /geektechstuff/ is only looking for those exact characters. But what if /GeekTechStuff/ is used instead? It gets zero results, as regex is case sensitive by default: capital (upper case) letters have a different value to lower case letters. However, with the i flag we can ask regex to ignore the case of the string, so /GeekTechStuff/i will find the orange text above.
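
To try the exact match and the i flag in code, here is a minimal sketch in Python (my choice of language for these examples). Python has no /pattern/i notation, so I am assuming that re.IGNORECASE is the closest equivalent of the i flag. The variable names are made up for illustration.

```python
import re

# The two example lines from this post, joined with a newline.
text = (
    "http://www.geektechstuff.com is great.\n"
    "It is a website for advice and geek tech stuff."
)

# Exact, case-sensitive search: like /geektechstuff/
exact = re.search(r"geektechstuff", text)

# Same search with capitals: like /GeekTechStuff/, which finds nothing
wrong_case = re.search(r"GeekTechStuff", text)

# Ignore case: like /GeekTechStuff/i
ignore_case = re.search(r"GeekTechStuff", text, flags=re.IGNORECASE)

print(exact.group())        # geektechstuff
print(wrong_case)           # None
print(ignore_case.group())  # geektechstuff
```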

What about if we want to find the word “geek” in the above text? /geek/ and /geek/i would only find 1 instance of “geek”. Again, I’ll point this out by changing the text orange:

www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.

However, it is clear above that the string “geek” appears twice. We need to use the g (global) regex flag to tell regex to keep searching for the string even after finding it. So /geek/g would have the following results:

www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.

And if we wanted to ignore the case (upper / lower) of the characters we could use both the i and g flags together, e.g. /geek/ig, which would find the same results in the above text.
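
Python does not have a g flag either, so as a rough sketch I am assuming the nearest equivalents are re.search (stops at the first match, like /geek/) and re.findall (keeps going, like /geek/g). Combining re.findall with re.IGNORECASE then behaves like /geek/ig.

```python
import re

text = (
    "http://www.geektechstuff.com is great.\n"
    "It is a website for advice and geek tech stuff."
)

# Like /geek/: stops after the first match.
first = re.search(r"geek", text)
print(first.group(), first.start())  # geek 11

# Like /geek/g: returns every non-overlapping match.
all_matches = re.findall(r"geek", text)
print(all_matches)  # ['geek', 'geek']

# Like /geek/ig: every match, ignoring case.
all_any_case = re.findall(r"GEEK", text, flags=re.IGNORECASE)
print(all_any_case)  # ['geek', 'geek']
```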


What about if we wanted to find all the instances of the string “geek” and the string “tech”? We can use an OR (known as alternation), which is the | pipe symbol in regex. /geek|tech/g would find:

www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.
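
The same OR search can be sketched in Python, again assuming re.findall stands in for the g flag:

```python
import re

text = (
    "http://www.geektechstuff.com is great.\n"
    "It is a website for advice and geek tech stuff."
)

# Like /geek|tech/g: match either "geek" or "tech", everywhere in the text.
either = re.findall(r"geek|tech", text)
print(either)  # ['geek', 'tech', 'geek', 'tech']
```

The matches come back in the order they appear in the text, two from the URL and two from the second line.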

If we wanted to search for a string of text with a space in it we can use the \s option, which matches a whitespace character, so to look for “geek tech” we would use /geek\stech/ which would find:

http://www.geektechstuff.com is great.

It is a website for advice and geek tech stuff.

Even with the g and/or i flags it would still only find the above, as regex is looking for exactly “geek”, a whitespace character, then “tech”.
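
A small Python sketch of the \s search. One thing worth noting is that \s matches any whitespace character (tabs and newlines as well as spaces), so the second check uses a made-up string with a tab in it to show that.

```python
import re

text = (
    "http://www.geektechstuff.com is great.\n"
    "It is a website for advice and geek tech stuff."
)

# Like /geek\stech/g: "geek", one whitespace character, then "tech".
spaced = re.findall(r"geek\stech", text)
print(spaced)  # ['geek tech']

# "geektechstuff" in the URL has no whitespace, so it is not matched.
# \s is not limited to the space character: a tab also counts.
tab_match = re.search(r"geek\stech", "geek\ttech")
print(tab_match is not None)  # True
```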

My next blog post on regular expressions will hopefully go into a bit more detail, as I believe it is easier to start using regular expressions a few small steps at a time.

More information on Regular Expressions can be found at https://en.wikipedia.org/wiki/Regular_expression

Want to test some Regular Expressions? Try an online assistant like https://www.regexpal.com or https://regexr.com