Regular expressions can be very useful for validating and manipulating data. Unfortunately there’s no built in support for regular expressions in Excel. You can use VBA but I think that’s a bit too messy.
In my Excel plugin SeoTools, I’ve added four methods:
=RegexpIsMatch(string input; string expr)
Matches an input string with a regular expression and returns true if there’s a match.
=RegexpFind(string input; string expr; int group)
Takes a regular expression and retrieves the matched group from an input string.
=RegexpFindOnUrl(string url; string regexp; int group)
Same as RegexpFind but the input string is instead the contents of a webpage.
=RegexpReplace(string input; string expr; string replacement)
Within a specified input string, replaces all strings that match a specified regular expression with a specified replacement string.
All examples below assume that you have downloaded and added SeoTools to the document you’re working on.
Example1: Validating email addresses with RegexpIsMatch
Let’s say you have a column with a bunch of email addresses and you want to find all invalid addresses. The following formula will return FALSE for all invalid email addresses:
Example2: Find a particular part in a text
Now let’s say you have a column with text where you’d like to extract a particular part from, i.e. “Sports (id:4)”, where we like to extract the “4″ into a new column.
Example3: Get the number of pages indexed in Google for a set of keywords
For the last example we try some simple web scraping using the regular expression support in SeoTools. In our example we have a set of keywords and want to have the formula return the number of pages a search returns.
=RegexpFindOnUrl(“http://www.google.com/search?q=”&UrlEncode(A1);”About ([0-9,]*) results”;1)
With functions in SeoTools that are marked with “(cached)” (such RegexpFindOnUrl) the URL is only fetched once per “session”. So if you have several pieces you want to extract from a webpage the source of the page is only downloaded once even if you reference the URL several times in different formulas.