We've previously touched on web scraping in the context of fetching a very specific part of a single page. But what if you want to analyse a substantial part of a website, or all of it?

Enter the Spider

The Spider is an intelligent crawler built right into SeoTools. It lets you run one or several SeoTools functions against an entire website, including all public subpages and external links.

In this example we're going to do some basic competitor analysis to see how we can improve our SERP game against a particular competitor. Using the Google Keyword Planner, I've identified a couple of relevant keywords with relatively little competition for their search volume. Currently ranking #1 for most of these keywords is tools.seobook.com. We've found our nemesis!

I'm going to start by scraping the website I want to improve. We'll later compare the results to those of our competitor.

These are the bare minimum data points I suggest you include in your crawl:

Note: I'm using Moz and Majestic in this example, but you can use any other service that provides similar data.

Once you've finished scraping your website, create a new sheet and crawl the website of your competitor using identical parameters. You should now have two sheets with raw data.

Let's compare

You now have two sets of data that you can compare in any way imaginable using Excel's built-in functions. For demonstration purposes, I did an extremely basic comparison of MetaTitle characteristics.

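Here are some useful formulas to get you started with comparing your own data. Replace range with your own cells or column. Depending on your regional settings, Excel expects either commas or semicolons as argument separators.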

Average number of characters across cells (in Excel versions without dynamic arrays, enter it as an array formula with Ctrl+Shift+Enter)

=AVERAGE(LEN(range))
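
If you want to sanity-check the result outside Excel, the same calculation can be sketched in a few lines of Python. This is only an illustration: the titles are made-up sample data, not output from a real crawl, and average_length is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical MetaTitle values standing in for one column of crawl results.
titles = [
    "SEO Tools for Excel",
    "Keyword Research Guide",
    "Free Rank Checker",
]


def average_length(values):
    """Return the mean character count of the given strings."""
    return mean(len(v) for v in values)


print(average_length(titles))
```

Run it once per sheet and compare the two numbers, just as you would with the Excel formula.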

Instances of string across cells

=COUNTIF(range,"*text*")
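
The wildcard pattern counts every cell that contains the text anywhere in its value, and COUNTIF ignores case while doing so. A rough Python equivalent, assuming the cell values are available as a list of strings (count_containing and the sample titles are hypothetical), might look like this:

```python
def count_containing(values, text):
    """Count the strings that contain `text`, ignoring upper/lower case."""
    needle = text.lower()
    return sum(1 for v in values if needle in v.lower())


# Made-up sample titles, not real crawl output.
sample = ["SEO Tools for Excel", "Free seo checker", "Rank Tracker"]
print(count_containing(sample, "seo"))
```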

Average across cells

=AVERAGE(range)

Ratio of two numbers

=cell1/GCD(cell1,cell2)&":"&cell2/GCD(cell1,cell2)
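
The ratio formula divides both numbers by their greatest common divisor and joins the results with a colon. Here is a minimal Python sketch of the same arithmetic, assuming both inputs are positive whole numbers (ratio is a hypothetical helper name):

```python
from math import gcd


def ratio(a, b):
    """Reduce a:b to lowest terms by dividing both sides by their GCD."""
    g = gcd(a, b)
    return f"{a // g}:{b // g}"


# Example: 12 pages on one site against 8 on the other.
print(ratio(12, 8))  # prints 3:2
```

For instance, 12 and 8 share a greatest common divisor of 4, so the ratio reduces to 3:2.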