XPathOnUrl

=XPathOnUrl(
  string url, 
  string xpath, 
  string attribute (optional), 
  string xmlHttpSettings (optional)
) : vector

Fetches the url and returns the (vector) result from xpath expression.

This function is cached.

=XPathOnUrl("http://google.com","/html/head/title") 
=> "Google"

The optional attribute allows you to retreive the value of an attribute of the specified node. Xpath /foo/@bar returns the inner text of node foo. To get attribute bar you specify:

=XPathOnUrl(url,"/foo","bar").

You can use XPathOnUrl on any XML resource.

You can control the HTTP request (such as header and form variables) using the xmlHttpSettings. See HttpSettings.

Note that XPath queries has to be in lower case. So /HTML/BODY/A won't work.

See: Working with arrays in Excel

Similar to ImportXML in Google Spreadsheets

XPathOnUrl works very similar to the popular ImportXML function in Google spreadsheets, so for some inspiration of what you can achieve with XPathOnUrl:

http://www.seerinteractive.com/blog/importxml-cookbook/2011/09/17/

http://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/

There’s a few differences though:

  • First you can’t reference attributes directly in the XPath (see above).
  • Secondly, ImportXML will populate the current cell and neighboring cells below with results if the XPath expression references multiple nodes. XPathOnUrl will return an Excel array. See: Working with arrays in Excel.

So to achieve what you in Google spreadsheet write:

=ImportXML("http://www.google.com/search?q=dogs","//h3[@class='r']/a/@href")

(Retreives the first 10 SERP url in a search for dogs on Google.com)

In Excel+SeoTools, you write:

=Dump(XPathOnUrl("http://www.google.com/search?q=dogs&num=10","//h3[@class='r']/a","href"))

Local files

You can also parse local files using XPathOnUrl using either absolute or relative (to spreadsheet) filepaths:

file:///C:/path/to/file.xml
file:///path/relative/workbook.xml
file:///../../path/relative/workbook.xml

Examples

Get country where website (or ip) is hosted

Using the GeoIPService from webservicex.net and ResolveIp we can easily get the country where a website is hosted.

=XPathOnUrl("http://www.webservicex.net/geoipservice.asmx/GetGeoIP?IPAddress="&ResolveIp("http://offerta.se"),"/geoip/countryname")
=>"Sweden"

Return number of nodes

Lets say you want to retreive a the number of links in a page, we can try:

=XPathOnUrl("http://nielsbosma.se","count(//a)")

But thi won’t work as HtmlAgilityPack requires expression to return nodes. But as XPathOnUrl return an array with results we can use the ROWS() function in Excel to achieve this:

=ROWS(XPathOnUrl("http://nielsbosma.se","//a"))

Alternative to Dump and Resize

=XPathOnUrl("http://www.google.com/search?q=dogs","(//h3[@class='r']/a)[1]","href")

This will reference the first node. To get all results in a column, you first create a column (in say column A) with values 1..10 and use the following formula:

=XPathOnUrl("http://www.google.com/search?q=dogs","(//h3[@class='r']/a)["&A1&"]","href")

Comments

comments powered by Disqus