Fetching Namespaced XML Elements With SimpleXML
I was reading from Google's analytics data feed API, which returns a few key fields and then multiple <entry> tags, each with namespaced children. The entry tags look something like:
http://www.google.com/analytics/feeds/data?ids=ga:578671&ga:date=20101005&start-date=2010-10-01&end-date=2010-10-10
2010-10-09T17:00:00.001-07:00
ga:date=20101005
While you can access the non-namespaced elements either by addressing them directly or by iterating over the parent, the namespaced ones remain determinedly invisible until you call asXML() on the node. However, we can retrieve them easily if we specify their namespace.
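To make the problem concrete, here is a minimal sketch. The feed fragment, the ga:visits value and the variable names are invented for illustration and are not the real API response. Plain iteration should only yield the elements in the default namespace, while asXML() shows that the dxp: children are still there.

```php
<?php
// Hypothetical, trimmed-down feed: an assumption, not the real API response.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <dxp:dimension name="ga:date" value="20101005"/>
    <dxp:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->entry;

// Iterating without a namespace only sees the default-namespace children.
$visible = array();
foreach ($entry->children() as $child) {
    $visible[] = $child->getName();
}
print_r($visible);

// The namespaced children are still present in the serialised node.
$raw = $entry->asXML();
var_dump(strpos($raw, 'dxp:metric') !== false);
```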
Retrieving Namespaces
When I get the response from my request to the API, I load the resulting string as a SimpleXMLElement. To get the namespaces, I call:
$namespaces = $xml->getNamespaces(true);
And I get an array of prefixes and full URIs, looking something like this:
Array
(
    [] => http://www.w3.org/2005/Atom
    [gd] => http://schemas.google.com/g/2005
    [openSearch] => http://a9.com/-/spec/opensearch/1.1/
    [dxp] => http://schemas.google.com/analytics/2009
)
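As a rough sketch of what the true argument changes: without it, getNamespaces() only reports namespaces used by the root node itself, while passing true also walks the child nodes. The feed below is a made-up two-namespace example, and $rootOnly and $all are names chosen just for this illustration.

```php
<?php
// Hypothetical feed fragment, invented for illustration.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <dxp:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);

// Namespaces used by the root node only.
$rootOnly = $xml->getNamespaces();

// Namespaces used anywhere in the root node and its descendants.
$all = $xml->getNamespaces(true);

print_r($rootOnly);
print_r($all);
```

The default namespace appears under an empty-string key, which is why print_r() shows it as [].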
Retrieving Namespaced Elements
We can use the array we found from getNamespaces() to make it easier to find the elements we are interested in – instead of specifying the URLs, we can just refer to the prefix we see in the raw XML. So for me to fetch those dxp: namespaced elements, I can take my entry node (called $entry in this example) and fetch them by passing the namespace I want to children().
foreach ($entry->children($namespaces['dxp']) as $child) {
    // now do your Cool Stuff (TM)
}
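For a slightly fuller picture, here is a sketch of what the Cool Stuff might look like. It assumes the namespaced children carry their data in unprefixed name and value attributes; the feed fragment and the $values array are invented for this example.

```php
<?php
// Hypothetical feed fragment, invented for illustration.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <dxp:dimension name="ga:date" value="20101005"/>
    <dxp:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);
$namespaces = $xml->getNamespaces(true);
$entry = $xml->entry;

$values = array();
foreach ($entry->children($namespaces['dxp']) as $child) {
    // The attributes have no prefix, so request them without a namespace.
    $attrs = $child->attributes();
    $values[(string) $attrs->name] = (string) $attrs->value;
}

print_r($values);
```

Calling attributes() with no arguments is a deliberate choice here: the elements are namespaced, but their attributes are not.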
I was confused at first because I expected to be able to do this from the top level, or to refer to elements directly by name, since I can see perfectly well what they are called in the XML! However it seems like this is the best way to approach this – iterating over children from this namespace.
Comments, suggestions and improvements all gratefully received.
The example code seems to rely on the returned XML using ‘dxp’ as the namespace qualifier. So I think the approach taken may fail if either:
a) The namespace qualifier is changed to something other than ‘dxp’
b) No namespace qualifiers are used at all, e.g. all the tags have the xmlns explicitly set.
However SimpleXML seems to allow you to assert that ‘dxp’ is the namespace qualifier by using registerXPathNamespace() so that you can use XPath to find the nodes you want.
SimpleXML is all very well for PHP devotees. Personally, I prefer to use DOM, as it’s a well-thought-out cross-language API. DOM supports useful functions like getElementsByTagNameNS(), which does the job for you.
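A minimal sketch of the DOM route, under the assumption that the feed uses some auto-generated prefix (ns0 here) rather than dxp. The XML is invented for illustration. Because getElementsByTagNameNS() matches on the namespace URI and local name, the prefix in the document should not matter.

```php
<?php
// Hypothetical feed using a different prefix for the analytics namespace.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:ns0="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <ns0:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($feed);

// Look the elements up by namespace URI and local name, not by prefix.
$metrics = $doc->getElementsByTagNameNS(
    'http://schemas.google.com/analytics/2009',
    'metric'
);

$values = array();
foreach ($metrics as $metric) {
    $values[$metric->getAttribute('name')] = $metric->getAttribute('value');
}

print_r($values);
```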
Dom: SimpleXML is a bit of a simple wrapper for DOM stuff, but it does make XML handling approachable. Thanks for your tips – how worried should I be about services changing namespace qualifiers?
I wouldn’t worry too much about changing namespaces…consistency is the name of the game when it comes to working with XML, right? You can’t correctly parse the document without knowing what it’ll look like. I’ll agree with Dom, though – if you get much more outside of what you’ve written up here, you’ll be in trouble.
Of course, that’s usually how it goes with SimpleXML…
Lorna, thanks for the tip. One thing you might note: when using XPath with namespaces, you need to call $xml->registerXPathNamespace('dxp', 'http://schemas.google.com/analytics/2009');
That way you can easily do an xpath query to find exactly what you want.
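Sketched out, the XPath approach might look like this. The feed is a made-up fragment that deliberately uses a different prefix (ns0) in the document, to show that the prefix registered for XPath is bound to the URI and is independent of whatever the XML itself uses.

```php
<?php
// Hypothetical feed with an auto-generated prefix for the analytics namespace.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:ns0="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <ns0:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);

// Bind our own prefix to the namespace URI for use in XPath expressions.
$xml->registerXPathNamespace('dxp', 'http://schemas.google.com/analytics/2009');

$visits = array();
foreach ($xml->xpath('//dxp:metric') as $metric) {
    $visits[] = (string) $metric['value'];
}

print_r($visits);
```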
@chris: You’re wrong. XML relies on URIs and you often see ns0,ns1,ns2 in autogenerated XML. Using URIs turns out to be a totally retarded decision if everybody wants to use nicknames anyway. But that’s the way it is.
@lorna: Instead of getNamespaces() I’d personally set up a predefined array. You can define your own qualifiers then, and the XMLNS URIs don’t change anyway.
$xmlns = array("" => "http://www.w3.org/2005/Atom", "os" => "http://a9.com/-/spec/opensearch/1.1/");
$entry->children($xmlns['os'])
Thanks to everyone that recommended I look more closely at XPath for doing this – I wanted to show the simplest approach (which to my mind is always SimpleXML) but I’ve adapted my own code now to use XPath. I do love it when I learn something from my own blog :)
Mario: Thanks so much for this explanation! The URIs won’t change but I should register them explicitly … that feels more comfortable than relying on the abbreviation as I did in my original example.
For what it’s worth, you could also (as of PHP 5.2.0) forgo the use of getNamespaces() and instead tell children() to look for a matching prefix by setting the 2nd parameter to true: $entry->children('dxp', true)
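A small sketch comparing the two forms, assuming a made-up feed fragment in which the document really does use the dxp prefix. The prefix form and the URI form should then return the same children; $byPrefix and $byUri are names chosen for this example only.

```php
<?php
// Hypothetical feed fragment, invented for illustration.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:date=20101005</title>
    <dxp:dimension name="ga:date" value="20101005"/>
    <dxp:metric name="ga:visits" value="42"/>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->entry;

// Second argument true: the first argument is treated as a prefix.
$byPrefix = array();
foreach ($entry->children('dxp', true) as $child) {
    $byPrefix[] = $child->getName();
}

// Default behaviour: the first argument is the namespace URI.
$byUri = array();
foreach ($entry->children('http://schemas.google.com/analytics/2009') as $child) {
    $byUri[] = $child->getName();
}

var_dump($byPrefix === $byUri);
```

The prefix form is shorter, but it depends on the document keeping the dxp prefix, whereas the URI form does not.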
Very useful, thanks!
Thank you! Someone who can actually make something plain and simple for a change instead of assuming heaps of knowledge. So grateful.