Parsing XML documents with PHP's SimpleXML is pretty straightforward. Yesterday, though, I lost about 5 minutes parsing a Media RSS feed, which was odd, because with SimpleXML it usually takes about 30 seconds… A Media RSS (MRSS) document is just an RSS feed with media extensions:
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>RSS Title</title>
    <link>http://www.domain.com/mylink</link>
    <description>My description</description>
    <item>
      <title>Title item 1</title>
      <link>http://www.domain.com/item_1.html</link>
      <description>Item 1 description</description>
      <guid>http://www.domain.com/item_1.html</guid>
      <media:content url="http://www.domain.com/item_1.jpg" height="240" width="320" />
    </item>
    <item>
      <title>Title item 2</title>
      <link>http://www.domain.com/item_2.html</link>
      <description>Item 2 description</description>
      <guid>http://www.domain.com/item_2.html</guid>
      <media:content url="http://www.domain.com/item_2.jpg" height="240" width="320" />
    </item>
    .... etc
  </channel>
</rss>
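The “problem” is accessing media:content and the other media:* elements, because they belong to the media namespace (http://search.yahoo.com/mrss/) rather than to the default one. But don’t worry, I’m going to show you how to do it 🙂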
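To make the problem concrete, here's a small sketch on a one-item copy of the sample feed, loaded with simplexml_load_string so it runs offline (the inline string and variable names are just for illustration). A plain property lookup on the item doesn't see the prefixed media:content element at all, while children() with the MRSS namespace URI finds it.

```php
<?php
// One-item copy of the sample feed, loaded from a string so it runs offline.
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Title item 1</title>
      <media:content url="http://www.domain.com/item_1.jpg" height="240" width="320" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);
$item = $xml->channel->item[0];

// Plain property access only matches un-prefixed children,
// so media:content is not found this way.
$plainCount = count($item->content);

// children() with the MRSS namespace URI selects the media:* children.
$mediaUri = 'http://search.yahoo.com/mrss/';
$namespacedUrl = (string) $item->children($mediaUri)->content->attributes()->url;

echo "plain lookup: $plainCount element(s)\n";   // plain lookup: 0 element(s)
echo "namespaced lookup: $namespacedUrl\n";      // namespaced lookup: http://www.domain.com/item_1.jpg
```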
$xml = simplexml_load_file('http://domain.com/mrss.xml');
$namespaces = $xml->getNamespaces(true); // get namespaces

// iterate items and store in an array of objects
$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();
    $tmp->title = trim((string) $item->title);
    $tmp->link = trim((string) $item->link);
    // etc...

    // now for the url in media:content
    $tmp->media_url = trim((string) $item->children($namespaces['media'])->content->attributes()->url);

    // add parsed data to the array
    $items[] = $tmp;
}
There, a piece of cake!
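If you want to try the loop without a live feed, here's a sketch that runs the same logic on an inline copy of the two sample items (simplexml_load_string in place of simplexml_load_file) and also pulls width and height from the same attributes() result. The media_width and media_height property names are made up for illustration.

```php
<?php
// Inline copy of the two sample items (descriptions and guids trimmed).
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>RSS Title</title>
    <item>
      <title>Title item 1</title>
      <link>http://www.domain.com/item_1.html</link>
      <media:content url="http://www.domain.com/item_1.jpg" height="240" width="320" />
    </item>
    <item>
      <title>Title item 2</title>
      <link>http://www.domain.com/item_2.html</link>
      <media:content url="http://www.domain.com/item_2.jpg" height="240" width="320" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);
$namespaces = $xml->getNamespaces(true); // ['media' => 'http://search.yahoo.com/mrss/']

$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();
    $tmp->title = trim((string) $item->title);
    $tmp->link  = trim((string) $item->link);

    // One attributes() call gives access to every attribute of media:content.
    $attrs = $item->children($namespaces['media'])->content->attributes();
    $tmp->media_url    = trim((string) $attrs->url);
    $tmp->media_width  = (int) $attrs->width;   // hypothetical property name
    $tmp->media_height = (int) $attrs->height;  // hypothetical property name

    $items[] = $tmp;
}

foreach ($items as $it) {
    printf("%s -> %s (%dx%d)\n", $it->title, $it->media_url, $it->media_width, $it->media_height);
}
```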
UPDATE
I received a comment about the Picasa RSS feed, where you have to dig a bit deeper, because media:content (and its url attribute) sits inside a media:group element. The feed looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom='http://www.w3.org/2005/Atom'
     xmlns:media='http://search.yahoo.com/mrss/'
     xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'>
  <channel>
    <atom:id>https://picasaweb.google.com/data/feed/base/user/103218581909188195000</atom:id>
    <lastBuildDate>Wed, 16 Apr 2014 07:28:42 +0000</lastBuildDate>
    <title>Galerie fotografií uživatele Jiřetín JINAK</title>
    .... etc
    <item>
      <pubDate>Thu, 10 Apr 2014 07:16:22 +0000</pubDate>
      <atom:updated>2014-04-16T07:28:42.202Z</atom:updated>
      <author>Jiřetín JINAK</author>
      .... etc
      <media:group>
        <media:content url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s100-c/RizikovaMistaVHornimJiretine.jpg' type='image/jpeg' medium='image'/>
        <media:credit>Jiřetín JINAK</media:credit>
        <media:description type='plain'/>
        <media:keywords/>
        <media:thumbnail url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s160-c/RizikovaMistaVHornimJiretine.jpg' height='160' width='160'/>
        <media:title type='plain'>Riziková místa v Horním Jiřetíně</media:title>
      </media:group>
    </item>
    .... etc
  </channel>
</rss>
The PHP code follows the same logic; it just adds one more step to descend into media:group:
$xml = simplexml_load_file('http://picasaweb.google.com/data/feed/...&prettyprint=true');
$namespaces = $xml->getNamespaces(true); // get namespaces

$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();
    $tmp->title = trim((string) $item->title);
    $tmp->link = trim((string) $item->link);
    // etc...

    // now for the data in the media:group
    $media_group = $item->children($namespaces['media'])->group;
    $tmp->media_url = trim((string) $media_group->children($namespaces['media'])->content->attributes()->url);
    $tmp->media_credit = trim((string) $media_group->children($namespaces['media'])->credit);
    // etc

    // add parsed data to the array
    $items[] = $tmp;
}
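Here's an offline sketch of the same two-step descent, run on a trimmed inline copy of the Picasa item above (the atom/openSearch parts and the "etc" elements are left out). It also reads media:title and media:thumbnail, which sit next to media:content in the group; the media_title and thumbnail_url property names are just illustrative.

```php
<?php
// Trimmed copy of the Picasa item: media:content lives inside media:group.
$xmlString = <<<XML
<rss xmlns:media='http://search.yahoo.com/mrss/' version='2.0'>
  <channel>
    <item>
      <media:group>
        <media:content url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s100-c/RizikovaMistaVHornimJiretine.jpg' type='image/jpeg' medium='image'/>
        <media:credit>Jiřetín JINAK</media:credit>
        <media:thumbnail url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s160-c/RizikovaMistaVHornimJiretine.jpg' height='160' width='160'/>
        <media:title type='plain'>Riziková místa v Horním Jiřetíně</media:title>
      </media:group>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);
$namespaces = $xml->getNamespaces(true);
$mediaNs = $namespaces['media'];

$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();

    // Step 1: media:group is a media-namespaced child of <item>.
    $group = $item->children($mediaNs)->group;

    // Step 2: the elements inside the group are media-namespaced too.
    $inGroup = $group->children($mediaNs);
    $tmp->media_url     = trim((string) $inGroup->content->attributes()->url);
    $tmp->media_credit  = trim((string) $inGroup->credit);
    $tmp->media_title   = trim((string) $inGroup->title);                      // illustrative
    $tmp->thumbnail_url = trim((string) $inGroup->thumbnail->attributes()->url); // illustrative

    $items[] = $tmp;
}

print_r($items);
```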
Hello, I found your post because I am trying to parse a Picasaweb RSS feed, but with no luck 🙁
I need the media:content url, but in this feed it is nested deeper, under media:group.
Could you give me an example of how to do that? I tried your code, but with no luck.
Hi Paul, can you post the feed URL?
https://picasaweb.google.com/data/feed/base/user/103218581909188195000?alt=rss&kind=album&hl=cs&access=public&imgmax=1600
This one is easier to read, with prettyprint:
https://picasaweb.google.com/data/feed/base/user/103218581909188195000?alt=rss&kind=album&hl=cs&access=public&imgmax=100&prettyprint=true
Hi Marco, did you get the URL? If you have time, it would help me a lot. Thank you.
I only got it today. I'll look into it this week.
Cheers
Hello,
I have just updated the post with an example that parses the Picasa feed.
Regards
Awesome, it’s making sense to me now.
$media_group->children($namespaces['media'])->{:tag}->attributes()->url
Found a second way, but the array of namespaces comes in handy. 🙂
$media_group->children('media', true)->content->attributes()->url
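Both forms select the same elements: children() reads its first argument as a namespace URI by default, and with the second argument set to true it reads it as a prefix instead. A quick sketch comparing the two on a minimal, made-up inline feed:

```php
<?php
// Minimal illustrative feed with media:content inside media:group.
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <media:group>
        <media:content url="http://www.domain.com/item_1.jpg" />
      </media:group>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);
$namespaces = $xml->getNamespaces(true);
$item = $xml->channel->item[0];

// By namespace URI (the default interpretation of the first argument).
$byUri = (string) $item->children($namespaces['media'])->group
    ->children($namespaces['media'])->content->attributes()->url;

// By prefix (second argument true): matches elements written as "media:...".
$byPrefix = (string) $item->children('media', true)->group
    ->children('media', true)->content->attributes()->url;

var_dump($byUri === $byPrefix); // bool(true)
```

One small caveat: $namespaces['media'] is itself keyed by prefix, so both versions assume the feed uses the media: prefix. To be prefix-agnostic, pass the URI string http://search.yahoo.com/mrss/ directly.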
This is a very good post. It helped me a lot.
This post helps a lot, thanks!
HUGE thank you… I had tried everything, and little did I know I was totally missing the ‘group’ tag!! Thank you, it solved my problem!!!!
I get:
Warning: main(): Node no longer exists in C:\xampp\htdocs\project\_new\nyt\wireframe\_test\simplexml\index.php on line 17
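Without seeing line 17 we can't be sure, but a likely cause is reading from an element that some items don't have (an item without media:content, or without media:group in the Picasa case). A sketch that checks with isset() before calling attributes(); the two-item feed below is made up, with the second item deliberately missing media:content.

```php
<?php
// Hypothetical feed: the second item has no media:content at all.
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Title item 1</title>
      <media:content url="http://www.domain.com/item_1.jpg" />
    </item>
    <item>
      <title>Title item 2</title>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);
$mediaNs = 'http://search.yahoo.com/mrss/';

$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();
    $tmp->title = trim((string) $item->title);

    // Check that the element exists first: calling attributes() on a missing
    // element and then reading ->url produces warnings (the exact message
    // depends on the PHP version).
    $media = $item->children($mediaNs);
    $tmp->media_url = isset($media->content)
        ? trim((string) $media->content->attributes()->url)
        : null;

    $items[] = $tmp;
}

var_dump($items);
```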