@EdwardIII
Created November 9, 2012 16:02
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from lxml import etree
xml = u"""<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title>...</title>
<description>...</description>
<link>http://www.example.com/</link>
<item>
<title>
<![CDATA[ Armani Idole d'Armani 50ml Eau de Parfum Spray ]]>
</title>
<link>
<![CDATA[
http://www.example.com/Armani_ Idole d-Armani 50ml EDP Spray ?language=en&currency=GBP
]]>
</link>
<description>
<![CDATA[
Armani Idole d Armani 50ml Eau de Parfum Spray Demonstration Bottle, Plain Box, Unused RSP £50.55 LOVESCENTS SPECIAL PRICE....£33.99 Idole d’Armani is a spicy floral developed by perfumer Bruno Jovanovic and features notes of clementine, pear, ginger, davana, saffron, jasmine, loukoum rose, patchouli and vetiver.
]]>
</description>
<g:brand>
<![CDATA[ GIORGIO ARMANI ]]>
</g:brand>
<g:condition>new</g:condition>
<g:id>
<![CDATA[ 353_gb ]]>
</g:id>
<g:image_link>
<![CDATA[
http://www.example.com/image/cache/data/Idole%20d'Armani-500x500.jpg
]]>
</g:image_link>
<g:availability>in stock</g:availability>
<g:mpn>
<![CDATA[ GAIDOLEW001 ]]>
</g:mpn>
<g:price>34.99 GBP</g:price>
<g:product_type>
<![CDATA[ Womens Fragrances ]]>
</g:product_type>
<g:product_type>
<![CDATA[ Womens Fragrances &gt; Eau de Toilette ]]>
</g:product_type>
<g:product_type>
<![CDATA[ Womens Fragrances &gt; Perfumes ]]>
</g:product_type>
<g:product_type>
<![CDATA[ Womens - Best Sellers ]]>
</g:product_type>
<g:shipping_weight>200.00kg</g:shipping_weight>
<g:google_product_category>
<![CDATA[
Health &amp; Beauty &gt; Personal Care &gt; Cosmetics &gt; Perfume &amp; Cologne
]]>
</g:google_product_category>
<g:adwords_publish>
<![CDATA[ true ]]>
</g:adwords_publish>
</item>
</channel>
</rss>"""
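NS = 'http://base.google.com/ns/1.0'
NSMAP = {'g': NS}
tree = etree.fromstring(xml)
# `tree` is already the <rss> root element, so the path starts at its <channel>
# child; <g:brand> sits inside each <item>, not directly under <channel>.
brands = tree.findall('./channel/item/g:brand', namespaces=NSMAP)
for brand in brands:
    # CDATA content keeps its surrounding whitespace, so strip it.
    print(brand.text.strip())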