@wppurking
Last active August 29, 2015 13:57
require 'nokogiri'
html = %q(
<td class="data-display-field" width="60%">
<a href="https://sellercentral.amazon.co.uk/gp/orders-v2/contact?ie=UTF8&amp;buyerID=ABAZTC59A7Z12&amp;orderID=205-7265948-3023534">Syed Ali</a>
<span id="_myo_buyerEmail_progressIndicator" style="vertical-align: middle; display: none;">
<img src="https://images-na.ssl-images-amazon.com/images/G/02/rainier/ajax/snake._V192262569_.gif" id="_myo_buyerEmail_loadingBar" style="display:inline">
</span>
<b id="_myo_buyerEmail_showRepeatOrders" buyeremail="hbbxb99wq139140@marketplace.amazon.co.uk" class="tiny"> </b>
</td>)
doc = Nokogiri::HTML(html)
# at_css is essentially css('xxx').first: it returns the first matching element.
# at_css may return nil, whereas css always returns a NodeSet, though its size may be 0.
span = doc.at_css('#_myo_buyerEmail_progressIndicator')
# To get the buyeremail attribute of the following <b>, use next_element
b = span.next_element['buyeremail'] # hbbxb99wq139140@marketplace.amazon.co.uk
# next returns the whitespace node instead: whitespace and newlines between tags are nodes too, of type Text.
# next_element skips over these whitespace Text nodes.
blank = span.next # #(Text "\n ")
# You can of course step over the whitespace node manually
span.next.next['buyeremail'] == span.next_element['buyeremail'] # true
# Inspect the parent node
img = doc.at_css('img')
img.parent['id'] # _myo_buyerEmail_progressIndicator
# For more related APIs, see https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet#working-with-a-nokogirixmlnode