@kanemu
Created June 16, 2010 05:57
[groovy] Reading a table with NekoHTML
@Grab(group='net.sourceforge.nekohtml', module='nekohtml', version='1.9.14')
import org.cyberneko.html.parsers.SAXParser
def text="""
<HTML>
<body>
<table>
<tr>
<th>Size</th>
<th>S</th>
<th>M</th>
<th>L</th>
<th>XL</th>
</tr>
<tr>
<td>Recommended height (cm)</td>
<td>155〜165</td>
<td>165〜175</td>
<td>170〜180</td>
<td>175〜185</td>
</tr>
<tr>
<td>Garment length (cm)</td>
<td>66</td>
<td>69</td>
<td>72</td>
<td>74</td>
</tr>
<tr>
<td>Body width (cm)</td>
<td>45</td>
<td>47</td>
<td>50</td>
<td>53</td>
</tr>
<tr>
<td>Sleeve length (cm)</td>
<td>19</td>
<td>19</td>
<td>20</td>
<td>20</td>
</tr>
<tr>
<td>Size No.</td>
<td>02</td>
<td>03</td>
<td>04</td>
<td>05</td>
</tr>
<tr>
<td>Print size A, height × width (cm)</td>
<td>30×36</td>
<td>32×38</td>
<td>34×40</td>
<td>38×44</td>
</tr>
<tr>
<td>Print size B, height × width (cm)</td>
<td>50×27</td>
<td>54×29</td>
<td>56×33</td>
<td>57×34</td>
</tr>
<tr>
<td>Print size C, height × width (cm)</td>
<td>8×13</td>
<td>8×13</td>
<td>8×13</td>
<td>10×14</td>
</tr>
</table>
</body>
</HTML>
"""
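def parser = new SAXParser()
// See http://nekohtml.sourceforge.net/settings.html for the available parser settings.
// Treat the input as an HTML fragment so the tag balancer does not insert missing body elements.
parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true)
def html = new XmlSlurper(parser).parseText(text)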
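// NekoHTML upper-cases element names by default, so match "TR" rather than "tr".
def rows = html.'**'.findAll { it.name() == "TR" }
// Collect the text of every cell (TH or TD), producing one list per table row.
def csv = rows.collect { row -> row.children().collect { it.text() } }
println csv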
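
The script above prints `csv` as a nested Groovy list, not as CSV text. A minimal sketch of turning such a list of rows into CSV lines could look like the following. The `toCsvLine` closure and the sample `rows` data are hypothetical and not part of the original gist, and the quoting rule (wrap a cell in double quotes when it contains a comma, a quote, or a newline) is an assumption about the desired output format.

```groovy
// Hypothetical helper, not part of the original gist:
// join the cells of one row into a single CSV line.
def toCsvLine = { List<String> cells ->
    cells.collect { cell ->
        // Assumed quoting rule: quote cells containing a comma, a double quote, or a newline
        def needsQuoting = cell.contains(',') || cell.contains('"') || cell.contains('\n')
        // Inside a quoted cell, double quotes are escaped by doubling them
        needsQuoting ? '"' + cell.replace('"', '""') + '"' : cell
    }.join(',')
}

// Sample rows in the same shape as the list of lists built from the table (illustrative data)
def rows = [
    ['Size', 'S', 'M'],
    ['Length, cm', '66', '69']
]

def lines = rows.collect(toCsvLine)
println lines.join('\n')
```

In the original script, the same closure could be applied to `csv` in place of the sample `rows`.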