@harukaeru
Created November 4, 2018 11:52
Extract data as TSV from glossary in Weblio
// Maps each column index of a glossary row to a function that extracts its value.
var tdFunctions = {
  0: td => td.getAttribute('title'),
  1: td => td.innerHTML,
  2: td => td.querySelector('.tngMainTIML').innerHTML,
  3: td => td.querySelector('.tngMainTSRHB').textContent,
}

// Keeps only the rows marked with the class 'tngMainTrOn'.
var trFilter = tr => {
  return tr.classList.contains('tngMainTrOn');
}

// Builds one tab-separated line per matching row and logs the result.
var extract = (tBodyElement, tdFunctions, trFilter) => {
  const columnIndices = Object.keys(tdFunctions);
  let ret = "";
  for (let i = 0; i < tBodyElement.children.length; i++) {
    const tr = tBodyElement.children[i];
    if (trFilter(tr)) {
      // Apply each column's extractor to the cell at the same index.
      const cells = columnIndices.map(col => tdFunctions[col](tr.children[col]));
      ret += cells.join("\t") + "\n";
    }
  }
  console.log(ret);
}

extract(document.querySelector('#wordlist-main-content-table-body'), tdFunctions, trFilter);
@harukaeru

harukaeru commented Nov 4, 2018

The extracted data is used as the input to the following script, which generates an Anki-format TSV file.
https://gist.github.com/harukaeru/4d799d14e6c1a9c6c21683dcc8f99bb2
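Because the extracted text is consumed as TSV, a cell value that itself contains a tab or a line break (which `innerHTML` can) would shift the columns for that row. The following is a minimal sketch of one way to guard against that, not part of the original gist: `sanitizeCell` and `rowsToTsv` are hypothetical helper names, and the sample rows are made up for illustration.

```javascript
// Hypothetical helper (not in the original gist): make one cell safe for TSV
// by collapsing tabs and line breaks into a single space and trimming the ends.
const sanitizeCell = value =>
  String(value ?? "").replace(/[\t\r\n]+/g, " ").trim();

// Hypothetical helper: turn an array of rows (each an array of cells) into TSV.
// Rows are joined with "\n", so the result has no trailing blank line.
const rowsToTsv = rows =>
  rows.map(cells => cells.map(sanitizeCell).join("\t")).join("\n");

// Made-up sample data standing in for values read from the table cells.
const sample = [
  ["apple", "a\tb", "  fruit\n"],
  ["run", null, "to move fast"],
];

console.log(rowsToTsv(sample));
```

Inside `extract`, the same idea could be applied by passing each extracted value through `sanitizeCell` before joining the cells.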

@harukaeru

harukaeru commented Nov 4, 2018

Here is a Demo:

After executing extract.js

image

After executing toAnki.py with generated text from extract.js

image

Then all you need to do is import the generated file.

@harukaeru

harukaeru commented Nov 5, 2018

Additional

Here is the sample file I used in the demo above. You can download it and import it by following the brief instructions below.
https://www.mediafire.com/file/h1v71mo919vm3pt/weblio2.tsv.converted.tsv/file

Instructions

  • Click the Import File button in Anki. (This function may not be available in the mobile apps, so I recommend using the Anki app on a PC or Mac.)

  • The following window will appear. It may be useful to choose a Deck; I suggest creating a new one. (I created Sample Deck in this example.)
    image

  • After importing, enjoy it!
    image
    image

@kokuren333

No matter how many times I run it, I get the following error:
Traceback (most recent call last):
File "toAnki.py", line 16, in
pronunciation = array_line[1]
IndexError: list index out of range
Is there any way to fix this?
I would appreciate it if you could let me know.

@kokuren333

Sorry, I solved it.
It seems the TSV file is converted properly even when this message appears.
This is very handy, so I will keep using it!
Thank you!
