@rergw
Last active October 1, 2018 04:16

Import/Export for Scraper Chrome Extension

These files import and export the fields entered in the Scraper extension for Chrome:

https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en

Although the Presets option can save all fields, it is not apparent how to share presets between different installations, so this is a very rough attempt at doing that.

After exploring the extension's local storage I saw that presets are stored there, so they can be copied between installations and merged like any other JSON data:

https://drive.google.com/file/d/1S-kDsoFP4preBD_n265GesKapOctMAII/view?usp=sharing

Or faster:

https://i.imgur.com/kHsupfc.png
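
The merge of copied presets can be sketched as below. This is a minimal sketch, not the extension's own code: it assumes the stored value is a JSON string holding an array of objects that each have a `name` property, which is a guess about the storage format and should be checked against the actual value in local storage. `mergePresets` is a hypothetical helper.

```javascript
// Hypothetical helper: merge two preset lists given as JSON strings.
// ASSUMPTION: each list is a JSON array of objects with a `name` property.
function mergePresets(localJson, importedJson) {
  var merged = JSON.parse(localJson)
  var seen = new Set(merged.map(function (p) { return p.name }))
  JSON.parse(importedJson).forEach(function (p) {
    // Keep the local preset when both sides use the same name
    if (!seen.has(p.name)) {
      merged.push(p)
      seen.add(p.name)
    }
  })
  return JSON.stringify(merged)
}
```

The returned string would then be written back over the stored value on the target installation. Presets that exist only in the imported list are appended, and local presets win on name clashes.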

/*
Export: paste this code in the Scraper console to copy the attribute names and XPaths to the clipboard:
https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en
Usage:
1. Paste this code in the Scraper console
2. Hit Enter
3. Paste the clipboard content into the fields variable of the importer.
*/
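// Collect a [name, xpath] pair from every attribute row
var a = []
$('#options-attributes tr').each(function (i, e) {
  var inputs = $(e).find('input')
  // The pair is [inputs[1], inputs[0]]; skip rows without both inputs
  if (inputs.length >= 2) a.push([inputs[1].value, inputs[0].value])
})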
// I could not use replacer to format it
// I could :)
/*
eval(JSON.stringify(a,function(k,v){
if(Array.isArray(v) && typeof v[0] == 'string')
return "\n"+JSON.stringify(v)+"\n"
return v
}))
*/
a=JSON.stringify(a)
.replace(/\[\[/,'[\n [')
.replace(/\],\[/g,'],\n [')
.replace(/\]\]/,']\n]')
copy(a)
/*
Import usage:
1. Paste the contents from the export step into https://shancarter.github.io/mr-data-converter/
2. Replace the fields variable below with that output
3. Optionally, replace the selector variable manually
4. Paste this code in the Scraper console
5. Hit Enter
6. All fields should now be populated; save them as a preset
Note: if the data converter goes offline, use:
https://web.archive.org/web/20180628080610/https://shancarter.github.io/mr-data-converter/
*/
// Data
selector = '//*[@id="content"]/div[2]/div[2]/div[2]/div/ul/li/article'
// Convert excel to JS with https://shancarter.github.io/mr-data-converter/
// Or use papa parse https://www.papaparse.com/
fields=[
["tags","./section[1]/div/div/span[1]/text()[2]"],
["title","./section[2]/div/h3/a"],
["URL","./section[2]/div/h3/a/@href"],
["author","./section[2]/div/div/a"],
["author URL","./section[2]/div/div/a/@href"],
["type","./section[2]/div/span"],
["feat1","./section[4]/ul/li[1]"],
["feat2","./section[4]/ul/li[2]"],
["feat3","./section[4]/ul/li[3]"],
["price","./section[3]/span[1]/div"],
["rating","./section[3]/span[2]/div/div/@aria-label"],
["reviews","./section[3]/span[2]/div/span/text()[2]"],
["sales","./section[3]/span[3]/span/text()[1]"],
["updated","./section[3]/span[4]/text()[2]"]
]
// Data END
// Helpers
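var addField = function (field) {
  // Local variables: a bare `name = ...` would overwrite window.name
  var name = field[0]
  var xpath = field[1]
  // Click the second link in the last row, which appends a new row
  $('#options-attributes > tbody > tr:last > td:nth-child(4) > a:nth-child(2)').click()
  // Fill in the inputs of the newly added last row
  var row = $('#options-attributes > tbody > tr:last')
  row.find('input[name="attributes[][xpath]"]').val(xpath)
  row.find('input[name="attributes[][name]"]').val(name)
}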
// Helpers END
// Clean state
// Better reload with CTRL + R
// dels = $('#options-attributes > tbody > tr > td:nth-child(4) > a:nth-child(1)')
// dels.click()
// Clean state END
// Import
$('input[name="selector"]').val(selector)
for (var i = 0; i < fields.length; i++) { addField(fields[i]) }
// Import END
// Remove first row (click its delete link)
var del = $('#options-attributes > tbody > tr:nth-child(1) > td:nth-child(4) > a:nth-child(1)')
del.click()
// Remove first row END
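
As an alternative to the online converter named in the import usage notes, rows copied from a spreadsheet could be turned into the `fields` array directly in the console. What follows is a minimal sketch, assuming two tab-separated columns (name, then XPath) with one field per line. `tsvToFields` is a hypothetical helper and is not part of the extension.

```javascript
// Hypothetical helper: convert tab-separated "name<TAB>xpath" lines
// (as copied from a spreadsheet) into [[name, xpath], ...] pairs.
function tsvToFields(text) {
  return text
    .split(/\r?\n/)
    .map(function (line) { return line.split('\t') })
    // Skip blank lines and rows missing either column
    .filter(function (cols) {
      return cols.length >= 2 && cols[0].trim() && cols[1].trim()
    })
    .map(function (cols) { return [cols[0].trim(), cols[1].trim()] })
}
```

Usage would be `fields = tsvToFields(pastedText)` before running the import loop. Cells that themselves contain tabs or line breaks are not handled by this sketch.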