@flovv
Last active September 25, 2020 05:10
scrapeGoogleImages_file1
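var url ='https://www.google.de/search?q=Yahoo+logo&source=lnms&tbm=isch&sa=X';
// Keep the url assignment on line 1: the R script expects this file to be
// saved as imageScrape.js and overwrites its first line for each search.
var page = require('webpage').create();
var fs = require('fs');
var vWidth = 1080;
var vHeight = 1920;
page.viewportSize = {
    width: vWidth,
    height: vHeight
};

// Scroll through the page step by step.
var s = 0;      // current scroll offset in pixels
var sBase = 0;  // last known page height, refreshed on every sc() call

function sc() {
    // Re-read the page height; it can change as more results load.
    sBase = page.evaluate(function () { return document.body.scrollHeight; });
    if (s > sBase) {
        // Scrolled past the end: restore the original viewport and stop.
        page.viewportSize = {width: vWidth, height: vHeight};
        return;
    }
    page.scrollPosition = {
        top: s,
        left: 0
    };
    // Grow the viewport along with the scroll offset.
    page.viewportSize = {width: vWidth, height: s};
    s += Math.min(sBase / 20, 400);
    setTimeout(sc, 110);
}

// Wait 2.5 seconds, then save the rendered HTML to 1.html and quit.
function just_wait() {
    setTimeout(function () {
        fs.write('1.html', page.content, 'w');
        phantom.exit();
    }, 2500);
}

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
        return;
    }
    sc();
    just_wait();
});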
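The scroll loop above advances by `min(height / 20, 400)` pixels every 110 ms, while `just_wait` saves the page 2500 ms after it opens, whether or not scrolling has finished. As a rough sketch of that timing, the following models the loop under the simplifying assumption that the page height stays constant (the real script re-reads it on every tick, so actual numbers will differ). `scrollOffsets` and `scrollDurationMs` are hypothetical helper names, not part of the gist.

```javascript
// Simplified model of sc(): lists the scroll offsets it would visit
// if the page height never changed.
function scrollOffsets(pageHeight, maxStep, divisions) {
    maxStep = maxStep || 400;     // cap on the step size, as in the script
    divisions = divisions || 20;  // the script uses sBase / 20
    var step = Math.min(pageHeight / divisions, maxStep);
    if (step <= 0) {
        return [0];               // guard: a zero step would never terminate
    }
    var offsets = [];
    for (var s = 0; s <= pageHeight; s += step) {
        offsets.push(s);
    }
    return offsets;
}

// Estimated time until sc() stops, given one tick per offset.
function scrollDurationMs(pageHeight, tickMs) {
    tickMs = tickMs || 110;       // the script's setTimeout interval
    return scrollOffsets(pageHeight).length * tickMs;
}

console.log(scrollDurationMs(8000));   // 2310
console.log(scrollDurationMs(20000));  // 5610
```

Under this model a page up to 8000 px tall finishes scrolling just inside the 2500 ms window, while a taller page would still be scrolling when `1.html` is written. If that turns out to matter in practice, raising the delay in `just_wait` is the smallest change to try.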
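scrapeGoogleImages.r
# plyr and reshape2 are loaded by the original script but not used below
library(plyr)
library(reshape2)
library(rvest)

# Render the Google Images result page for searchTerm with PhantomJS
# and return the <img> sources found in the saved HTML.
scrapeJSSite <- function(searchTerm){
  url <- paste0("https://www.google.de/search?q=", searchTerm, "&source=lnms&tbm=isch&sa=X")
  # Point the PhantomJS script at the new URL by rewriting its first line
  lines <- readLines("imageScrape.js")
  lines[1] <- paste0("var url ='", url, "';")
  writeLines(lines, "imageScrape.js")
  ## Download website (PhantomJS writes 1.html to the working directory)
  system("phantomjs imageScrape.js")
  pg <- read_html("1.html")
  files <- pg %>% html_nodes("img") %>% html_attr("src")
  # Keep only http(s) sources; missing or inline data: sources cannot be downloaded
  files <- files[!is.na(files) & grepl("^https?://", files)]
  df <- data.frame(images = files, search = rep(searchTerm, length(files)), stringsAsFactors = FALSE)
  return(df)
}

# Download every URL in files to outPath/<brand>_<index>.jpg
downloadImages <- function(files, brand, outPath="images"){
  # Create the output directory if it does not exist yet
  dir.create(outPath, showWarnings = FALSE, recursive = TRUE)
  for(i in seq_along(files)){
    download.file(files[i], destfile = paste0(outPath, "/", brand, "_", i, ".jpg"), mode = 'wb')
  }
}

### exchange the search terms here!
gg <- scrapeJSSite(searchTerm = "Adidas+logo")
# The second argument is the label used in the file names;
# the original passed an undefined `i` here.
downloadImages(as.character(gg$images), "Adidas")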
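`scrapeJSSite` rewrites the first line of imageScrape.js for every search term. One possible alternative, sketched here as an assumption and not something the gist does, is to pass the term on the command line and build the URL inside the PhantomJS script. `buildSearchUrl` is a hypothetical helper; `require('system').args` is how PhantomJS exposes command-line arguments.

```javascript
// Hypothetical helper (not part of the gist): builds the image-search URL
// for a free-text term, percent-encoding spaces and special characters.
function buildSearchUrl(term) {
    return 'https://www.google.de/search?q=' + encodeURIComponent(term) +
           '&source=lnms&tbm=isch&sa=X';
}

// Inside PhantomJS the term could be read from the command line, e.g.
//   phantomjs imageScrape.js "Adidas logo"
// with the first line of the script replaced by:
//   var system = require('system');
//   var url = buildSearchUrl(system.args[1]);
```

On the R side the call would then look something like `system(paste("phantomjs imageScrape.js", shQuote(searchTerm)))`, and the script file would no longer need to be rewritten between searches.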
@geotheory

34: downloadImages(as.character(gg$images), 'yahoo')

@andreaangeli

I ran your code, but it returns this error:
Error in paste0(outPath, "/", brand, "_", i, ".jpg") :
object 'i' not found

@LucaWRGF

LucaWRGF commented Jul 19, 2017

@andreaangeli, it worked for me like this; I hope it helps:
Lines 25 to 34 in scrapeGoogleImages.r:
`
# "outPath" has to be adapted!
downloadImages <- function(files, brand, outPath="D://scrape_images//brand"){
  for(i in 1:length(files)){
    download.file(files[i], destfile = paste0(outPath, "/", brand, "_", i, ".jpg"), mode = 'wb')
  }
}

### exchange the search terms here!
gg <- scrapeJSSite(searchTerm = "Hermes+logo")
downloadImages(as.character(gg$images), 'Hermes')
`

@markusdumke

How can I download more than 20 images?

@ArindamRouth

How do I download more than 20 images? Please help.

@flovv
Author

flovv commented Dec 29, 2019 via email
