@hrbrmstr
Created January 7, 2018 14:43
non-selenium alternative to http://www.masalmon.eu/2018/01/07/rainbowing/
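library(V8)
library(xml2)
library(httr)
library(rvest)
library(stringi)
library(tidyverse)

# Fetch one page of Pexels "nature" search results and return a tibble of photo links.
get_page <- function(num=1, seed=Sys.Date()) {

  # Request the results page in its JavaScript (format=js) form
  GET(
    url = "https://www.pexels.com/search/nature/",
    query = list(
      page=num,
      format="js",
      seed=seed
    )
  ) -> res

  stop_for_status(res)

  # Read the body as text so the regexes below operate on a character string
  x <- content(res, as="text", encoding="UTF-8")

  # Trim the payload down to the quoted JavaScript string literal holding the HTML
  x <- stri_replace_first_regex(x, "^.*beforeend','\\\\n\\\\n", "'")
  x <- stri_replace_last_regex(x, "\\\\n\\\\n'\\);rowG.*$", "'")

  # Evaluate the literal in V8 to unescape it, then parse the resulting HTML
  ctx <- v8()
  pg <- read_html(ctx$eval(x))

  # tibble() replaces the deprecated data_frame()
  tibble(
    preview_href = html_attr(html_nodes(pg, "img"), "src"),
    full_href = sprintf("https://www.pexels.com%s", html_attr(html_nodes(pg, "a"), "href")),
    title = html_attr(html_nodes(pg, "a"), "title")
  )

}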
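
The trim-then-eval step in get_page() can be tried offline. This is a minimal sketch, not the real Pexels response: the `js` payload, the `el` and `rowGrid` names, and the example.com URL are all made up, shaped only by what the two regexes expect (a call ending in `'beforeend','\n\n`, the HTML, then `\n\n');rowG...`).

```r
library(V8)
library(rvest)
library(stringi)

# Hypothetical payload (an assumption, not captured from pexels.com).
# "\\n" in R source is a literal backslash followed by "n", as in a JS-escaped string.
js <- paste0(
  "el.insertAdjacentHTML('beforeend','\\n\\n",
  "<a href=\"/photo/1/\" title=\"Demo\"><img src=\"https://example.com/1.jpg\"></a>",
  "\\n\\n');rowGrid(el);"
)

# Same two substitutions as in get_page(): keep only the quoted string literal
x <- stri_replace_first_regex(js, "^.*beforeend','\\\\n\\\\n", "'")
x <- stri_replace_last_regex(x, "\\\\n\\\\n'\\);rowG.*$", "'")

# Evaluating the literal in V8 returns the unescaped HTML as an R string
ctx <- v8()
html <- ctx$eval(x)

# Parse the fragment and pull out the same attributes get_page() collects
pg <- read_html(html)
links <- html_attr(html_nodes(pg, "a"), "href")
titles <- html_attr(html_nodes(pg, "a"), "title")
previews <- html_attr(html_nodes(pg, "img"), "src")
```

With network access, and assuming the site still answers format=js requests in this shape, something like `map_df(1:3, get_page)` should stack several pages into one tibble.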