Last active
August 29, 2015 14:06
> test()
Loading required package: testthat
Testing aRxiv
Loading aRxiv
arxiv_errors : ...
arxiv_search in batches : ..
cleaning the records : ...................
search range of dates : ....
basic searches : .........
sort_by and sort_order args work : ...1
is_too_many : ...
1. Failure(@test-sort.R#42): sort by lastUpdatedDate --------------------------------------------------------------
zr$updated not equal to expected
Lengths (2, 0) differ (string compare on first 0)
> test()
Testing aRxiv
Loading aRxiv
arxiv_errors : ...
arxiv_search in batches : ..
cleaning the records : ...................
search range of dates : ....
basic searches : .........
sort_by and sort_order args work : 1234
is_too_many : ...
1. Failure(@test-sort.R#15): sort by publishedDate ----------------------------------------------------------------
z$submitted not equal to expected
Lengths (2, 1) differ (string compare on first 1)
2. Failure(@test-sort.R#20): sort by publishedDate ----------------------------------------------------------------
zr$submitted not equal to expected
Lengths (2, 1) differ (string compare on first 1)
3. Failure(@test-sort.R#37): sort by lastUpdatedDate --------------------------------------------------------------
z$updated not equal to expected
Lengths (2, 0) differ (string compare on first 0)
4. Failure(@test-sort.R#42): sort by lastUpdatedDate --------------------------------------------------------------
zr$updated not equal to expected
Lengths (2, 0) differ (string compare on first 0)
> test()
Testing aRxiv
Loading aRxiv
arxiv_errors : ...
arxiv_search in batches : ..
cleaning the records : ...................
search range of dates : ....
basic searches : .........
sort_by and sort_order args work : ....
is_too_many : ...
# getting variable responses from arXiv API when requesting sorted results
# the following makes the same request repeatedly (n.tries times in a row)

# install.packages() needs the package name as a quoted string
if(!require(httr)) install.packages("httr")
if(!require(devtools)) install.packages("devtools")
library(devtools)
if(!require(aRxiv)) install_github("ropensci/aRxiv")
library(httr)
library(aRxiv)

# problem query
# http://export.arxiv.org/api/query?search_query=ti:deconvolution+AND+submittedDate:[199001010000+TO+201409062400]&max_results=2

# send the same query n.tries times; keep both the raw responses
# and the parsed data frames
repeated_search <-
    function(query, sort_by=c("submittedDate", "lastUpdatedDate", "relevance"),
             ascending=TRUE, n.tries=50, delay=1, limit=2, start=0, verbose=FALSE)
{
    query_url <- "http://export.arxiv.org/api/query"

    # minimum delay (in seconds) between requests, used by aRxiv
    options(aRxiv_delay=delay)

    sort_by <- match.arg(sort_by)
    sort_order <- if(ascending) "ascending" else "descending"

    raw_result <- tab_result <- vector("list", n.tries)
    for(s in seq_len(n.tries)) {
        if(verbose) message("try ", s)
        aRxiv:::delay_if_necessary()
        raw_result[[s]] <- POST(query_url, body=list(search_query=query,
                                                     max_results=limit, start=start,
                                                     sortBy=sort_by, sortOrder=sort_order))
        # parse the response into a data frame, using aRxiv internals
        tab_result[[s]] <- aRxiv:::listresult2df( aRxiv:::get_entries( aRxiv:::result2list(raw_result[[s]])) )
    }

    list(raw_result=raw_result, tab_result=tab_result)
}

time_query <- "ti:deconvolution AND submittedDate:[199001010000 TO 201409062400]"
other_query <- "ti:deconvolution"

results_timequery <- repeated_search(time_query, n.tries=100, verbose=TRUE)
results_otherquery <- repeated_search(other_query, n.tries=100, verbose=TRUE)

# note: save() writes R's own binary format, whatever the file extension
save(results_timequery, results_otherquery, file="results.tgz")

# same number of rows for each?
sapply(results_timequery$tab_result, nrow)
sapply(results_otherquery$tab_result, nrow)
Related to my work on the aRxiv package, for access to the arXiv API, I'm finding that I get variable results when I use `submittedDate` ranges in the query.

I initially thought the problem had to do with using `sortBy` and `sortOrder`, but it seems like it's the use of `submittedDate` in the query itself that is the issue.

The `time_query` here returns a single entry 15% of the time but most of the time two entries. A separate gist shows the actual XML results for a case in which one entry was returned and another case in which two entries were returned.
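
To put a number on how often each outcome occurs, one could tabulate the row counts across the repeated responses. The following is only a sketch: `summarize_row_counts()` is a hypothetical helper (not part of aRxiv), it assumes its input is a list of data frames like the `tab_result` element returned by `repeated_search()`, and the `toy` list is made-up stand-in data so the example runs without calling the API.

```r
# Hypothetical helper (not part of aRxiv): count how many responses
# returned 0, 1, 2, ... entries, and what proportion of tries that is.
summarize_row_counts <- function(tab_results) {
    # NROW() gives 0 for NULL, so a missing result is counted as 0 entries
    n_rows <- vapply(tab_results, function(df) NROW(df), integer(1))
    counts <- table(n_rows)
    data.frame(n_entries   = as.integer(names(counts)),
               n_responses = as.integer(counts),
               proportion  = as.numeric(counts) / length(n_rows))
}

# Made-up stand-in for results_timequery$tab_result:
# 17 responses with two entries and 3 responses with one entry
toy <- c(rep(list(data.frame(id = c("a", "b"))), 17),
         rep(list(data.frame(id = "a")), 3))

summary_tab <- summarize_row_counts(toy)
print(summary_tab)
```

With the real results, the call would be `summarize_row_counts(results_timequery$tab_result)`, and the same call on `results_otherquery$tab_result` would show whether the query without the `submittedDate` range is stable.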