@arademaker
Created March 1, 2011 02:19
Checking for authors with a repeated authorship order in Lattes CVs
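library(XML)

# Return the (production, authorship order) pairs that occur more than once
# in a single Lattes XML file, or NA if the file has no AUTORES nodes.
check <- function(filename, dir = getwd()) {
  doc <- xmlInternalTreeParse(file.path(dir, filename))
  # One column per AUTORES node: the parent's SEQUENCIA-PRODUCAO and the
  # node's own ORDEM-DE-AUTORIA.
  tmp <- xpathSApply(doc, "//AUTORES", function(x)
    c(xmlGetAttr(xmlParent(x), "SEQUENCIA-PRODUCAO"),
      xmlGetAttr(x, "ORDEM-DE-AUTORIA")))
  # xpathSApply returns an empty list when nothing matches.
  if (is.list(tmp) && length(tmp) == 0)
    return(list(file = filename, resp = NA))
  tmp <- t(tmp)
  # Cross-tabulate production by authorship order; counts above 1 are repeats.
  tt <- table(tmp[, 1], tmp[, 2])
  list(file = filename, resp = which(tt > 1, arr.ind = TRUE))
}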
# "xml" is the directory holding the Lattes CV XML files
# retrieved from CNPq.
filenames <- dir("xml", full.names = TRUE)
# Apply check to each file.
teste <- lapply(filenames, check)
# Count the repeated entries found in each file (0 when there are none).
erros <- vapply(teste, function(x)
  if (is.matrix(x$resp)) nrow(x$resp) else 0L, integer(1))
# Keep only the files with at least one repeat.
teste[erros > 0]
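
The detection step in check can be tried without any XML file. The following is a minimal sketch on made-up data: the `entries` matrix and its values are hypothetical and stand in for the pairs that check extracts from the AUTORES nodes. It also shows that `which(..., arr.ind = TRUE)` returns positions in the table, which have to be mapped back to the original labels through the table's row and column names.

```r
# Hypothetical toy data: one row per author entry. Column 1 plays the role
# of SEQUENCIA-PRODUCAO, column 2 the role of ORDEM-DE-AUTORIA.
entries <- rbind(
  c("1", "1"),
  c("1", "2"),
  c("2", "1"),
  c("2", "1"),  # authorship order 1 used twice in production 2
  c("3", "1")
)

# Cross-tabulate production by authorship order.
counts <- table(entries[, 1], entries[, 2])

# Cells greater than 1 mark an order used more than once in a production.
# The result holds row/column positions in the table, not the labels.
dups <- which(counts > 1, arr.ind = TRUE)

# Map the positions back to the production and order labels.
dup_production <- rownames(counts)[dups[, 1]]
dup_order <- colnames(counts)[dups[, 2]]
print(data.frame(production = dup_production, order = dup_order))
```

With real files the labels come from the XML attributes, so they need not coincide with the positions as they happen to do in this toy case.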