Find pages with 0 pageviews by comparing sitemap.xml URLs with Google Analytics visits.
library(googleAnalyticsR)
library(xml2)
library(dplyr)

## authenticate with Google Analytics
ga_auth()

## date range of URLs to test
dates <- c(Sys.Date() - 30, Sys.Date())

## GA View ID
id <- 11111111

## function to get sitemap URLs
get_sitemap <- function(sitemap, field = "loc"){
  ## parse the sitemap XML into a nested list
  sm <- as_list(read_xml(sitemap))
  ## pull the requested field out of every entry
  out <- try(Reduce(rbind,
                    vapply(sm, function(x) Reduce(rbind, x[[field]]), character(1))
  ))
  if(inherits(out, "try-error")){
    message("Problem with sitemap:", sitemap)
    return(NULL)
  }
  as.vector(out)
}

## make google SEO filter (organic traffic from Google only)
google_seo <- filter_clause_ga4(
  list(
    dim_filter("source", "EXACT", "google"),
    dim_filter("medium", "EXACT", "organic")
  ),
  operator = "AND")

## get the pages
pages <- google_analytics_4(id,
                            date_range = dates,
                            dimensions = "pagePath",
                            metrics = c("pageviews", "totalEvents"),
                            dim_filters = google_seo,
                            max = -1,
                            anti_sample = TRUE)

## get the sitemap index file
url_si <- "http://www.example.com/sitemap.xml"
sitemap_index <- get_sitemap(url_si)

## get all the sitemaps (maybe you only need the call above if you have no sitemap index)
many_sitemaps <- lapply(sitemap_index, get_sitemap)

## all the urls in all the sitemaps
all_urls <- Reduce(c, many_sitemaps)

## Compare and get the URLs that are in XML but not in Google Analytics
## dplyr transformations
sitemap_urls <- tibble(all_urls = all_urls)
sitemap_urls <- sitemap_urls %>% mutate(path = paste0("/", urltools::path(all_urls)))
sitemap_not_in_ga <- anti_join(sitemap_urls, pages, by = c(path = "pagePath"))

## write out to CSV (assumes the ./data directory already exists)
write.csv(sitemap_not_in_ga, file = "./data/sitemap_urls_not_in_ga.csv", row.names = FALSE)
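
The comparison step in the script is an anti-join: it keeps the sitemap rows whose path has no matching pagePath in the Analytics report. A minimal sketch with made-up data may make that easier to follow. The paths and pageview counts below are hypothetical and stand in for the real sitemap and GA results.

```r
library(dplyr)

# Toy data standing in for the sitemap URLs and the GA report (hypothetical values)
sitemap_urls <- tibble(path = c("/", "/about", "/blog/post-1"))
pages <- tibble(pagePath = c("/", "/blog/post-1"), pageviews = c(120, 8))

# Keep only the sitemap rows that have no matching pagePath in the GA data
sitemap_not_in_ga <- anti_join(sitemap_urls, pages, by = c(path = "pagePath"))
sitemap_not_in_ga$path
```

Because the match is on exact strings, differences such as trailing slashes or query strings in pagePath would probably show up as URLs with no visits, so it may be worth normalising both columns first.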

withetu commented Jan 30, 2017

Hello,

I am a beginner in R. When I run your R code for my GA account, I get stuck here:

## get the sitemap index file

url_si <- "http://www.my-domain.com/sitemap.xml"
sitemap_index <- get_sitemap(url_si)

error:
Error in x[[field]] : subscript out of bounds
Problem with sitemap:http://www.my-domain.com/sitemap.xml

Please help me!

Thank you

MarkEdmondson1234 (Owner, Author) commented Jun 21, 2017

Sorry, I get no notifications for this so I missed it. It's saying you have no field in the sitemap. Is it a correctly configured one? I realise you may never see this, for the same reasons I didn't.
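
For anyone hitting the same error: one possible cause is that `x[[field]]` is applied to an entry that has no `loc` child. The sketch below is a hypothetical alternative, not the function from the script above. It assumes that selecting elements by local name with XPath is acceptable, which sidesteps the list indexing and the sitemap namespace. The name `get_sitemap_xpath` and the example XML are made up for illustration.

```r
library(xml2)

# Hypothetical alternative to get_sitemap(): select elements by local name,
# so the sitemap namespace and the nesting depth do not matter.
get_sitemap_xpath <- function(sitemap, field = "loc") {
  # read_xml() accepts a URL, a file path, or a literal XML string
  doc <- tryCatch(read_xml(sitemap), error = function(e) NULL)
  if (is.null(doc)) {
    message("Problem with sitemap: ", sitemap)
    return(NULL)
  }
  nodes <- xml_find_all(doc, sprintf("//*[local-name()='%s']", field))
  # return the text of every matching element, without surrounding whitespace
  trimws(xml_text(nodes))
}

# Made-up sitemap passed as a literal string, so no network access is needed
example_xml <- '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a</loc></url>
  <url><loc>http://www.example.com/b</loc></url>
</urlset>'

example_urls <- get_sitemap_xpath(example_xml)
```

If the sitemap contains no matching elements, this returns an empty character vector instead of raising an error, which should make a misconfigured sitemap easier to spot.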

stringbenderb5 commented Jul 5, 2017

Where would I place this code in a WordPress site? And will it work in WordPress?

MarkEdmondson1234 (Owner, Author) commented Jun 13, 2018

Just saw this, sorry. It won't work in WordPress, which is PHP. This is a script to run in R, locally on your laptop, perhaps.
