Calculate Internal PageRank from Screaming Frog Crawl
library("igraph")
# Swap out the path to your Screaming Frog All Outlinks CSV. On Windows, remember to change backslashes to forward slashes.
links <- read.csv("C:/Documents/screaming-frog-all-outlinks.csv", skip = 1) # CSV path.
# The next line is optional. It filters out JavaScript, CSS, and image links; strictly speaking, those also pass PageRank, so you may prefer to keep them.
links <- subset(links, Type=="AHREF") # Optional line. Filter.
links <- subset(links, Follow=="true") # Keep only followed links.
links <- subset(links, select=c(Source,Destination)) # Keep only the edge-list columns.
g <- graph.data.frame(links) # Build a directed graph from the Source/Destination pairs.
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
# Reshape the named PageRank vector into a two-column data frame: url, pr.
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
# Swap out 'domain' and 'com' to match your website address.
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
# Replace with your desired filename for the output file.
write.csv(values, file = "output-pagerank.csv") # Output file.
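
Optionally, a quick sanity check (a minimal sketch using only base R) to preview the ten highest-PageRank URLs before opening the CSV:

head(values[order(-values$pr), ], 10) # Top 10 pages by internal PageRank.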
@SEOForce commented Apr 26, 2016

Thanks for uploading this, although I could not get it to work. I made all the suggested changes, but nothing happens on write.

@LeenaMasih commented Apr 27, 2016

Thanks for the brief explanation.

@alstucke commented May 31, 2016

Same issue as SEOForce; I consistently receive an "object __ not found" error. Is there any further documentation for this?

@andreamoro commented Jul 17, 2016

I'd suggest adding the following filters too.

Make sure to keep only 200 responses, as you don't want to analyse redirects:

links.href <- subset(links, Status.Code=="200")

SF marks redirects as 200 in the outbound/inbound export, so also drop rows whose anchor is "Redirect":

links.href <- subset(links, Anchor!="Redirect")
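
A minimal sketch applying both filters in sequence; note the result is assigned back to links, so the second filter runs on the already-filtered rows (the Status.Code and Anchor column names are assumed to match your export):

links <- subset(links, Status.Code == "200") # Keep only 200 responses.
links <- subset(links, Anchor != "Redirect") # Drop redirects that SF labels as 200.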

@pojda commented Oct 27, 2017

Another update: Screaming Frog started marking hrefs as AHREF instead of HREF, so this part should also be updated:

from
links <- subset(links, Type=="HREF") # Optional line. Filter.
to
links <- subset(links, Type=="AHREF") # Optional line. Filter.
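
Since the label differs between Screaming Frog versions, a quick one-line check (base R only) of which Type values your own export actually uses before filtering:

table(links$Type) # Counts per link type, e.g. HREF, AHREF, or Hyperlink depending on the SF version.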

@Samses5th commented Nov 15, 2017

So good

@david-forer commented Mar 29, 2018

I was curious whether anyone is using this now. Are there any more changes that might need to be made? I am getting only errors.

Thanks

David

@gholm commented May 16, 2019

It used to work before but now I keep getting a blank .csv file. The only thing that's changed is that I don't have admin rights on my PC anymore -- could this be the root of the problem?

@billrowland commented Jan 17, 2020

I just installed R and walked through this process.

After some experimentation, I was able to get the script to run, but it only generated a blank CSV.

Does anyone have any other ideas?

Thanks.

@oscaramartin commented Mar 28, 2020

Hi @pshapiro,

could you please update your useful code?
With the new Screaming Frog version 12.6, there are some errors when you try to read all_outlinks.csv:

library("igraph")
links <- read.csv("all_outlinks.csv", skip = 1) # CSV Path
links <- subset(links, Type=="AHREF") # Optional line. Filter.
Error in eval(e, x, parent.frame()) : object 'Type' not found
links <- subset(links, Follow=="true")
Error in eval(e, x, parent.frame()) : object 'Follow' not found
links <- subset(links, select=c(Source,Destination))
Error in eval(substitute(select), nl, parent.frame()) : 
  object 'Source' not found
g <- graph.data.frame(links)
Error in graph.data.frame(links) : 
  the data frame should contain at least two columns
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
write.csv(values, file = "output-pagerank.csv") # Output file.

Thanks a lot

@oscaramartin commented Mar 30, 2020

Resolved: for the all_outlinks.csv file, use skip = 0.
The first line ("All Outlinks") has been removed from the export, so there is nothing to skip.

;-)
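
For anyone handling both the old and new export formats, a small sketch that peeks at the first line before choosing the skip value (the "All Outlinks" title text is assumed from the older exports):

first_line <- readLines("all_outlinks.csv", n = 1)
skip_rows <- if (grepl("All Outlinks", first_line)) 1 else 0 # Skip the title row only when present.
links <- read.csv("all_outlinks.csv", skip = skip_rows)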

@Mads-Lemvigh commented Sep 2, 2020

Any updates to SF v. 13?

@GeorgeColt commented Nov 15, 2020

Any updates for this script?

@webcontigo commented Nov 20, 2020

Hi guys, I'm getting these errors:

> links <- subset(links, Type=="HREF") # Optional line. Filter.
Error in eval(e, x, parent.frame()) : object 'Type' not found
> links <- subset(links, Follow=="true")
Error in eval(e, x, parent.frame()) : object 'Follow' not found
> links <- subset(links, select=c(Source,Destination))
Error in eval(substitute(select), nl, parent.frame()) : 
  object 'Source' not found

Any help?
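
One quick check (a minimal sketch): an "object 'Type' not found" error usually means the header row was not parsed as column names, often because of the skip value, so inspect the columns read.csv actually produced:

names(links) # If the expected headers (Type, Follow, Source, Destination) are missing here, adjust skip when reading.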

@CosmusFFW commented Jun 10, 2021

There were some slight changes in the CSV file that Screaming Frog outputs. This should work, and you can change Type=="Hyperlink" to look at different types of links.

# Swap out path to your Screaming Frog All Outlink CSV. For Windows, remember to change backslashes to forward slashes.
links <- read.csv("/YOUR/FILEPATH/all_outlinks.csv") # CSV Path
# This line is optional. It filters out JavaScript, CSS, and image links; strictly speaking, those also pass PageRank, so you may prefer to keep them.
links <- subset(links, Type=="Hyperlink") # Optional line. Filter.
links <- subset(links, Follow=="true")
links <- subset(links, select=c(Source,Destination))
g <- graph.data.frame(links)
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
# Swap out 'domain' and 'com' to represent your website address.
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
# Replace with your desired filename for the output file.
write.csv(values, file = "output-pagerank.csv") # Output file.
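
One small optional tweak, if you don't want the extra row-number column that write.csv adds by default:

write.csv(values, file = "output-pagerank.csv", row.names = FALSE) # Output without row numbers.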