Calculate Internal PageRank from Screaming Frog Crawl
library("igraph")
# Swap out path to your Screaming Frog All Outlink CSV. For Windows, remember to change backslashes to forward slashes.
links <- read.csv("C:/Documents/screaming-frog-all-outlinks.csv", skip = 1) # CSV Path
# This optional line keeps only anchor (<a href>) links, filtering out JavaScript, CSS, and image references. Strictly speaking, you could leave them in.
links <- subset(links, Type=="AHREF") # Optional filter. The Type value varies by Screaming Frog version ("HREF" in older versions, "AHREF" later, "Hyperlink" in current exports).
links <- subset(links, Follow=="true")
links <- subset(links, select=c(Source,Destination))
g <- graph.data.frame(links)
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
# Swap out 'domain' and 'com' to represent your website address.
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
# Replace with your desired filename for the output file.
write.csv(values, file = "output-pagerank.csv") # Output file.
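For anyone who wants to sanity-check the igraph numbers without R, here is a minimal power-iteration PageRank sketch in Python over the same (Source, Destination) edge list. The function name and edge format are illustrative, not part of the gist; it uses the same damping factor of 0.85.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a list of (source, destination)
    edges, with the same damping factor as the R script."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by dangling pages (no outlinks) is spread evenly.
        dangling = sum(pr[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * pr[v] / len(out[v])
                for dst in out[v]:
                    nxt[dst] += share
        if max(abs(nxt[v] - pr[v]) for v in nodes) < tol:
            return nxt
        pr = nxt
    return pr
```

On a symmetric three-page cycle every page ends up with rank 1/3, and the scores always sum to 1, which is a quick way to confirm the output is sensible.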
@SEOForce

SEOForce commented Apr 26, 2016

Thanks for uploading this, although I could not get it to work. I made all the suggested changes, but nothing happens on write.

@LeenaMasih

LeenaMasih commented Apr 27, 2016

Thanks for the brief explanation.

@alstucke

alstucke commented May 31, 2016

Same issue as SEOForce; I consistently receive an "object __ not found" error. Is there any further documentation for this?

@andreamoro

andreamoro commented Jul 17, 2016

I'd suggest adding the following filters too.

Make sure to keep only 200 responses, as you don't want to analyse redirects:

links.href<-subset(links,Status.Code=="200")

SF marks redirects as 200 in the outbound/inbound export, so also filter on the anchor:

links.href<-subset(links,Anchor!="Redirect")
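The same two filters can be expressed outside R; here is a Python sketch over rows already read with `csv.DictReader`. The column names "Status Code" and "Anchor" are assumed from the export, and the function name is illustrative.

```python
def filter_links(rows):
    """Keep only 200 responses and drop rows whose anchor Screaming
    Frog labels as "Redirect". Column names are assumed from the
    all-outlinks export read via csv.DictReader."""
    return [
        r for r in rows
        if r.get("Status Code") == "200" and r.get("Anchor") != "Redirect"
    ]
```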


@pojda

pojda commented Oct 27, 2017

Another update: Screaming Frog started marking hrefs as AHREF instead of HREF, so this part should also be updated:

from
links <- subset(links, Type=="HREF") # Optional line. Filter.
to
links <- subset(links, Type=="AHREF") # Optional line. Filter.

@Samses5th

Samses5th commented Nov 15, 2017

So good

@david-forer

david-forer commented Mar 29, 2018

Is anyone still using this? Are there any more changes that need to be made? I am getting only errors.

Thanks

David

@gholm

gholm commented May 16, 2019

It used to work before, but now I keep getting a blank .csv file. The only thing that's changed is that I don't have admin rights on my PC anymore -- could this be the root of the problem?

@billrowland

billrowland commented Jan 17, 2020

I just installed R and walked through this process.

After some experimentation, I was able to get the script to run, but it only generated a blank CSV.

Does anyone have any other ideas?

Thanks.

@oscaramartin

oscaramartin commented Mar 28, 2020

Hi @pshapiro,

could you please update your useful code?
With the new Screaming Frog version 12.6, there are some errors when you try to read all_outlinks.csv:

library("igraph")
links <- read.csv("all_outlinks.csv", skip = 1) # CSV Path
links <- subset(links, Type=="AHREF") # Optional line. Filter.
Error in eval(e, x, parent.frame()) : object 'Type' not found
links <- subset(links, Follow=="true")
Error in eval(e, x, parent.frame()) : object 'Follow' not found
links <- subset(links, select=c(Source,Destination))
Error in eval(substitute(select), nl, parent.frame()) : 
  object 'Source' not found
g <- graph.data.frame(links)
Error in graph.data.frame(links) : 
  the data frame should contain at least two columns
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
write.csv(values, file = "output-pagerank.csv") # Output file.

Thanks a lot

@oscaramartin

oscaramartin commented Mar 30, 2020

Resolved: for the all_outlinks.csv file, use skip = 0.
The "All Outlinks" first line has been removed from the export, so there is nothing to skip.

;-)
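Since the number of rows to skip now depends on the Screaming Frog version, peeking at the first line avoids hard-coding it. A Python sketch of the idea; the function name is illustrative, and it assumes older exports begin with a literal "All Outlinks" title row before the header.

```python
def rows_to_skip(path):
    """Return 1 if the file starts with the old "All Outlinks" title
    row (which must be skipped), or 0 if it starts directly with the
    CSV header row."""
    with open(path, encoding="utf-8") as f:
        first = f.readline()
    # The title row may be quoted in the export, so strip quotes too.
    return 1 if first.strip().strip('"') == "All Outlinks" else 0
```

The returned value can then be passed straight to the `skip` argument of `read.csv` (or `skiprows` in pandas).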

@Mads-Lemvigh

Mads-Lemvigh commented Sep 2, 2020

Any updates to SF v. 13?

@GeorgeColt

GeorgeColt commented Nov 15, 2020

any updates for this script?

@webcontigo

webcontigo commented Nov 20, 2020

Hi guys, I'm getting these errors:

> links <- subset(links, Type=="HREF") # Optional line. Filter.
Error in eval(e, x, parent.frame()) : object 'Type' not found
> links <- subset(links, Follow=="true")
Error in eval(e, x, parent.frame()) : object 'Follow' not found
> links <- subset(links, select=c(Source,Destination))
Error in eval(substitute(select), nl, parent.frame()) : 
  object 'Source' not found

Any help?

@CosmusFFW

CosmusFFW commented Jun 10, 2021

There were some slight changes in the CSV file that Screaming Frog outputs. This version should work, and you can change Type=="Hyperlink" to look at different types of links.

# Swap out path to your Screaming Frog All Outlink CSV. For Windows, remember to change backslashes to forward slashes.
links <- read.csv("/YOUR/FILEPATH/all_outlinks.csv") # CSV Path
# This optional line keeps only hyperlinks, filtering out JavaScript, CSS, and image references. Strictly speaking, you could leave them in.
links <- subset(links, Type=="Hyperlink") # Optional line. Filter.
links <- subset(links, Follow=="true")
links <- subset(links, select=c(Source,Destination))
g <- graph.data.frame(links)
pr <- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)
values <- data.frame(pr$vector)
values$names <- rownames(values)
row.names(values) <- NULL
values <- values[c(2,1)]
names(values)[1] <- "url"
names(values)[2] <- "pr"
# Swap out 'domain' and 'com' to represent your website address.
values <- values[grepl("https?:\\/\\/(.*\\.)?domain\\.com.*", values$url),] # Domain filter.
# Replace with your desired filename for the output file.
write.csv(values, file = "output-pagerank.csv") # Output file.
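The domain filter is the one line everyone has to edit, so here is the same pattern as a quick Python check. The pattern and URLs are illustrative; swap domain.com for your own site before using it.

```python
import re

# Same unanchored pattern as the grepl() call in the R script.
DOMAIN_RE = re.compile(r"https?://(.*\.)?domain\.com")

def is_internal(url):
    """True when the URL is on domain.com or one of its subdomains."""
    return DOMAIN_RE.search(url) is not None
```

One caveat: because the pattern is not anchored at the end, a URL such as https://sub.domain.com.evil.net would also match; add a trailing `(/|$)` if that matters for your crawl.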
