@ramdaffe
Created February 19, 2016 21:14
R script for the Trust Positif list
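# Read the two domain lists. Assumes each file is plain text with one domain
# per line and no header row, so header = FALSE keeps the first entry as data.
domain <- read.csv("domainsporn", header = FALSE, stringsAsFactors = FALSE)
domain2 <- read.csv("domains", header = FALSE, stringsAsFactors = FALSE)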
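library(tldextract)  # tldextract(), getTLD(), and the tldnames data set
library(descr)       # freq()

# Peek at the TLD list bundled with the package, then get the list used below.
data("tldnames")
head(tldnames)
tld <- getTLD()

colnames(domain) <- "url"
colnames(domain2) <- "url"

# Split every host into subdomain, domain, and tld.
d <- tldextract(domain$url, tldnames = tld)
d2 <- tldextract(domain2$url, tldnames = tld)
head(d)

# Frequency table of the extracted TLDs.
d_freq <- as.data.frame(freq(d$tld, plot = FALSE))
# Despite its name, d_na keeps only the rows where columns 3 and 4 are not NA.
d_na <- d[complete.cases(d[, 3:4]), ]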
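# Function to shift a wrongly parsed TLD. For rows whose tld equals x
# (e.g. "blogspot.com"): domain moves to subdomain, the old tld value moves to
# domain, and tld becomes the second dot-separated part of x.
shifttld <- function(y, x) {
  # Skip NA tld values; NA in a logical index breaks data frame assignment.
  idx <- !is.na(y$tld) & y$tld == x
  y$subdomain[idx] <- y$domain[idx]
  y$domain[idx] <- y$tld[idx]
  y$tld[idx] <- strsplit(x, "[.]")[[1]][2]
  return(y)
}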
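
# Sketch (not part of the original script): a hypothetical variant,
# shifttld_split(), assuming the intent is to treat a two-part value such as
# "blogspot.com" as domain "blogspot" plus tld "com". The toy data frame is
# made up and only mimics the host/subdomain/domain/tld columns, so this runs
# without the input files or the tldextract package.
shifttld_split <- function(y, x) {
  parts <- strsplit(x, "[.]")[[1]]
  idx <- !is.na(y$tld) & y$tld == x                # NA-safe row selection
  y$subdomain[idx] <- y$domain[idx]                # old domain becomes subdomain
  y$domain[idx] <- parts[1]                        # e.g. "blogspot"
  y$tld[idx] <- paste(parts[-1], collapse = ".")   # e.g. "com"
  y
}

toy <- data.frame(
  host = c("foo.blogspot.com", "example.org", "broken"),
  subdomain = rep(NA_character_, 3),
  domain = c("foo", "example", NA),
  tld = c("blogspot.com", "org", NA),
  stringsAsFactors = FALSE
)
fixed <- shifttld_split(toy, "blogspot.com")
print(fixed)
# Only the first row changes; the "org" row and the NA row are left as they were.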