Skip to content

Instantly share code, notes, and snippets.

@cmdcolin
Created March 29, 2016 14:09
Show Gist options
  • Save cmdcolin/b9b3b26bcf60c56d52e8 to your computer and use it in GitHub Desktop.
Save cmdcolin/b9b3b26bcf60c56d52e8 to your computer and use it in GitHub Desktop.
calculcate GC content with sliding window in R
library("seqinr")
refseq=read.fasta('../Code/Amel_variants/Amel_4.5_scaffolds.fa')
wiggle_output <- file("output5.wig", "a")
printf("track type=print wiggle_0 name=fileName description=fileName\n",file=wiggle_output)
window_size=2000
refseq_names=names(refseq)
for(i in 1:length(refseq)) {
fasta=refseq[[i]]
fasta_name=refseq_names[[i]]
fasta_length=length(fasta)-1
mod_window_size=ifelse(fasta_length<window_size,fasta_length,window_size)
print(mod_window_size)
starts <- seq(1, length(fasta)-mod_window_size, by = window_size)
n <- length(starts) # Find the length of the vector "starts"
chunkGCs <- numeric(n) # Make a vector of the same length as vector "starts", but just containing zeroes
printf("variableStep chrom=%s span=%d\n",fasta_name,mod_window_size,file=wiggle_output)
for (i in 1:n) {
chunk <- fasta[starts[i]:(starts[i]+mod_window_size-1)]
chunkGC <- GC(chunk)
chunkGCs[i] <- chunkGC
printf("%s %s\n",starts[i],ifelse(is.na(chunkGC),0,chunkGC),file=wiggle_output)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment