Created
August 18, 2014 01:41
Outdated Version: For making all your reference chromosome files evenly formatted to lines of 50bp PRIOR to concatenation.
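To see whether a chromosome file needs this fix at all, a check along the following lines may help. It is only a minimal sketch: `is_evenly_wrapped` is a hypothetical helper that is not part of the script, and it assumes "evenly formatted" means every sequence line is exactly `width` characters long, except the last line of a record, which may be shorter.

```python
def is_evenly_wrapped(lines, width=50):
    """Return True if all sequence lines are `width` long (last line of a record may be shorter).

    Hypothetical helper for illustration; not part of the original script.
    """
    prev_short = False  # True once a shorter-than-width line has been seen in this record
    for line in lines:
        text = line.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines
        if text.startswith(">"):
            prev_short = False  # a new record starts, so reset the state
            continue
        # A short line is only allowed as the final line of a record,
        # and no line may be longer than the target width.
        if prev_short or len(text) > width:
            return False
        prev_short = len(text) < width
    return True
```

It accepts any iterable of lines, so an open file handle can be passed directly.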
__author__ = 'Paul G'

import re
print """\nHi USER (lol Tron joke!) This simple script is meant to help you format your reference file.\n
It is really only meant to handle one chromosome file at a time; a future update will allow for fixing an entire reference.
Purpose:
Adjusting the line lengths of your reference chromosome PRIOR TO CONCATENATION by leaving the fasta header untouched,
removing line breaks, and then redistributing the line breaks. Many genome browsers such as IGV won't accept a reference
file whose sequence lines aren't evenly formatted (this script uses 50bp per line).\n\n\nThis script does NOT overwrite your original file!!
"""
print """Usage:
1. First navigate to the directory where your reference file is located.
2. When you see the '>' prompt, type the file name of your reference chromosome and hit enter.
3. Alternatively, you may type a relative path to your file from whichever directory you are in.
4. A third option is to type an absolute path, which works from any directory.
Output: A file called "FixedRef.fa" that you can rename to whatever you'd like. I recommend writing 'chr#_fixed.fa'.
"""
# Ask the user which file to fix (a file name, a relative path, or an absolute path)
prompt = raw_input("Type your filename or path plus filename> ")
#prompt = "formattest.fa"  # For testing only: uncomment this line and comment out the one above
# Open the old chr.fa file and go through it line by line
tempheader = ""  # stays empty if the file has no fasta header
seqlines = []
with open(prompt, "r") as ref:
    for line in ref:
        # if it's the fasta header, keep it aside so it is written to the output unchanged
        if line.startswith(">"):
            tempheader = line
        # otherwise, collect the sequence line without its line ending
        else:
            seqlines.append(line.rstrip("\r\n"))
tempref = "".join(seqlines)

# Make a new file called FixedRef.fa (the original file is left untouched)
with open("FixedRef.fa", "w") as finaloutput:
    finaloutput.write(tempheader)
    # every 50 characters, insert a '\n' line return
    wrapped = re.sub("(.{50})", "\\1\n", tempref)
    # make sure the last (possibly shorter) line also ends with a line return
    if wrapped and not wrapped.endswith("\n"):
        wrapped += "\n"
    finaloutput.write(wrapped)
print "All done."
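The script above keeps only one header, so it should be run on one chromosome at a time. The "entire reference" update mentioned in the intro text might look something like the sketch below. This is an assumption about one possible approach, not the author's implementation: `wrap_sequence` and `rewrap_fasta` are hypothetical names, and each record's sequence is held in memory while it is rewrapped.

```python
def wrap_sequence(seq, width=50):
    """Split one unbroken sequence string into chunks of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def rewrap_fasta(lines, width=50):
    """Return the FASTA lines (without line endings) with each record rewrapped.

    Hypothetical helper for illustration; header lines starting with '>' are
    kept as they are and every record is wrapped independently.
    """
    out = []
    chunks = []  # sequence pieces of the record currently being read
    for line in lines:
        if line.startswith(">"):
            # a new record starts: flush the previous record's sequence first
            out.extend(wrap_sequence("".join(chunks), width))
            chunks = []
            out.append(line.rstrip("\r\n"))
        else:
            chunks.append(line.strip())
    # flush the sequence of the final record
    out.extend(wrap_sequence("".join(chunks), width))
    return out
```

To write a fixed file, join the returned lines with "\n" and add one final "\n" before writing them to a new file, so the original stays untouched as in the script above.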