from bs4 import BeautifulSoup
file_name = "concepts.html"
start_of_example_urls = "http://www.codeskulptor.org/#exampl"
soup = BeautifulSoup(open(file_name))
#print(soup.prettify())
First, reduce the file to just the lines beginning with a greater-than sign (>), i.e., leave only the description lines (pattern from http://stackoverflow.com/questions/7310598/remove-all-lines-without-an-character-in-notepad)
FIND:
^[^>]*$
REPLACE: (leave empty)
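The same step can be tried outside the editor. This is only a rough Python sketch, assuming the FASTA text is already in a string; the function name `blank_out_sequence_lines` and the sequence residues in `sample` are made up for illustration (the two description lines are the ones quoted in these notes). The newline is excluded from the character class so the pattern stays on one line at a time.

```python
import re

# Same idea as the FIND box: a whole line that contains no ">" character.
# "\n" is excluded too, so a match never runs across several lines.
LINE_WITHOUT_GT = re.compile(r"^[^>\n]*$", re.MULTILINE)


def blank_out_sequence_lines(fasta_text):
    """Empty every line that has no '>' in it, leaving the description lines."""
    return LINE_WITHOUT_GT.sub("", fasta_text)


# Hypothetical two-record sample; the sequence lines are invented.
sample = (
    ">Pichia kudriavzevii |gi|695112010|gb|KGK38559.1|\n"
    "MSTAAQ\n"
    "LLKV\n"
    ">Colletotrichum higginsianum |gi|380481846|emb|CCF41606.1|\n"
    "MKKL\n"
)

print(blank_out_sequence_lines(sample))
```

As in the editor, the emptied lines are still there as blank lines, which is why a blank-line cleanup is needed afterwards.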
REGEX remove blank lines:
FROM: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/remove_blank_lines.html
FIND:
^(?:[\t ]*(?:\r?\n|\r))+
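For anyone who wants to check this pattern without UltraEdit, here is a small Python sketch under the assumption that the text fits in memory. The function name `remove_blank_lines` is hypothetical. Note that Python's multiline `^` only anchors after `\n`, so this covers Unix and Windows line endings but is not a test of bare `\r` files.

```python
import re

# The FIND pattern from above: one or more consecutive lines that hold
# nothing but tabs/spaces, each ended by \r\n, \n, or \r.
BLANK_LINES = re.compile(r"^(?:[\t ]*(?:\r?\n|\r))+", re.MULTILINE)


def remove_blank_lines(text):
    """Delete every run of blank (or whitespace-only) lines."""
    return BLANK_LINES.sub("", text)


print(repr(remove_blank_lines("a\n\n\nb\n  \t\nc\n")))
```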
NOTE: These FASTA entries were first run through my namerv.1.py Python program, which puts the scientific name at the start in place of the many codes you get back in the versions from BATCH ENTREZ.
WAIT!!!! This didn't quite work. For example, it failed on all entries like '>Pichia kudriavzevii |gi|695112010|gb|KGK38559.1|' and '>Colletotrichum higginsianum |gi|380481846|emb|CCF41606.1|'. NEEDS PERFECTING.
FIND:
(>\w)\w+ (\w+) (\w+)
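Assuming the FIND pattern above is the one the WAIT note is about, a quick Python check shows a likely reason for the failures: the pattern wants three space-separated word tokens, and in the two failing entries the third token starts with `|`, which `\w` does not match. The helper name `matched_groups` and the third header are hypothetical, added only to show a case that does match.

```python
import re

# The FIND pattern from above, unchanged.
NAME_PATTERN = re.compile(r"(>\w)\w+ (\w+) (\w+)")

headers = [
    # The two failing entries quoted in the note above.
    ">Pichia kudriavzevii |gi|695112010|gb|KGK38559.1|",
    ">Colletotrichum higginsianum |gi|380481846|emb|CCF41606.1|",
    # Hypothetical entry with a three-word name, for contrast.
    ">Genus species strain |gi|0|gb|X.1|",
]


def matched_groups(header):
    """Return the captured groups, or None if the pattern does not match."""
    match = NAME_PATTERN.search(header)
    return match.groups() if match else None


for header in headers:
    print(header, "->", matched_groups(header))
```

Making the third token optional would be one direction to try, but that is untested here.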
| Left-Aligned  | Center Aligned  | Right Aligned |
| :------------ | :-------------: | ------------: |
| col 3 is      | some wordy text |         $1600 |
| col 2 is      | centered        |           $12 |
| zebra stripes | are neat        |            $1 |
adapted from here
I had a huge file I wanted to run a find-and-replace on, and it seemed to be making Sublime Text unresponsive, so I tried
sed 's/transmit 0.780000/transmit 0.97/g' <test_78.txt >test_97.txt
and it worked great.
The INPUT file is test_78.txt and the OUTPUT file is test_97.txt.
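If sed isn't available, the same replacement can be sketched in Python by streaming the file one line at a time, so the whole thing never has to sit in memory. This is only an illustration; the function name `replace_in_file` is made up. One difference worth knowing: `str.replace` matches the text literally, whereas the `.` in the sed pattern matches any character.

```python
def replace_in_file(src_path, dst_path, old, new):
    """Copy src_path to dst_path, replacing each literal `old` with `new`.

    Reads and writes one line at a time and returns the number of
    replacements made.
    """
    count = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            count += line.count(old)
            dst.write(line.replace(old, new))
    return count
```

For the files above, the call would be `replace_in_file("test_78.txt", "test_97.txt", "transmit 0.780000", "transmit 0.97")`.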
from Bio import Entrez

print("\n\n\n\n\nSetting up .... \n")
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
protein_gi_numbers = ["148908191", "297793721", "48525513", "507118461"]
print("protein_gi_numbers to get are " + str(protein_gi_numbers))
taxonomy_uids = []
# ELink step
print("performing ELink step....\n")
handle = Entrez.elink(dbfrom="protein", db="taxonomy", id=protein_gi_numbers)