Wayne's Bioinformatics Code Portal fomightez

## get accession numbers regex.md

      
fomightez / get accession numbers regex.md
Last active August 29, 2015 14:07

get list of just accession numbers from fasta sequence entries list using regular expressions
              
          
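Step 1: eliminate all but the description lines of the FASTA entries

First, reduce the file to just the lines beginning with a greater-than sign (`>`), i.e., leave only the description lines (from http://stackoverflow.com/questions/7310598/remove-all-lines-without-an-character-in-notepad). The FIND pattern matches lines that contain no `>` character.
FIND:
^[^>]*$

REPLACE:
(nothing; leave the replacement field empty)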
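The same filtering can be sketched in Python, for readers who prefer a script to an editor's find-and-replace. This is a minimal sketch, not the gist's own method: the function names are hypothetical, and the accession helper assumes NCBI-style headers of the form `>gi|...|ref|ACCESSION|`, which may not hold for every FASTA file.

```python
import re

# Same idea as the FIND pattern: a line made up entirely of non-'>' characters.
NO_ANGLE_BRACKET = re.compile(r"[^>]*")


def description_lines(fasta_text):
    """Return only the lines that contain '>' (the FASTA description lines)."""
    return [
        line
        for line in fasta_text.splitlines()
        if not NO_ANGLE_BRACKET.fullmatch(line)
    ]


def accession_from_header(header):
    """Hypothetical helper: take the 4th '|'-separated field as the accession.

    Assumes a header like '>gi|429243135|ref|NM_001019799.2| ...'.
    Returns None when the header has fewer fields than expected.
    """
    fields = header.lstrip(">").split("|")
    return fields[3] if len(fields) > 3 else None
```

Applying `accession_from_header` to each item returned by `description_lines` gives the list of accession numbers, provided the headers follow the assumed layout.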

  
## test_table.md

      
fomightez / test_table.md
Last active August 29, 2015 14:08
              
          
| Left-Aligned  | Center Aligned  | Right Aligned |
| :------------ | :-------------: | ------------: |
| col 3 is      | some wordy text |         $1600 |
| col 2 is      | centered        |           $12 |
| zebra stripes | are neat        |            $1 |
| col 3 is      | some wordy text |         $1600 |
| col 2 is      | centered        |           $12 |
| zebra stripes | are neat        |            $1 |
| col 3 is      | some wordy text |         $1600 |
| col 2 is      | centered        |           $12 |

## ELink_keeping_order.py
from Bio import Entrez

print("\n\n\n\n\nSetting up .... \n")
Entrez.email = "A.N.Other@example.com"     # Always tell NCBI who you are
protein_gi_numbers = ["148908191", "297793721", "48525513", "507118461"]
print("protein_gi_numbers to get are " + str(protein_gi_numbers))
taxonomy_uids = []

# ELink step
print("performing ELink step....\n")
handle = Entrez.elink(dbfrom="protein", db="taxonomy", id=protein_gi_numbers)

## mRNAforProtein.py
#script to accompany https://www.biostars.org/p/64078/

from Bio import Entrez
Entrez.email = "A.N.Other@example.com"

protein_accn_numbers = ["ABR17211.1", "XP_002864745.1", "AAT45004.1", "XP_003642916.1"]
protein_gi_numbers = []

print("The accession numbers for the protein sequences provided:")
print(protein_accn_numbers)

## DNA_to_RNAsimple.py
#! /usr/bin/env python

# DNA_to_RNA.py
# basic version of DNA FASTA records converted to RNA, see https://github.com/fomightez/sequencework/blob/master/ConvertSeq/ConvertFASTAdnaSEQtoRNA.py for a fancier version

# adapted from start and end of latlon_3.py - from Chapter 10 PCfB
# Read in each line of the example file


# Set the input file name
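
The body of the script is not shown in this listing. As a minimal sketch of the conversion the comments describe, assuming input is FASTA text and that converting DNA to RNA here simply means replacing T with U on sequence lines (the function name is hypothetical):

```python
def dna_to_rna(fasta_text):
    """Replace T/t with U/u on sequence lines; leave description lines alone."""
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Description line: keep as-is so names containing 'T' are not altered
            converted.append(line)
        else:
            converted.append(line.replace("T", "U").replace("t", "u"))
    return "\n".join(converted)
```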

## very_high_level_language.py
import random


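# Pool of candidates 1 through 49; a list, so that picked numbers can be removed
numbers = list(range(1, 50))
chosen = []

# Draw six distinct numbers by removing each pick from the pool
while len(chosen) < 6:
    number = random.choice(numbers)
    numbers.remove(number)
    chosen.append(number)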
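
The loop above draws six distinct numbers from 1 through 49. As a sketch of an alternative, the standard library's `random.sample` performs the same draw without replacement in one call; the function name and its defaults below are illustrative, not part of the original script.

```python
import random


def draw_numbers(count=6, low=1, high=49):
    """Draw `count` distinct integers between `low` and `high`, inclusive."""
    return random.sample(range(low, high + 1), count)
```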

## fasta_example
>gi|429243135|ref|NM_001019799.2| Schizosaccharomyces pombe 972h- ribonuclease MRP complex subunit (predicted) (SPAC323.08), mRNA
TGTTCACATTGCTCACTCGTTGGGTGGTTTGTACGACCTATTTGTCTAGTCCAACGATATGCAGGAATTG
CAATACGATGTAGTTTTATTGCAAAAAATCGTGTATAGGAATAGAAATCAGCATCGACTAAGTGTTTGGT
GGAGACACGTACGAATGCTGCTTCGAAGACTAAAGCAGTCGCTAGATGGAAATGAAAAAGCGAAAATTGC
TATTTTAGAACAATTGCCGAAATCGTACTTTTATTTTACAAACTTAATTGCCCATGGTCAGTATCCAGCC
TTAGGGTTAGTTTTGCTGGGTATCTTAGCTCGCGTTTGGTTTGTTATGGGCGGAATAGAGTACGAAGCAA
AAATACAATCGGAAATAGTCTTTAGTCAAAAGGAGCAAAAAAAATTGGAATTACAGTCTCAAGATGACAT
AGACACTGGGACTGTTGTAGCTCGCGATGAATTGCTAGCTACGGAACCTATTTCATTGTCTATAAATCCT
GCTTCTACTAGTTATGAGAAACTGACTGTATCCTCTCCTAATTCTTTTCTCAAGAATCAAGATGAATCTC
TCTTCTTGTCTTCTTCTCCTATAACTGTTTCTCAAGGTACCAAACGTAAATCCAAAAACTCAAATTCCAC

## mutation_data.bed
chr1	21394	21394	A	G
chr2	94116	94116	A	G
chr3	41121	41121	T	C
chr4	22139	22139	A	G
chr5	181396	181396	G	A
chr7	347119	347119	A	G
chr8	99196	99196	A	G
chr10	194236	194236	C	G

## Using SimpleHTTPServer on Mac to run JSmol locally.md

      
fomightez / Using SimpleHTTPServer on Mac to run JSmol locally.md
Last active August 29, 2015 14:15
              
          
Using SimpleHTTPServer to run JSmol locally with Chrome

Adapted from Nelson Liu's post to the Jmol Users' list on Tue, 17 Feb 2015. It works out of the box on both Mac and Linux machines. Windows needs Python installed and a terminal emulator (UNTESTED!!). It doesn't matter whether Chrome is already running; by contrast, I didn't have luck with `open -a /Applications/Google\ Chrome.app/ --args --allow-file-access-from-files` when I already had Chrome running.


Skip the first three steps if you already downloaded Jmol, unpacked it, and unpacked jsmol.zip.


1. Download the Jmol binary.
2. Unpack the binary.
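
The remaining steps are cut off in this listing. As a rough sketch of the idea in the title, Python 3's `http.server` module (the successor to Python 2's `SimpleHTTPServer`) can serve a folder over `http://localhost`, so the page is not loaded from a `file://` URL. The function name, the `jsmol` directory, and port 8000 are assumptions for illustration, not details from the original post.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build a local-only HTTP server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so the files are reachable from this machine only
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("jsmol").serve_forever()` and then opening `http://localhost:8000/` in Chrome would list the served folder; Ctrl-C stops the server.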


## regex_for_SGD_fasta.md

      
fomightez / regex_for_SGD_fasta.md
Last active August 29, 2015 14:17

regular expression to replace description lines in fasta from SGD with simple 'chr' followed by number or mt
              
          
REGEX for replacing SGD FASTA description lines with chromosome numbers

This recreates the steps probably used in the process described in the ChIP-Seq example at the NUCwave site:

S. cerevisiae reference genome was downloaded from SGD and FASTA headers for chromosome names were replaced with chrI-chrXVI.
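
The regular expression itself is not shown in this listing. The following is a minimal sketch of such a replacement, under a loudly stated assumption: that each SGD description line carries a `[chromosome=I]`-style tag with a Roman numeral, and that the mitochondrial entry contains the word `mitochondrion`. Check the actual headers in your download before relying on it; the pattern and function name are hypothetical.

```python
import re

# ASSUMPTION: SGD headers include a tag such as "[chromosome=IV]"
CHROMOSOME_TAG = re.compile(r"\[chromosome=([IVX]+)\]")


def simplify_header(line):
    """Rewrite a description line as '>chr<numeral>' or '>chrmt'.

    Sequence lines and headers that match neither assumption are
    returned unchanged.
    """
    if not line.startswith(">"):
        return line
    match = CHROMOSOME_TAG.search(line)
    if match:
        return ">chr" + match.group(1)
    if "mitochondrion" in line:
        # ASSUMPTION: only the mitochondrial entry mentions this word
        return ">chrmt"
    return line
```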