Fabrice Jossinet fjossinet

fjossinet / gist:2935517
Created June 15, 2012 09:02
How to download genomic scaffolds?
#!/bin/bash
echo "Start..."
rm -f ~/results.txt
mkdir -p ~/Data/nematostella_vectensis
cd ~/Data/nematostella_vectensis
fjossinet / gist:2936274
Created June 15, 2012 12:40
Performing a remote BLAST search
wget "ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.26+-x64-linux.tar.gz"
tar -xzvf ncbi-blast-2.2.26+-x64-linux.tar.gz
# blastn needs a nucleotide database: nt is the nucleotide counterpart of the protein database nr
ncbi-blast-2.2.26+/bin/blastn -remote -query your_fasta_file.fasta -out blast_results.txt -db nt
grep "^>" blast_results.txt | wc -l
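The hit count from the last line can also be computed in Python. This is a minimal sketch, assuming blast_results.txt is in the default BLAST+ pairwise text format, where each subject's alignment block starts with a line beginning with ">"; the function name `count_blast_hits` is hypothetical.

```python
def count_blast_hits(lines):
    """Count subject sequences in BLAST+ default (pairwise) text output.

    Same idea as `grep "^>" blast_results.txt | wc -l`: each hit's
    alignment block is assumed to start with a line beginning with ">".
    """
    return sum(1 for line in lines if line.startswith(">"))
```

Usage: `with open("blast_results.txt") as fh: print(count_blast_hits(fh))`. Taking an iterable of lines rather than a path keeps the function easy to test and works directly on an open file.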
fjossinet / ids.txt
Created June 15, 2012 14:37
How to download protein or nucleotide sequences from a list of gene ids?
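# One FASTA file per id listed in gene_ids.txt; use db=protein instead of db=nucleotide for protein sequences
xargs -I % wget -qO %.fasta "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=%&rettype=fasta&retmode=text" < gene_ids.txt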
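The same download loop can be sketched in Python 3 with the standard library only. This assumes gene_ids.txt holds one id per line, as in the one-liner; the helper names are hypothetical, and the pause between requests follows NCBI's guideline of at most 3 requests per second without an API key.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlretrieve

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(seq_id, db="nucleotide"):
    """Build an efetch URL that returns one record as FASTA text.

    Use db="protein" for protein sequences, db="nucleotide" for nucleotide ones.
    """
    params = {"db": db, "id": seq_id, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def read_ids(lines):
    """Return the stripped, non-empty ids, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def download_all(id_file="gene_ids.txt", db="nucleotide"):
    """Save each record as <id>.fasta, like the xargs/wget one-liner."""
    with open(id_file) as handle:
        for seq_id in read_ids(handle):
            urlretrieve(efetch_fasta_url(seq_id, db), f"{seq_id}.fasta")
            # Stay under 3 requests/second (NCBI limit without an API key)
            time.sleep(0.34)
```

Splitting URL construction from downloading keeps the network-free part testable; call `download_all()` to fetch the files.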
fjossinet / gist:2941217
Created June 16, 2012 12:28
Extract all RefSeq ids from the NCBI genome list for E. coli
wget -qO - "http://www.ncbi.nlm.nih.gov/genome/genomes/167?&subset=complete&limit=refseq" | grep 'title="chromosome">Chr' | sed -E 's/.+(NC_.+|NZ_.+)/\1/' | cut -d \< -f 1
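The grep/sed/cut pipeline above can be mirrored with a regular expression. This is a sketch that assumes the page still marks chromosome rows with `title="chromosome">Chr`, exactly as the 2012 one-liner does (the page layout may have changed since); the accession pattern covers ids such as NC_000913.3 and NZ_CP009072.1, and the function name is hypothetical.

```python
import re

# RefSeq chromosome accessions such as NC_000913.3 or NZ_CP009072.1
REFSEQ_RE = re.compile(r"\b(?:NC|NZ)_[A-Z]*\d+(?:\.\d+)?")


def refseq_ids(html_lines):
    """Return RefSeq ids found on chromosome rows, in order, without duplicates."""
    found = []
    for line in html_lines:
        # Same row filter as the grep in the shell one-liner
        if 'title="chromosome">Chr' not in line:
            continue
        for accession in REFSEQ_RE.findall(line):
            if accession not in found:
                found.append(accession)
    return found
```

Unlike the `sed` expression, which keeps only the last match on each line, `findall` returns every accession on a matching row.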
fjossinet / gist:2941262
Created June 16, 2012 12:45
Extract all family accession ids from the Rfam webpage
wget -qO - "http://rfam.sanger.ac.uk/family/browse" | grep ">RF" | tr -d ' ' | cut -d \> -f 2 | cut -d \< -f 1
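Rfam family accessions have a fixed shape ("RF" followed by five digits, e.g. RF00005), so a regular expression can replace the `grep | tr | cut` chain. This is a minimal sketch, assuming the page HTML has already been downloaded into a string; the function name is hypothetical.

```python
import re

# "RF" followed by exactly five digits, delimited by word boundaries
RFAM_ACC_RE = re.compile(r"\bRF\d{5}\b")


def rfam_accessions(html):
    """Return the distinct Rfam accessions found in `html`, sorted."""
    return sorted(set(RFAM_ACC_RE.findall(html)))
```

Matching the accession pattern instead of the surrounding `>RF` markup makes the extraction less sensitive to how the page is laid out; the set removes ids that appear both in a link target and in its text.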
fjossinet / gist:2941281
Created June 16, 2012 13:00
Download all data for chromosome I of Arabidopsis thaliana from the NCBI FTP site
wget -r ftp://anonymous:anonymous@ftp.ncbi.nih.gov/genomes/Arabidopsis_thaliana/CHR_I/
fjossinet / gist:2942223
Created June 16, 2012 18:42
Select CDS by keyword in all E. coli genomes
#!/bin/bash
# Usage: pass the product keyword to search for as the first argument
query="$1"
genome_ids=$(wget -qO - "http://www.ncbi.nlm.nih.gov/genome/genomes/167?&subset=complete&limit=refseq" | grep 'title="chromosome">Chr' | sed -E 's/.+(NC_.+|NZ_.+)/\1/' | cut -d \< -f 1)
for genome_id in $genome_ids
do
  wget -qO - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=$genome_id&rettype=gb&retmode=xml" > genome.xml
  gene_ids=$(xmllint --xpath "//GBFeature[GBFeature_key[.='CDS'] and GBFeature_quals/GBQualifier[GBQualifier_name[.='product'] and GBQualifier_value[contains(.,\"$query\")]]]" genome.xml | grep "GI:" | sed -E 's/.+GI:(.+)<.+/\1/')
  # The gist preview stops here; printing the matching ids is an assumption
  echo "$genome_id: $gene_ids"
done
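The XPath selection can also be done with Python's `xml.etree.ElementTree`. This sketch assumes the efetch reply uses the GBSeq XML element names already present in the XPath above (`GBFeature`, `GBFeature_key`, `GBQualifier_name`, `GBQualifier_value`) and, like the shell version, collects `db_xref` values of the form `GI:<number>`; newer records may not carry GI cross-references. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def qualifiers(feature):
    """Map qualifier name -> list of values for one GBFeature element."""
    quals = {}
    for qualifier in feature.iter("GBQualifier"):
        name = qualifier.findtext("GBQualifier_name", default="")
        value = qualifier.findtext("GBQualifier_value", default="")
        quals.setdefault(name, []).append(value)
    return quals


def cds_gi_ids(xml_text, query):
    """Return GI numbers of CDS features whose product contains `query`.

    Same selection as the xmllint XPath: GBFeature_key is "CDS" and a
    "product" qualifier value contains the query (case-sensitive).
    """
    root = ET.fromstring(xml_text)
    gi_ids = []
    for feature in root.iter("GBFeature"):
        if feature.findtext("GBFeature_key") != "CDS":
            continue
        quals = qualifiers(feature)
        if not any(query in product for product in quals.get("product", [])):
            continue
        for xref in quals.get("db_xref", []):
            if xref.startswith("GI:"):
                gi_ids.append(xref[len("GI:"):])
    return gi_ids
```

Walking the parsed tree avoids the `grep`/`sed` post-processing of xmllint's concatenated output, and the same loop could collect other qualifiers (for instance `protein_id`) instead.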
fjossinet / taxid_2_gbids.py
Last active December 17, 2015 21:19
This Python script retrieves the GenBank ids of all nucleotide entries linked to a taxon id. It keeps the number of requests low by using the retmax and retstart parameters provided by the Entrez Utilities.
#!/usr/bin/env python
import xml.etree.ElementTree as ET
import sys, urllib, urllib2
eutils_base_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
def get_ids(taxid):
    accession_numbers = []
    retstart = 0
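The paging idea described above can be sketched in Python 3 (the gist itself targets Python 2 with urllib2). This is not the author's code: it assumes ESearch with the query `txid<taxid>[Organism:exp]` to include descendant taxa, and `idtype=acc` to get accession.version ids instead of GI numbers; `fetch_all_ids` and the other helper names are hypothetical. The first reply gives the total `Count`, and each further request asks for the next `retmax` ids starting at `retstart`.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(taxid, retstart=0, retmax=10000):
    """ESearch URL for nucleotide records of a taxon and its descendants."""
    params = {
        "db": "nucleotide",
        "term": f"txid{taxid}[Organism:exp]",
        "idtype": "acc",  # accession.version ids instead of GI numbers
        "retstart": retstart,
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch(xml_text):
    """Return (total hit count, ids in this page) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    ids = [element.text for element in root.iter("Id")]
    return count, ids


def page_starts(count, retmax):
    """retstart values needed to cover `count` hits with pages of `retmax`."""
    return list(range(0, count, retmax))


def fetch_all_ids(taxid, retmax=10000):
    """One request per page of `retmax` ids; the first reply gives the total."""
    with urlopen(esearch_url(taxid, 0, retmax)) as reply:
        count, ids = parse_esearch(reply.read())
    for retstart in page_starts(count, retmax)[1:]:
        time.sleep(0.34)  # stay under 3 requests/second without an API key
        with urlopen(esearch_url(taxid, retstart, retmax)) as reply:
            ids.extend(parse_esearch(reply.read())[1])
    return ids
```

With 25,000 matching records and retmax=10000, `page_starts` yields three requests (retstart 0, 10000 and 20000) instead of one request per record.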
fjossinet / gist:9033572
Last active August 29, 2015 13:56
Create and manipulate molecules with PyRNA
{
"metadata": {
"name": "Create and manipulate molecules."
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
fjossinet / gist:9035788
Last active August 29, 2015 13:56
Create and manipulate secondary structures with PyRNA
{
"metadata": {
"name": "Create and manipulate secondary structures."
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{