Dmitry Grapov dgrapov
19 GLDQAMYCGOIJDV-UHFFFAOYSA-N
36 RNQHMTFBUSSBJQ-UHFFFAOYSA-N
45 ROBFUDYVXSDBQM-UHFFFAOYSA-N
49 QHKABHOOEWYVLI-UHFFFAOYSA-N
51 KPGXRSRHYNQIFN-UHFFFAOYSA-N
58 TYEYBOSBBBHJIV-UHFFFAOYSA-N
70 BKAJNAXTPSGJCU-UHFFFAOYSA-N
71 FGSBNBBHOZHUBO-UHFFFAOYSA-N
72 YQUVCSBJEUQKSH-UHFFFAOYSA-N
76 FMGSKLZLMKYGDP-UHFFFAOYSA-N
dgrapov / inchi code
Last active December 13, 2015 20:18
InChI=1S/C7H6O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-3,8-9H,(H,10,11)
InChI=1S/C7H12O5/c1-3(2)4(6(9)10)5(8)7(11)12/h3-5,8H,1-2H3,(H,9,10)(H,11,12)
InChI=1S/C3H4O5/c4-1(2(5)6)3(7)8/h1,4H,(H,5,6)(H,7,8)
InChI=1S/C5H8O3/c1-3(2)4(6)5(7)8/h3H,1-2H3,(H,7,8)
InChI=1S/C5H6O5/c6-3(5(9)10)1-2-4(7)8/h1-2H2,(H,7,8)(H,9,10)
InChI=1S/C4H6O3/c1-2-3(5)4(6)7/h2H2,1H3,(H,6,7)
InChI=1S/C6H10O3/c1-4(2)3-5(7)6(8)9/h4H,3H2,1-2H3,(H,8,9)
InChI=1S/C6H8O5/c7-4(6(10)11)2-1-3-5(8)9/h1-3H2,(H,8,9)(H,10,11)
InChI=1S/C7H6O4/c8-5-2-1-4(7(10)11)3-6(5)9/h1-3,8-9H,(H,10,11)
InChI=1S/C19H28O2/c1-18-9-7-13(20)11-12(18)3-4-14-15-5-6-17(21)19(15,2)10-8-16(14)18/h3,13-16,20H,4-11H2,1-2H3
dgrapov / metabolomic network.R
Last active December 15, 2015 18:49
Tutorial code for making edge lists for biochemical and chemical similarity networks
#load needed functions: R package in progress - "devium", which is stored on github
source("http://pastebin.com/raw.php?i=Y0YYEBia")
# get sample chemical identifiers here:https://docs.google.com/spreadsheet/ccc?key=0Ap1AEMfo-fh9dFZSSm5WSHlqMC1QdkNMWFZCeWdVbEE#gid=1
#Pubchem CIDs = cids
cids # overview
nrow(cids) # how many
str(cids) # structure; we want numeric
cids<-as.numeric(as.character(unlist(cids))) # hack to break factor
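The conversion on the last line matters because calling as.numeric() directly on a factor returns its internal level codes rather than the values it displays. A minimal sketch of the difference, assuming a one-column data frame of CIDs stored as a factor (cids_demo and its column name are hypothetical stand-ins for the spreadsheet data):

```r
# Hypothetical stand-in for the spreadsheet column of PubChem CIDs
cids_demo <- data.frame(CID = factor(c("962", "5957", "5893")))

# Direct conversion returns the factor's level codes, not the identifiers
codes <- as.numeric(cids_demo$CID)

# Going through character first recovers the identifiers themselves
fixed <- as.numeric(as.character(unlist(cids_demo)))

print(codes)
print(fixed)
```

Comparing the two printed vectors shows why the intermediate as.character() step is needed whenever identifiers arrive as factors.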
dgrapov / max intensity mass to charge pair.r
Created April 4, 2013 21:52
Data processing script to get maximum m/z ion intensity pair from Agilent recursive analysis
#directory (file path) where the file is
#note make sure all \ are changed to \\ or /
dir<-"ZZZ"
setwd(dir) # change directory
#name of file, should be .csv
file.name<-"XXX.csv"
#load data into R; this assumes there is a header
input<-read.csv(file.name, header = TRUE)
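The preview stops after the file is loaded, so the selection step is not shown. As a rough sketch of the general idea only, and not the script's actual method, the following keeps the highest-intensity m/z row per compound. The data frame, its column names (compound, mz, intensity), and all values are made up for illustration and do not reflect the Agilent export format.

```r
# Hypothetical input: one row per ion; column names and values are invented
ions <- data.frame(
  compound  = c("A", "A", "B", "B", "B"),
  mz        = c(101.1, 203.2, 88.0, 150.5, 301.3),
  intensity = c(500, 1200, 90, 40, 75)
)

# Split by compound, then keep the row with the largest intensity in each group
top <- do.call(
  rbind,
  lapply(split(ions, ions$compound), function(d) d[which.max(d$intensity), ])
)

print(top[, c("compound", "mz", "intensity")])
```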
C00005 C00006
C00003 C00004
C00002 C00008
C00019 C00021
C00007 C00027
C00010 C00024
C00002 C00020
C00002 C00013
C00002 C00009
C00015 C00029
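Pairs like these can be treated as a two-column edge list, where each row connects two KEGG compound IDs. A minimal sketch, assuming whitespace-separated pairs; the column names source and target are hypothetical, and only a few of the pairs listed above are used:

```r
# A few of the KEGG ID pairs from the listing above
pairs_txt <- "C00005 C00006
C00003 C00004
C00002 C00008
C00002 C00020
C00002 C00013"

# Read the pairs into a two-column edge list (column names are assumptions)
edges <- read.table(text = pairs_txt,
                    col.names = c("source", "target"),
                    stringsAsFactors = FALSE)

# Count how many edges each compound takes part in (its degree)
degree <- table(c(edges$source, edges$target))

print(edges)
print(degree[["C00002"]])
```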
# R interface to Chemical Identifier Resolver (CIR) by the CADD Group at the NCI/NIH
CIRgetR<-function(id,to=c("pubchem_sid"),return.all=TRUE,progress=TRUE){
#id needs to be one of the following types of structural ids "inchi","inchiKey" or "smiles"
#to can be: smiles, names, iupac_name, cas, inchi,
# stdinchi, inchikey, stdinchikey,
# ficts, ficus, uuuuu, image, # here return url do not evaluate
# mw, monoisotopic_mass,file,
# chemspider_id
# pubchem_sid, chemnavigator_sid, formula
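The body of CIRgetR is cut off in this preview. For orientation, CIR requests follow the pattern structure/<identifier>/<representation>, so a lookup reduces to building a URL and reading the response. The helper below is a hypothetical sketch of the URL-building step only (cir_url is not part of the original function) and performs no network request:

```r
# Hypothetical helper: build a CIR lookup URL for one identifier
cir_url <- function(id, to = "smiles") {
  paste0("https://cactus.nci.nih.gov/chemical/structure/",
         utils::URLencode(id, reserved = TRUE), "/", to)
}

# InChIKey for water, taken from the metabolite table in this listing
water_url <- cir_url("XLYOFNOQVPJJNP-UHFFFAOYSA-N", to = "smiles")
print(water_url)
```

Passing the resulting URL to readLines() would retrieve the translation, one result per line, when the service is reachable.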
SMID Formula Metabolite Central Map Pathway KEGGpath KEGGid PubChem CID Lmid HMDBID MetacycID CHEBI METLIN InchiKEY Synonyms
11 H2O H2O Central Energy Metabolism Oxidative phosphorylation__Photosynthesis__Carbon fixation__Riboflavin metabolism map00190 C00001 962 HMDB02111 WATER 5585 3194 XLYOFNOQVPJJNP-UHFFFAOYSA-N Water
25361 C10H16N5O13P3 ATP Central Energy Metabolism Oxidative phosphorylation__Photosynthesis__Purine metabolism__Puromycin biosynthesis__Zeatin biosynthesis map00190 C00002 5957 HMDB00538 ATP 2359 95 ZKHQWZAMYRWXGA-KQYNXXCUSA-N Adenosine 5'-triphosphate
31698 C21H27N7O14P2 NAD+ Central Energy Metabolism Oxidative phosphorylation__Glutamate metabolism__Nicotinate and nicotinamide metabolism map00190 C00003 5893 HMDB00902 NAD 7422 101 BAWFJGJZGIEFAR-NNYOXOHSSA-O NAD_Nicotinamide adenine dinucleotide_DPN_Diphosphopyridine nucleotide_Nadide
KTVPXOYAKDPRHY-SOOFDHNKSA-N AAAFZMYJJHWUPN-TXICZTDVSA-N main RP01256
PQGCEDQWHSBAJP-TXICZTDVSA-N AAAFZMYJJHWUPN-TXICZTDVSA-N main RP10961
YXJDFQJKERBOBM-TXICZTDVSA-N AAAFZMYJJHWUPN-TXICZTDVSA-N trans RP11184
ZKHQWZAMYRWXGA-KQYNXXCUSA-N AAAFZMYJJHWUPN-TXICZTDVSA-N trans RP06423
RWKJTIHNYSIIHW-MEBVTJQTSA-N AAJODOMQIUQTFG-XAICKWAHSA-N main RP12160
RTPWRCREAVUAOI-CGJPUGKVSA-N AAKIIQMDPGWYFD-FFZYTXIOSA-N main RP11883
XLYOFNOQVPJJNP-UHFFFAOYSA-N AAKIIQMDPGWYFD-FFZYTXIOSA-N leave RP09015
XLYOFNOQVPJJNP-UHFFFAOYSA-N AAMCJNYIOAHDFN-BCCCWXECSA-N leave RP09666
AUFGTPPARQZWDO-YUZLPWPTSA-N ABCOOORLYAOBOZ-KQYNXXCUSA-N trans RP08184
BDAGIHXWWSANSR-UHFFFAOYSA-N ABCOOORLYAOBOZ-KQYNXXCUSA-N main RP10097
dgrapov / query PubMed
Last active December 19, 2015 00:48
Check for key word in PubMed article titles
#Check pubmed article titles for a given year for a keyword (using partial matching).
library(XML)
library(stringr)
#get PubMed Ids for all journals for a given year
getPubMedIds<-function(year=2013, max=100){
#max = maximum results to return
url<-paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",year,"[PDAT]&RetMax=",max)
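The rest of getPubMedIds is not shown in this preview. An esearch response lists matching records as <Id> elements inside an <IdList>, so the remaining work is to extract those values. The sketch below uses base R only, rather than the XML package loaded above, and runs on a made-up sample response; build_esearch_url, extract_ids, and the placeholder IDs are all hypothetical. It assumes the https endpoint, which NCBI has required for E-utilities since 2016.

```r
# Build the esearch URL for a given publication year (same query shape as above)
build_esearch_url <- function(year = 2013, max = 100) {
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",
         year, "[PDAT]&RetMax=", max)
}

# Pull the <Id> values out of an esearch XML response supplied as text
extract_ids <- function(xml_text) {
  hits <- regmatches(xml_text, gregexpr("<Id>[0-9]+</Id>", xml_text))[[1]]
  gsub("</?Id>", "", hits)
}

# Made-up response fragment with placeholder IDs, for illustration only
sample_response <- "<eSearchResult><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"

esearch_url <- build_esearch_url(2013, 5)
ids <- extract_ids(sample_response)
print(esearch_url)
print(ids)
```

For real use, a proper XML parser is more robust than a regular expression; the regex here just keeps the example dependency-free.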
dgrapov / server.R
Last active December 20, 2015 14:29 — forked from jknowles/server.R
# Script to demonstrate distributions
library(VGAM)
library(eeptools)
library(shiny)
library(ggplot2)
shinyServer(function(input,output){