These datasets were found in the MetaboLights repository (annotated with ChEBI, (class)name, SMILES*, InChI*) and Metabolomics Workbench (annotated with PubChem*, KEGG*) through the search term 'toxic' (the term 'UVCB' gave 0 results). *If available?
ID | Species | Chemical(s) | Comment(s) | Technique | Annotated IDs | Unannotated peaks |
---|---|---|---|---|---|---|
MTBLS275 | mouse (Mus musculus) | chlorpyrifos; chlorpyrifos-methyl metabolite (3,5,6-trichloro-2-pyridinol) | oral exposure | NMR | 52 | 2+? |
MTBLS48 | mouse (Mus musculus) | Municipal wastewater effluents (MWWE) | transcriptomic data also available | NMR | 47 | 1+? |
MTBLS532 | Earthworms (Eisenia fetida) | Triclosan (TCS); methyl-triclosan (MTCS) | TCS is ubiquitous in sewage sludge; a large proportion is transformed into MTCS | GC-MS | 17 | ? |
MTBLS602 | mouse (Mus musculus) | boscalid, captan, chlorpyrifos, thiophanate, thiacloprid, ziram | Mixture oral exposure; untargeted urine, plasma, liver samples | NMR | 209 | 2+? |
MTBLS596 | mouse (Mus musculus) | boscalid, captan, chlorpyrifos, thiophanate, thiacloprid, ziram | Mixture oral exposure; untargeted urine samples | UPLC-MS | 6 | 77 |
MTBLS360 | Trypanosoma brucei (African trypanosomiasis) | 3-(oxazolo[4,5-b]pyridine-2-yl)anilide (OXPA) | "non-toxic" drug, unknown mechanism of action. | LC-MS | 506 | 2+? |
MTBLS1196 | various micro-organisms | Polycyclic aromatic hydrocarbons (PAHs) | soil samples | GC-TOF-MS | 264 | 3+? |
MTBLS2878 | Euglena gracilis (microalga) | CdCl2 (heavy metal), paromomycin (antibiotic) | - | UHPLC-MS/MS | 3806 | 422 + ? |
MTBLS2166 | Glossina morsitans morsitans (African trypanosomiasis) | nitisinone | - | LC-MS/MS | 40 | ? |
MTBLS5772 | Bottlenose dolphin (Tursiops) | "Trace elements" | samples were collected from dead animals | UHPLC-MS(/MS?) | - | 316 + ? |
ST001428 | human (Homo sapiens) | "environmental toxicants" | nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH) in children | LC-MS | ? | >103 (duplicates) + ? |
ST000446 | E. coli | apratoxin | anti-cancer treatment drug | UHPLC-MS | >160 (duplicates) | ~20 + ? |
ST000415 | rat (Rattus norvegicus) | Flame Retardant Mixture Firemaster 550 | placenta samples | GCxGC-MS | ? | 73 + ? |
MZmine could be used to "reannotate" the raw data files from the projects above, to find more matches than currently available.
RDF of PubChem is available on a "hidden" Virtuoso endpoint. A REST RDF API (see its documentation) provides users with restricted access (to avoid overloading resources) [Evan]. @egonw [Egon]: how do I get a subset of PubChem RDF?
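One possible way to collect a small subset through the REST-style RDF interface is to request the compounds of interest one at a time, with a pause between requests to respect the restricted access. The sketch below is only an illustration under assumptions: the base path `rest/rdf`, the `compound/CID<n>` resource pattern and the `.ttl` extension should be checked against the PubChemRDF documentation, and `rdf_url`, `http_get` and `fetch_subset` are hypothetical helper names.

```python
import time
import urllib.request

# Assumption: the REST-style RDF interface serves one resource per URL under
# this base path; verify against the PubChemRDF documentation before use.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/rdf"


def rdf_url(cid: int, fmt: str = "ttl") -> str:
    """Build the (assumed) URL for the RDF description of one compound."""
    return f"{BASE}/compound/CID{cid}.{fmt}"


def http_get(url: str) -> str:
    """Download one document and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def fetch_subset(cids, fetch=http_get, delay=0.5):
    """Fetch RDF for each CID, sleeping between requests to limit server load."""
    documents = {}
    for i, cid in enumerate(cids):
        if i:
            time.sleep(delay)  # pause before every request except the first
        documents[cid] = fetch(rdf_url(cid))
    return documents
```

Calling `fetch_subset([2244, 3672])` would return a dictionary mapping each CID to its Turtle text. For anything beyond a few hundred compounds, the bulk RDF downloads are probably the better route.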
Provided by @schymane [Emma], summarized from all data on PubChem (both NORMAN data and ChEMBL). Data on Zenodo: DOI:10.5281/zenodo.5644560.
- Egon: interested in creating GPML for this
Several datasets, roughly in four categories:
- Surfactants: S7 EAWAGSURF, S8 ATHENSUS, S18 TSCASURF, S23 EIUBASURF
- PFAS: S9 PFASTRIER, S25 OECDPFAS, S80 PFASGLUEGE
- Extra: CompTox datasets of interest: PFASDEV1, PFASMARKUSH, PFASMASTER
- Regulatory lists: S17 KEMIMARKET, S18 TSCASURF, S32 REACH
- Plastics: S47 ECHAPLASTICS, S48/9 CPPDB, S77 FCCDB
About 750 compounds from S18 TSCASURF (from James Little). DOI:10.5281/zenodo.2628791
Ideas:
- put in Wikidata (no salts)
- Scan EuropePMC (see Google Colab ERM/JRCNM notebook)
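For the "no salts" idea, a first-pass filter could drop every entry whose SMILES contains more than one component, since SMILES separates disconnected components with a `.`. This is a rough heuristic sketch rather than a settled approach: it also removes mixtures and hydrates, and it keeps charged single-component species. The two example compounds are illustrative only and are not taken from the S18 TSCASURF list.

```python
def is_single_component(smiles: str) -> bool:
    """True when the SMILES has no '.' separator, i.e. one connected structure."""
    return "." not in smiles


def drop_salts(records):
    """Keep (name, smiles) pairs whose SMILES describes a single component."""
    return [(name, smi) for name, smi in records if is_single_component(smi)]


# Illustrative input, not taken from the actual list
examples = [
    ("sodium dodecyl sulfate", "CCCCCCCCCCCCOS(=O)(=O)[O-].[Na+]"),
    ("1-dodecanol", "CCCCCCCCCCCCO"),
]

print(drop_salts(examples))  # [('1-dodecanol', 'CCCCCCCCCCCCO')]
```

A cheminformatics toolkit would give a more careful answer (for example by keeping the parent structure of a salt instead of discarding it), but the string check is enough to estimate how many of the roughly 750 compounds are affected.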
Possible sources:
- Rhea (recipe)
- UniProt
- Scholia
- Evan: PubChem RDF REST interface
Emma and Evan nudged Erin Baker to see whether she can send the CCS dataset on lipids for us to discuss.