Skip to content

Instantly share code, notes, and snippets.

@baoilleach
Last active November 12, 2021 10:55
Show Gist options
  • Save baoilleach/4953366 to your computer and use it in GitHub Desktop.
Save baoilleach/4953366 to your computer and use it in GitHub Desktop.
Import and Export ChEMBL activities to/from MySQL
(Use "quit;" to exit mysql prompt)
1. Download chembl_15_mysql.tar.gz
2. Get rid of the existing: "drop database chembldb"
mysql> drop database chembldb;
ERROR 1010 (HY000): Error dropping database (can't rmdir '.\chembldb', errno: 41)
(...the error was because I exported a file to this folder: C:\ProgramData\MySQL\MySQL Server 5.5\data\chembldb
I went there and deleted it and repeated the command - it worked fine)
3. create database chembl_15;
4.
C:\Users\noel>mysql -u root chembl_15 -p < "C:\Users\noel\Downloads\chembl_15_mysql\chembl_15_mysql\chembl_15.mysqldump.sql"
Enter password: ********************
C:\Tools\cclib>mysql -u root -p
Enter password: ********************
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.5.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use chembldb
Database changed
mysql> select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno limit 0, 1000 INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
mysql> select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Notes: An inner join is like the intersection of two sets. Results will only be returned iif there is a match from t1.assay_id to t2.assay_id and a a match from t1.molregno to t3.molregno. In other words, assay entries where such a match cannot be found will be discared. If this is not the desired behaviour (e.g. you would prefer NULL values in these cases) you could use a LEFT JOIN (the whole of t1 is returned).
-----
To get the assay descriptions out, use:
select chembl_id, description, doc_id from assays where assay_type='B' order by chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp6.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
To add in, even more assay info, use:
select t1.chembl_id, t1.tid, t1.description, t2.pref_name, t2.organism, t3.journal, t3.year, t3.volume, t3.first_page, t3.doi from assays as t1 inner join target_dictionary as t2 on t2.tid=t1.tid inner join docs as t3 on t3.doc_id=t1.doc_id where t1.assay_type='B' order by t1.chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
The minimum required for the webapp is:
select t1.chembl_id, t1.tid, t1.description, t2.pref_name, t2.organism from assays as t1 inner join target_dictionary as t2 on t2.tid=t1.tid where t1.assay_type='B' order by t1.chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\assayDescriptions.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
-----
To get the hERG
select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno WHERE t2.tid=165 AND t1.standard_type="IC50" INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp5.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment