Last active
November 12, 2021 10:55
-
-
Save baoilleach/4953366 to your computer and use it in GitHub Desktop.
Import and Export ChEMBL activities to/from MySQL
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(Use "quit;" to exit mysql prompt) | |
1. Download chembl_15_mysql.tar.gz | |
2. Get rid of the existing: "drop database chembldb" | |
mysql> drop database chembldb; | |
ERROR 1010 (HY000): Error dropping database (can't rmdir '.\chembldb', errno: 41) | |
(...the error was because I exported a file to this folder: C:\ProgramData\MySQL\MySQL Server 5.5\data\chembldb | |
I went there and deleted it and repeated the command - it worked fine) | |
3. create database chembl_15; | |
4. | |
C:\Users\noel>mysql -u root chembl_15 -p < "C:\Users\noel\Downloads\chembl_15_mysql\chembl_15_mysql\chembl_15.mysqldump.sql" | |
Enter password: ******************** | |
C:\Tools\cclib>mysql -u root -p | |
Enter password: ******************** | |
Welcome to the MySQL monitor. Commands end with ; or \g. | |
Your MySQL connection id is 11 | |
Server version: 5.5.29 MySQL Community Server (GPL) | |
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved. | |
Oracle is a registered trademark of Oracle Corporation and/or its | |
affiliates. Other names may be trademarks of their respective | |
owners. | |
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. | |
mysql> use chembldb | |
Database changed | |
mysql> select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno limit 0, 1000 INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; | |
mysql> select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; | |
Notes: An inner join is like the intersection of two sets. Results will only be returned iif there is a match from t1.assay_id to t2.assay_id and a a match from t1.molregno to t3.molregno. In other words, assay entries where such a match cannot be found will be discared. If this is not the desired behaviour (e.g. you would prefer NULL values in these cases) you could use a LEFT JOIN (the whole of t1 is returned). | |
----- | |
To get the assay descriptions out, use: | |
select chembl_id, description, doc_id from assays where assay_type='B' order by chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp6.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; | |
To add in, even more assay info, use: | |
select t1.chembl_id, t1.tid, t1.description, t2.pref_name, t2.organism, t3.journal, t3.year, t3.volume, t3.first_page, t3.doi from assays as t1 inner join target_dictionary as t2 on t2.tid=t1.tid inner join docs as t3 on t3.doc_id=t1.doc_id where t1.assay_type='B' order by t1.chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp2.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; | |
The minimum required for the webapp is: | |
select t1.chembl_id, t1.tid, t1.description, t2.pref_name, t2.organism from assays as t1 inner join target_dictionary as t2 on t2.tid=t1.tid where t1.assay_type='B' order by t1.chembl_id INTO OUTFILE 'C:\\Users\\Noel\\tmp\\assayDescriptions.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; | |
----- | |
To get the hERG | |
select t1.activity_id, t2.chembl_id, t1.standard_relation, t1.standard_value, t1.standard_units, t1.standard_type, t3.chembl_id from activities as t1 INNER JOIN assays as t2 ON t1.assay_id = t2.assay_id INNER JOIN molecule_dictionary AS t3 ON t1.molregno = t3.molregno WHERE t2.tid=165 AND t1.standard_type="IC50" INTO OUTFILE 'C:\\Users\\Noel\\tmp\\tmp5.csv ' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment