Created
July 21, 2018 18:12
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# How to Retrieve Atoms Objects for a Dataset" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First, we pick a dataset by browsing [https://www.catalysis-hub.org/publications](https://www.catalysis-hub.org/publications). Once we have found an interesting dataset, we copy its pubId (without the `#` character). In the meantime, we need a couple of imports and a `fetch` function that gets the technicalities of querying over HTTP out of the way." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import requests\n", | |
"import pprint\n", | |
"import sys\n", | |
"import string\n", | |
"import json\n", | |
"import io\n", | |
"import copy\n", | |
"\n", | |
"import ase.io\n", | |
"import ase.calculators.singlepoint\n", | |
"\n", | |
"GRAPHQL = 'http://api.catalysis-hub.org/graphql'\n", | |
"\n", | |
"def fetch(query):\n", | |
"    # send the GraphQL query via HTTP GET and return its 'data' payload\n", | |
"    response = requests.get(GRAPHQL, params={'query': query})\n", | |
"    response.raise_for_status()  # fail early on HTTP errors\n", | |
"    return response.json()['data']" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's assume we found the dataset `FesterEdge2017` to be interesting. We then query all reactions and geometries associated with it through the GraphQL endpoint. In this example, we paginate in chunks of 10 reactions to avoid timeouts on a busy server. As a poor man's progress bar, we print a short status line after each page." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"True YXJyYXljb25uZWN0aW9uOjk= 10 16\n", | |
"False YXJyYXljb25uZWN0aW9uOjE1 20 16\n" | |
] | |
} | |
], | |
"source": [ | |
"def reactions_from_dataset(pub_id, page_size=10):\n", | |
"    # collect all reactions of one publication, page by page\n", | |
"    reactions = []\n", | |
"    has_next_page = True\n", | |
"    start_cursor = ''\n", | |
"    page = 0\n", | |
"    while has_next_page:\n", | |
"        data = fetch(\"\"\"{{\n", | |
"      reactions(pubId: \"{pub_id}\", first: {page_size}, after: \"{start_cursor}\") {{\n", | |
"        totalCount\n", | |
"        pageInfo {{\n", | |
"          hasNextPage\n", | |
"          hasPreviousPage\n", | |
"          startCursor\n", | |
"          endCursor\n", | |
"        }}\n", | |
"        edges {{\n", | |
"          node {{\n", | |
"            Equation\n", | |
"            reactants\n", | |
"            products\n", | |
"            reactionEnergy\n", | |
"            reactionSystems {{\n", | |
"              name\n", | |
"              systems {{\n", | |
"                energy\n", | |
"                InputFile(format: \"json\")\n", | |
"              }}\n", | |
"            }}\n", | |
"          }}\n", | |
"        }}\n", | |
"      }}\n", | |
"    }}\"\"\".format(start_cursor=start_cursor,\n", | |
"               page_size=page_size,\n", | |
"               pub_id=pub_id,\n", | |
"               ))\n", | |
"        has_next_page = data['reactions']['pageInfo']['hasNextPage']\n", | |
"        # the end cursor of this page is where the next page starts\n", | |
"        start_cursor = data['reactions']['pageInfo']['endCursor']\n", | |
"        page += 1\n", | |
"        # status line: more pages?, cursor, upper bound on reactions requested so far, total\n", | |
"        print(has_next_page, start_cursor, page_size * page, data['reactions']['totalCount'])\n", | |
"        reactions.extend(edge['node'] for edge in data['reactions']['edges'])\n", | |
"\n", | |
"    return reactions\n", | |
"\n", | |
"raw_reactions = reactions_from_dataset(\"FesterEdge2017\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"After retrieving all reactions, we turn those pesky JSON strings into ASE `Atoms` objects. We do this in place. Note that the conversion can be run only once per fetched result because it removes some fields, which is why we work on a deep copy of `raw_reactions`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def aseify_reactions(reactions):\n", | |
"    # replace the JSON geometry of every reaction system by an ASE Atoms object, in place\n", | |
"    for reaction in reactions:\n", | |
"        for reaction_system in reaction['reactionSystems']:\n", | |
"            system = reaction_system.pop('systems')\n", | |
"            # ase.io.read expects a file-like object, so wrap the JSON string\n", | |
"            with io.StringIO(system.pop('InputFile')) as tmp_file:\n", | |
"                atoms = ase.io.read(tmp_file, format='json')\n", | |
"            # attach the stored energy through a single-point calculator\n", | |
"            atoms.calc = ase.calculators.singlepoint.SinglePointCalculator(\n", | |
"                atoms,\n", | |
"                energy=system.pop('energy')\n", | |
"            )\n", | |
"            reaction_system['atoms'] = atoms\n", | |
"        # flatten list further into {name: atoms, ...} dictionary\n", | |
"        reaction['reactionSystems'] = {x['name']: x['atoms']\n", | |
"                                       for x in reaction['reactionSystems']}\n", | |
"\n", | |
"reactions = copy.deepcopy(raw_reactions)\n", | |
"aseify_reactions(reactions)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This finally gives us a list of reactions with all ASE `Atoms` geometries and potential energies in place." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'Equation': 'H2O(g) - H2(g) + * -> O*',\n", | |
" 'products': '{\"Ostar\": 1}',\n", | |
" 'reactants': '{\"star\": 1, \"H2gas\": -1.0, \"H2Ogas\": 1}',\n", | |
" 'reactionEnergy': 3.033416719999991,\n", | |
" 'reactionSystems': {'H2Ogas': Atoms(symbols='H2O', pbc=True, cell=[14.0, 16.526478, 16.596309], calculator=SinglePointCalculator(...)),\n", | |
" 'H2gas': Atoms(symbols='H2', pbc=True, cell=[14.0, 15.0, 16.737166], calculator=SinglePointCalculator(...)),\n", | |
" 'HO2star': Atoms(symbols='H7Au48Co8O20', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n", | |
" 'HOstar': Atoms(symbols='H7Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n", | |
" 'Ostar': Atoms(symbols='H6Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n", | |
" 'star': Atoms(symbols='H6Au48Co8O18', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...))}}" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"reactions[5]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
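As a possible next step, the `reactants` and `products` fields in the output of `reactions[5]` are themselves JSON strings that map system names to stoichiometric coefficients. The sketch below decodes them and recomputes a reaction energy as products minus reactants. The helper name `reaction_energy_from_systems` and the energies in the example are made up for illustration, and we only assume (without having checked against the server) that the stored `reactionEnergy` follows the same convention.

```python
import json


def reaction_energy_from_systems(reaction, energies):
    """Recompute a reaction energy from per-system energies.

    Assumption: `reactants` and `products` are JSON strings mapping
    system names to stoichiometric coefficients, as in the output above.
    `energies` maps the same system names to total energies.
    """
    reactants = json.loads(reaction['reactants'])
    products = json.loads(reaction['products'])
    e_products = sum(n * energies[name] for name, n in products.items())
    e_reactants = sum(n * energies[name] for name, n in reactants.items())
    return e_products - e_reactants


# The strings below are copied from the reactions[5] output; the energies
# are hypothetical values chosen only to make the arithmetic easy to follow.
example_reaction = {
    'Equation': 'H2O(g) - H2(g) + * -> O*',
    'reactants': '{"star": 1, "H2gas": -1.0, "H2Ogas": 1}',
    'products': '{"Ostar": 1}',
}
example_energies = {
    'star': -100.0,
    'H2gas': -7.0,
    'H2Ogas': -14.0,
    'Ostar': -104.0,
}

delta_e = reaction_energy_from_systems(example_reaction, example_energies)
print(delta_e)
```

With the real data, the `energies` dictionary could be built from the converted reaction, for example `{name: atoms.get_potential_energy() for name, atoms in reactions[5]['reactionSystems'].items()}`, and the result compared with `reactions[5]['reactionEnergy']`.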