{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to Retrieve Atoms Objects for a Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we pick a dataset by browsing [https://www.catalysis-hub.org/publications](https://www.catalysis-hub.org/publications). Once we have found an interesting dataset, we copy its pubId (without the `#` character). Before querying, we need a couple of imports and a small `fetch` helper that gets the technicalities of fetching over HTTP out of the way."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import pprint\n",
"import sys\n",
"import string\n",
"import json\n",
"import io\n",
"import copy\n",
"\n",
"import ase.io\n",
"import ase.calculators.singlepoint\n",
"\n",
"GRAPHQL = 'http://api.catalysis-hub.org/graphql'\n",
"\n",
"def fetch(query):\n",
"    \"\"\"Send a GraphQL query and return the 'data' part of the JSON response.\"\"\"\n",
"    response = requests.get(GRAPHQL, params={'query': query})\n",
"    # fail early on HTTP errors instead of on a confusing JSON/KeyError later\n",
"    response.raise_for_status()\n",
"    return response.json()['data']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's assume we found the dataset `FesterEdge2017` to be interesting. We then query all reactions and geometries associated with it through the GraphQL endpoint. In this example we paginate in chunks of 10 reactions to avoid timeouts on a busy server. As a poor man's progress bar, we print a short status line after each page: whether another page follows, the end cursor, the number of reactions requested so far, and the total count."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True YXJyYXljb25uZWN0aW9uOjk= 10 16\n",
"False YXJyYXljb25uZWN0aW9uOjE1 20 16\n"
]
}
],
"source": [
"def reactions_from_dataset(pub_id, page_size=10):\n",
"    reactions = []\n",
"    has_next_page = True\n",
"    start_cursor = ''\n",
"    page = 0\n",
"    while has_next_page:\n",
"        data = fetch(\"\"\"{{\n",
"          reactions(pubId: \"{pub_id}\", first: {page_size}, after: \"{start_cursor}\") {{\n",
"            totalCount\n",
"            pageInfo {{\n",
"              hasNextPage\n",
"              hasPreviousPage\n",
"              startCursor\n",
"              endCursor\n",
"            }}\n",
"            edges {{\n",
"              node {{\n",
"                Equation\n",
"                reactants\n",
"                products\n",
"                reactionEnergy\n",
"                reactionSystems {{\n",
"                  name\n",
"                  systems {{\n",
"                    energy\n",
"                    InputFile(format: \"json\")\n",
"                  }}\n",
"                }}\n",
"              }}\n",
"            }}\n",
"          }}\n",
"        }}\"\"\".format(start_cursor=start_cursor,\n",
"                   page_size=page_size,\n",
"                   pub_id=pub_id,\n",
"                   ))\n",
"        has_next_page = data['reactions']['pageInfo']['hasNextPage']\n",
"        # the end cursor of this page is where the next page starts\n",
"        start_cursor = data['reactions']['pageInfo']['endCursor']\n",
"        page += 1\n",
"        print(has_next_page, start_cursor, page_size * page, data['reactions']['totalCount'])\n",
"        reactions.extend(edge['node'] for edge in data['reactions']['edges'])\n",
"\n",
"    return reactions\n",
"\n",
"raw_reactions = reactions_from_dataset(\"FesterEdge2017\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After having retrieved all reactions, we turn those pesky JSON strings into ASE Atoms objects. We can do so in place, as shown below. Note that the conversion can only be run once per fetched result because it removes some fields; that is why we work on a deep copy of the raw reactions."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def aseify_reactions(reactions):\n",
"    for reaction in reactions:\n",
"        for reaction_system in reaction['reactionSystems']:\n",
"            system = reaction_system.pop('systems')\n",
"            # ase.io.read expects a file-like object, so wrap the JSON string\n",
"            with io.StringIO(system.pop('InputFile')) as tmp_file:\n",
"                atoms = ase.io.read(tmp_file, format='json')\n",
"            calculator = ase.calculators.singlepoint.SinglePointCalculator(\n",
"                atoms,\n",
"                energy=system.pop('energy')\n",
"            )\n",
"            # atoms.set_calculator() is deprecated in recent ASE releases\n",
"            atoms.calc = calculator\n",
"            reaction_system['atoms'] = atoms\n",
"        # flatten list further into {name: atoms, ...} dictionary\n",
"        reaction['reactionSystems'] = {x['name']: x['atoms']\n",
"                                       for x in reaction['reactionSystems']}\n",
"\n",
"reactions = copy.deepcopy(raw_reactions)\n",
"aseify_reactions(reactions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This finally gives us a list of reactions with all ASE Atoms geometries and potential energies in place."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Equation': 'H2O(g) - H2(g) + * -> O*',\n",
" 'products': '{\"Ostar\": 1}',\n",
" 'reactants': '{\"star\": 1, \"H2gas\": -1.0, \"H2Ogas\": 1}',\n",
" 'reactionEnergy': 3.033416719999991,\n",
" 'reactionSystems': {'H2Ogas': Atoms(symbols='H2O', pbc=True, cell=[14.0, 16.526478, 16.596309], calculator=SinglePointCalculator(...)),\n",
" 'H2gas': Atoms(symbols='H2', pbc=True, cell=[14.0, 15.0, 16.737166], calculator=SinglePointCalculator(...)),\n",
" 'HO2star': Atoms(symbols='H7Au48Co8O20', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
" 'HOstar': Atoms(symbols='H7Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
" 'Ostar': Atoms(symbols='H6Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
" 'star': Atoms(symbols='H6Au48Co8O18', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...))}}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reactions[5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}