mhoffman/GraphQL2ASE.ipynb

## GraphQL2ASE.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Retrieve Atoms Objects for a Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we pick a dataset by browsing [https://www.catalysis-hub.org/publications](https://www.catalysis-hub.org/publications). Once we have found an interesting dataset, we copy its pubId (without the `#` character). Before querying, we need a few imports and a small `fetch` helper that takes care of the technicalities of fetching over HTTP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import copy\n",
    "import io\n",
    "\n",
    "import requests\n",
    "\n",
    "import ase.io\n",
    "import ase.calculators.singlepoint\n",
    "\n",
    "GRAPHQL = 'http://api.catalysis-hub.org/graphql'\n",
    "\n",
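    "def fetch(query):\n",
    "    \"\"\"Send a GraphQL query via HTTP GET and return the 'data' payload.\"\"\"\n",
    "    response = requests.get(GRAPHQL, params={'query': query})\n",
    "    response.raise_for_status()  # fail early on HTTP errors\n",
    "    return response.json()['data']"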
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's assume we found the dataset `FesterEdge2017` to be interesting. We then query all reactions and geometries associated with it through the GraphQL endpoint. In this example we paginate in chunks of 10 reactions to avoid timeouts on a busy server. As a poor man's progress bar, we print a short status line after each page: whether another page follows, the end cursor, the number of reactions requested so far, and the total count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True YXJyYXljb25uZWN0aW9uOjk= 10 16\n",
      "False YXJyYXljb25uZWN0aW9uOjE1 20 16\n"
     ]
    }
   ],
   "source": [
    "def reactions_from_dataset(pub_id, page_size=10):\n",
    "    reactions = []\n",
    "    has_next_page = True\n",
    "    start_cursor = ''\n",
    "    page = 0\n",
    "    while has_next_page:\n",
    "        data = fetch(\"\"\"{{\n",
    "      reactions(pubId: \"{pub_id}\", first: {page_size}, after: \"{start_cursor}\") {{\n",
    "        totalCount\n",
    "        pageInfo {{\n",
    "          hasNextPage\n",
    "          hasPreviousPage\n",
    "          startCursor\n",
    "          endCursor \n",
    "        }}  \n",
    "        edges {{\n",
    "          node {{\n",
    "            Equation\n",
    "            reactants\n",
    "            products\n",
    "            reactionEnergy\n",
    "            reactionSystems {{\n",
    "              name\n",
    "              systems {{\n",
    "                energy\n",
    "                InputFile(format: \"json\")\n",
    "              }}\n",
    "            }}  \n",
    "          }}  \n",
    "        }}  \n",
    "      }}    \n",
    "    }}\"\"\".format(start_cursor=start_cursor,\n",
    "                 page_size=page_size,\n",
    "                 pub_id=pub_id,\n",
    "                ))\n",
    "        has_next_page = data['reactions']['pageInfo']['hasNextPage']\n",
    "        start_cursor = data['reactions']['pageInfo']['endCursor']\n",
    "        page += 1\n",
    "        print(has_next_page, start_cursor, page_size * page, data['reactions']['totalCount'])\n",
    "        reactions.extend(map(lambda x: x['node'], data['reactions']['edges']))\n",
    "\n",
    "    return reactions\n",
    "\n",
    "raw_reactions = reactions_from_dataset(\"FesterEdge2017\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After retrieving all reactions, we turn those pesky JSON strings into ASE `Atoms` objects. The conversion happens in place. Note that it can only be run once per fetched result because it removes some fields along the way, which is why we work on a deep copy of `raw_reactions`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def aseify_reactions(reactions):\n",
    "    \"\"\"Replace the raw 'systems' entries with ASE Atoms objects (in place).\"\"\"\n",
    "    for reaction in reactions:\n",
    "        for reaction_system in reaction['reactionSystems']:\n",
    "            system = reaction_system.pop('systems')\n",
    "            # ase.io.read expects a file-like object, so wrap the JSON string\n",
    "            with io.StringIO(system.pop('InputFile')) as tmp_file:\n",
    "                atoms = ase.io.read(tmp_file, format='json')\n",
    "            # attach the stored energy via a single-point calculator\n",
    "            # (atoms.calc replaces the deprecated atoms.set_calculator)\n",
    "            atoms.calc = ase.calculators.singlepoint.SinglePointCalculator(\n",
    "                atoms,\n",
    "                energy=system.pop('energy')\n",
    "            )\n",
    "            reaction_system['atoms'] = atoms\n",
    "        # flatten list further into {name: atoms, ...} dictionary\n",
    "        reaction['reactionSystems'] = {x['name']: x['atoms']\n",
    "                                       for x in reaction['reactionSystems']}\n",
    "\n",
    "reactions = copy.deepcopy(raw_reactions)\n",
    "aseify_reactions(reactions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This finally gives us a list of reactions with all ASE Atoms geometries and potential energies in place."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Equation': 'H2O(g) - H2(g) + * -> O*',\n",
       " 'products': '{\"Ostar\": 1}',\n",
       " 'reactants': '{\"star\": 1, \"H2gas\": -1.0, \"H2Ogas\": 1}',\n",
       " 'reactionEnergy': 3.033416719999991,\n",
       " 'reactionSystems': {'H2Ogas': Atoms(symbols='H2O', pbc=True, cell=[14.0, 16.526478, 16.596309], calculator=SinglePointCalculator(...)),\n",
       "  'H2gas': Atoms(symbols='H2', pbc=True, cell=[14.0, 15.0, 16.737166], calculator=SinglePointCalculator(...)),\n",
       "  'HO2star': Atoms(symbols='H7Au48Co8O20', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
       "  'HOstar': Atoms(symbols='H7Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
       "  'Ostar': Atoms(symbols='H6Au48Co8O19', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...)),\n",
       "  'star': Atoms(symbols='H6Au48Co8O18', pbc=True, cell=[5.878883441, 20.365049623, 20.83304], calculator=SinglePointCalculator(...))}}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reactions[5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
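
The cursor-based pagination loop in `reactions_from_dataset` can be tried out without network access. The following is a minimal sketch, not the notebook's own code: `paginate` and `make_fake_fetch` are hypothetical helper names, and the fake server uses plain list offsets as cursors, whereas the real API returns opaque cursor strings. Only the response shape (`pageInfo`, `hasNextPage`, `endCursor`, `edges`, `node`) is taken from the query in the notebook.

```python
def paginate(fetch_page, page_size=10):
    """Collect all nodes from a cursor-paginated connection.

    fetch_page(first, after) is assumed to return a dict shaped like the
    'reactions' field in the notebook: {'pageInfo': {...}, 'edges': [...]}.
    """
    nodes = []
    cursor = ''
    has_next_page = True
    while has_next_page:
        connection = fetch_page(first=page_size, after=cursor)
        nodes.extend(edge['node'] for edge in connection['edges'])
        has_next_page = connection['pageInfo']['hasNextPage']
        # the end cursor of this page is the starting point of the next one
        cursor = connection['pageInfo']['endCursor']
    return nodes


def make_fake_fetch(items):
    """Hypothetical stand-in for the server; cursors are plain list offsets."""
    def fetch_page(first, after):
        start = int(after) if after else 0
        chunk = items[start:start + first]
        end = start + len(chunk)
        return {
            'pageInfo': {'hasNextPage': end < len(items),
                         'endCursor': str(end)},
            'edges': [{'node': item} for item in chunk],
        }
    return fetch_page


# 16 fake reactions, fetched in pages of 10 (two round trips)
fake_reactions = [{'Equation': 'reaction {}'.format(i)} for i in range(16)]
all_nodes = paginate(make_fake_fetch(fake_reactions), page_size=10)
print(len(all_nodes))
```

Passing the page-fetching function in as an argument keeps the loop independent of the HTTP layer, so the same loop could in principle be driven by a wrapper around `fetch`.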