@mhoffman
Created June 9, 2018 17:39
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by making some necessary imports and definitions. You have have to install `requests` first by running `pip install --user requests`."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import pprint\n",
"import sys\n",
"\n",
"GRAPHQL = 'http://api.catalysis-hub.org/graphql'\n",
"\n",
"def fetch(query):\n",
" return requests.get(\n",
" GRAPHQL, {'query': query}\n",
" ).json()['data']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List of Publications"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start flexing our quering muscles by quering a list of publications."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'authors': '[\"Boes, Jacob\"]',\n",
" 'doi': None,\n",
" 'id': 'UHVibGljYXRpb246NjA=',\n",
" 'journal': None,\n",
" 'title': 'Adsorption energies on fcc 111 transition metals',\n",
" 'year': 2018},\n",
" {'authors': '[\"Montoya, Joseph H.\", \"Tsai, Charlie\", \"Vojvodic, Aleksandra\", '\n",
" '\"Norskov, Jens K.\"]',\n",
" 'doi': '10.1002/cssc.201500322',\n",
" 'id': 'UHVibGljYXRpb246NjQ=',\n",
" 'journal': 'ChemSusChem',\n",
" 'title': 'The Challenge of Electrochemical Ammonia Synthesis: A New '\n",
" 'Perspective on the Role of Nitrogen Scaling Relations',\n",
" 'year': 2015},\n",
" {'authors': '[\"Catapp\"]',\n",
" 'doi': None,\n",
" 'id': 'UHVibGljYXRpb246MzM=',\n",
" 'journal': None,\n",
" 'title': None,\n",
" 'year': 2012}]\n"
]
}
],
"source": [
"raw_publications = fetch(\"\"\"{publications {\n",
" edges {\n",
" node {\n",
" id\n",
" authors\n",
" title\n",
" journal\n",
" year\n",
" doi\n",
" }\n",
" }\n",
"}}\n",
"\"\"\")['publications']['edges']\n",
"publications = list(map(lambda x: x['node'], raw_publications))\n",
"pprint.pprint(publications[:3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We only show the first 3 results here for brevity but of you can retrieve the full list by removing the `[:3]` slice. The `['edges']['node']` may seem a little annoying, but it will allow us to responses that would be too large for a single request, as we will see below."
]
},
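{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small aside, the `authors` field in the output above looks like a JSON-encoded list stored in a string. Assuming that holds for all entries, a minimal sketch for turning it into a Python list with the standard `json` module could look like this. The `or '[]'` fallback for missing authors is our own precaution, not something the API documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"for publication in publications[:3]:\n",
"    # Assumption: `authors` is a JSON-encoded list; fall back to an empty list if it is missing.\n",
"    authors = json.loads(publication['authors'] or '[]')\n",
"    print(publication['year'], '; '.join(authors))"
]
},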
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query Reactions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's query some reactions. This is the same type of query as you would get from the <a href=\"https://www.catalysis-hub.org/energies\" target=\"_blank\">Reaction Energetics App</a>. Let's get all energies that end with `CO` adsorbed on the surface and some Palladium in the surface. The tilde (`~`) before the `Pd` indicates that the field only has to contain `Pd`. If you want the exact match, drop the tilde. Here we have al"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'reactions': {'totalCount': 74,\n",
" 'pageInfo': {'hasNextPage': True,\n",
" 'hasPreviousPage': False,\n",
" 'startCursor': 'YXJyYXljb25uZWN0aW9uOjA=',\n",
" 'endCursor': 'YXJyYXljb25uZWN0aW9uOjk='},\n",
" 'edges': [{'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -2.01383127677,\n",
" 'chemicalComposition': 'Pd4'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -1.74274934594,\n",
" 'chemicalComposition': 'Pd36'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"CHCOstar\": 1}',\n",
" 'products': '{\"CHstar\": 1, \"COstar\": 1}',\n",
" 'Equation': 'CHCO* + * -> CH* + CO*',\n",
" 'reactionEnergy': -0.971622912111,\n",
" 'chemicalComposition': 'Pd36'}},\n",
" {'node': {'reactants': '{\"CHOstar\": 1}',\n",
" 'products': '{\"COstar\": 1, \"hfH2gas\": 1}',\n",
" 'Equation': 'CHO* -> hfH2(g) + CO*',\n",
" 'reactionEnergy': -0.71475,\n",
" 'chemicalComposition': 'Co3Pd'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"CH3COstar\": 1}',\n",
" 'products': '{\"COstar\": 1, \"CH3star\": 1}',\n",
" 'Equation': 'CH3CO* + * -> CH3* + CO*',\n",
" 'reactionEnergy': -0.531420322484,\n",
" 'chemicalComposition': 'Pd36'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -0.356524751,\n",
" 'chemicalComposition': 'Zn3Pd'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -0.466524751,\n",
" 'chemicalComposition': 'Cd3Pd'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -0.968702757017,\n",
" 'chemicalComposition': 'Pd4'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"CHOstar\": 1}',\n",
" 'products': '{\"Hstar\": 1, \"COstar\": 1}',\n",
" 'Equation': 'CHO* + * -> CO* + H*',\n",
" 'reactionEnergy': -1.23463251346,\n",
" 'chemicalComposition': 'Pd36'}},\n",
" {'node': {'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': 0.97,\n",
" 'chemicalComposition': 'HH- Pd-MoS2'}}]}}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fetch(\"\"\"\n",
"{reactions(first: 10, products:\"CO\", chemicalComposition:\"~Pd\") {\n",
" totalCount\n",
" pageInfo {\n",
" hasNextPage\n",
" hasPreviousPage\n",
" startCursor\n",
" endCursor\n",
" }\n",
" edges {\n",
" node {\n",
" reactants\n",
" products\n",
" Equation\n",
" reactionEnergy\n",
" chemicalComposition\n",
" }\n",
" }\n",
"}}\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next up is `systems`. We use a different filter to filter for energies > -14 eV. So that should gives use from H or H2 at best."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'systems': {'totalCount': 3,\n",
" 'edges': [{'node': {'id': 'U3lzdGVtOjEyNTA4',\n",
" 'Formula': 'H2',\n",
" 'Cifdata': 'data_image0\\n_cell_length_a 14\\n_cell_length_b 15\\n_cell_length_c 16.7372\\n_cell_angle_alpha 90\\n_cell_angle_beta 90\\n_cell_angle_gamma 90\\n\\n_symmetry_space_group_name_H-M \"P 1\"\\n_symmetry_int_tables_number 1\\n\\nloop_\\n _symmetry_equiv_pos_as_xyz\\n \\'x, y, z\\'\\n\\nloop_\\n _atom_site_label\\n _atom_site_occupancy\\n _atom_site_fract_x\\n _atom_site_fract_y\\n _atom_site_fract_z\\n _atom_site_thermal_displace_type\\n _atom_site_B_iso_or_equiv\\n _atom_site_type_symbol\\n H1 1.0000 0.50000 0.50000 0.52241 Biso 1.000 H\\n H2 1.0000 0.50000 0.50000 0.47760 Biso 1.000 H\\n',\n",
" 'energy': -6.7714919,\n",
" 'calculatorParameters': '{}'}},\n",
" {'node': {'id': 'U3lzdGVtOjI5Mzc=',\n",
" 'Formula': 'H2',\n",
" 'Cifdata': 'data_image0\\n_cell_length_a 14\\n_cell_length_b 15\\n_cell_length_c 16.7372\\n_cell_angle_alpha 90\\n_cell_angle_beta 90\\n_cell_angle_gamma 90\\n\\n_symmetry_space_group_name_H-M \"P 1\"\\n_symmetry_int_tables_number 1\\n\\nloop_\\n _symmetry_equiv_pos_as_xyz\\n \\'x, y, z\\'\\n\\nloop_\\n _atom_site_label\\n _atom_site_occupancy\\n _atom_site_fract_x\\n _atom_site_fract_y\\n _atom_site_fract_z\\n _atom_site_thermal_displace_type\\n _atom_site_B_iso_or_equiv\\n _atom_site_type_symbol\\n H1 1.0000 0.50000 0.50000 0.52243 Biso 1.000 H\\n H2 1.0000 0.50000 0.50000 0.47757 Biso 1.000 H\\n',\n",
" 'energy': -6.75954945,\n",
" 'calculatorParameters': '{}'}},\n",
" {'node': {'id': 'U3lzdGVtOjI3MjY=',\n",
" 'Formula': 'H2',\n",
" 'Cifdata': 'data_image0\\n_cell_length_a 14\\n_cell_length_b 15\\n_cell_length_c 16.7372\\n_cell_angle_alpha 90\\n_cell_angle_beta 90\\n_cell_angle_gamma 90\\n\\n_symmetry_space_group_name_H-M \"P 1\"\\n_symmetry_int_tables_number 1\\n\\nloop_\\n _symmetry_equiv_pos_as_xyz\\n \\'x, y, z\\'\\n\\nloop_\\n _atom_site_label\\n _atom_site_occupancy\\n _atom_site_fract_x\\n _atom_site_fract_y\\n _atom_site_fract_z\\n _atom_site_thermal_displace_type\\n _atom_site_B_iso_or_equiv\\n _atom_site_type_symbol\\n H1 1.0000 0.50000 0.50000 0.52241 Biso 1.000 H\\n H2 1.0000 0.50000 0.50000 0.47760 Biso 1.000 H\\n',\n",
" 'energy': -6.7714919,\n",
" 'calculatorParameters': '{}'}}]}}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fetch(\"\"\"\n",
"{systems(first: 100, energy: -14, op:\">\") {\n",
" totalCount\n",
" edges {\n",
" node {\n",
" id\n",
" Formula\n",
" Cifdata\n",
" energy\n",
" calculatorParameters\n",
" }\n",
" }\n",
"}}\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining Queries and Stepping Through Large Queries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main tables that `catalysis-hub.org` offers are `reactions`, `systems`, and `publications`. Often it is useful to query more than one table at once (i.e. SQL join) to filter one table but get the data from a different table associated with it. Example we want to filter for a certain type of reaction and get the structures associated with it."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'reactions': {'totalCount': 74,\n",
" 'pageInfo': {'hasNextPage': True,\n",
" 'hasPreviousPage': False,\n",
" 'startCursor': 'YXJyYXljb25uZWN0aW9uOjA=',\n",
" 'endCursor': 'YXJyYXljb25uZWN0aW9uOjA='},\n",
" 'edges': [{'node': {'id': 'UmVhY3Rpb246OTI=',\n",
" 'reactants': '{\"star\": 1, \"COgas\": 1}',\n",
" 'products': '{\"COstar\": 1}',\n",
" 'Equation': 'CO(g) + * -> CO*',\n",
" 'reactionEnergy': -2.01383127677,\n",
" 'chemicalComposition': 'Pd4',\n",
" 'systems': [{'InputFile': ' C O Pd \\n 1.0000000000000000\\n 11.2810760000000005 0.0000000000000000 0.0000000000000000\\n 5.6405399999999997 9.7697000000000003 0.0000000000000000\\n 0.0000000000000000 0.0000000000000000 20.9082210000000011\\n 1 1 64\\nCartesian\\n 1.4101346666666701 0.8141416666666670 15.2669614237632008\\n 1.4101346666666701 0.8141416666666670 16.4476892742715997\\n -0.0105757391761526 -0.0061056535666691 13.9826267161823008\\n 1.4101350719169199 2.4546371834091101 13.9826267161823008\\n 2.8197367200103298 4.8848366356889299 13.8956023395766000\\n 4.2301502216020603 7.3277440144596904 13.8956023395766000\\n 2.8308448830099899 -0.0061056578967144 13.9826267161823008\\n 4.2594496261227102 2.4591947404922401 13.8967557275773999\\n 5.6390405883208699 4.8857154407210297 13.8924823774328008\\n 7.0506740719169203 7.2937363790691201 13.8967557275773999\\n 5.6402595703251999 -0.0004547779845310 13.8956023395766000\\n 7.0506730719169202 2.4406949911171898 13.8924823774328008\\n 8.4623065555129493 4.8857154401075098 13.8924823774328008\\n 9.8711979222318096 7.3277440143553401 13.8956023395766000\\n 8.4610855735086492 -0.0004547780985577 13.8956023395766000\\n 9.8418965177111204 2.4591947523843500 13.8967557275773999\\n 11.2816104238234995 4.8848366354705801 13.8956023395766000\\n 12.6912120719168993 7.3272752906485801 13.8955858240988999\\n 2.8189270855692601 1.6275084501303201 11.6033213505450004\\n 4.2297032849700500 4.0706417361641396 11.5841631011236998\\n 5.6402465574528096 6.5137738846480797 11.5841631011236998\\n 7.0506743516650996 8.9571083225495993 11.6033213505450004\\n 5.6401300791823497 1.6277096017355099 11.5841631011236998\\n 7.0506733516651003 4.0707084074202298 11.5863926674548008\\n 8.4611011458774197 6.5137738845281801 11.5841631011236998\\n 9.8610284784020106 8.9612827632587102 11.6035304482832995\\n 8.4612156241478793 1.6277096015683501 11.5841631011236998\\n 9.8716434183601400 4.0706417358770901 11.5841631011236998\\n 
11.2805271761174009 6.5128157635844000 11.5876907260538999\\n 12.6912123516651008 8.9561936953171308 11.5876907260538999\\n 11.2824186177609000 1.6275084495807499 11.6033213505450004\\n 12.6912113516650997 4.0592596998026798 11.6035304482832995\\n 14.1018965272127996 6.5128157633591401 11.5876907260538999\\n 15.5213962249281998 8.9612827591992907 11.6035304482832995\\n 1.4101346652565301 0.8141416658525250 9.3027403703600005\\n 2.8202696652565300 3.2565666658525201 9.3027403703600005\\n 4.2304046652565299 5.6989916658525299 9.3027403703600005\\n 5.6405396652565303 8.1414166658525193 9.3027403703600005\\n 4.2304036652565298 0.8141416658525250 9.3027403703600005\\n 5.6405386652565301 3.2565666658525201 9.3027403703600005\\n 7.0506736652565296 5.6989916658525299 9.3027403703600005\\n 8.4608086652565309 8.1414166658525193 9.3027403703600005\\n 7.0506726652565304 0.8141416658525250 9.3027403703600005\\n 8.4608076652565298 3.2565666658525201 9.3027403703600005\\n 9.8709426652565302 5.6989916658525299 9.3027403703600005\\n 11.2810776652565004 8.1414166658525193 9.3027403703600005\\n 9.8709416652565292 0.8141416658525250 9.3027403703600005\\n 11.2810766652564993 3.2565666658525201 9.3027403703600005\\n 12.6912116652564997 5.6989916658525299 9.3027403703600005\\n 14.1013466652565000 8.1414166658525193 9.3027403703600005\\n 0.0000000000000000 0.0000000000000000 7.0000001319882204\\n 1.4101349999999999 2.4424250000000001 7.0000001319882204\\n 2.8202699999999998 4.8848500000000001 7.0000001319882204\\n 4.2304050000000002 7.3272750000000002 7.0000001319882204\\n 2.8202690000000001 0.0000000000000000 7.0000001319882204\\n 4.2304040000000001 2.4424250000000001 7.0000001319882204\\n 5.6405390000000004 4.8848500000000001 7.0000001319882204\\n 7.0506739999999999 7.3272750000000002 7.0000001319882204\\n 5.6405380000000003 0.0000000000000000 7.0000001319882204\\n 7.0506729999999997 2.4424250000000001 7.0000001319882204\\n 8.4608080000000001 4.8848500000000001 7.0000001319882204\\n 
9.8709430000000005 7.3272750000000002 7.0000001319882204\\n 8.4608070000000009 0.0000000000000000 7.0000001319882204\\n 9.8709419999999994 2.4424250000000001 7.0000001319882204\\n 11.2810769999999998 4.8848500000000001 7.0000001319882204\\n 12.6912120000000002 7.3272750000000002 7.0000001319882204\\n'},\n",
" {'InputFile': ' C O \\n 1.0000000000000000\\n 14.0000000000000000 0.0000000000000000 0.0000000000000000\\n 0.0000000000000000 15.0000000000000000 0.0000000000000000\\n 0.0000000000000000 0.0000000000000000 16.0000000000000000\\n 1 1\\nCartesian\\n 6.9999985159999998 7.4999951400000002 7.4337279040000004\\n 7.0000014840000002 7.5000048599999998 8.5662720960000005\\n'},\n",
" {'InputFile': 'Pd \\n 1.0000000000000000\\n 2.8202690000000001 0.0000000000000000 0.0000000000000000\\n 1.4101349999999999 2.4424250000000001 0.0000000000000000\\n 0.0000000000000000 0.0000000000000000 20.9082210000000011\\n 4\\nCartesian\\n 0.0000000000000000 0.0000000000000000 7.0000001319882204\\n 1.4101346652565301 0.8141416658525250 9.3027403703600005\\n 2.8202693516650998 1.6282834074202299 11.5917470855843998\\n 0.0000000719169190 0.0000002906485750 13.9042749849117993\\n'}]}}]}}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reaction_systems = fetch(\"\"\"{reactions(first: 1, after:\"\", products:\"CO\", chemicalComposition:\"~Pd\") {\n",
" totalCount\n",
" pageInfo {\n",
" hasNextPage\n",
" hasPreviousPage\n",
" startCursor\n",
" endCursor\n",
" }\n",
" edges {\n",
" node {\n",
" id\n",
" reactants\n",
" products\n",
" Equation\n",
" reactionEnergy\n",
" chemicalComposition\n",
" systems{\n",
" InputFile(format:\"vasp\")\n",
" }\n",
" }\n",
" }\n",
"}}\n",
"\"\"\")\n",
"reaction_systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One constraint we have to work with is that our server times out requests after 30 seconds (gives others a chance to query, too). Especially when generating a lot of structures we can quickly run into this limitation. To get around this we can use the `pageInfo` attributes as well as the `first` and `after` keywords to roll our own pagination and combine the whole list. We will do simple loop that doesn't end and `break` out of it, when the `pageInfo` indicates that we are done. To step through a large query, do this:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".............. Done!\n"
]
}
],
"source": [
"end_cursor = ''\n",
"reaction_systems = {}\n",
"\n",
"while True:\n",
"    response = fetch(\"{reactions(first: 5, after:\\\"\" + end_cursor + \"\"\"\", products:\"CO\", chemicalComposition:\"~Pd\") {\n",
" totalCount\n",
" pageInfo {\n",
" hasNextPage\n",
" hasPreviousPage\n",
" startCursor\n",
" endCursor\n",
" }\n",
" edges {\n",
" node {\n",
" id\n",
" reactants\n",
" products\n",
" Equation\n",
" reactionEnergy\n",
" chemicalComposition\n",
" systems{\n",
" InputFile(format:\"vasp\")\n",
" }\n",
" }\n",
" }\n",
"}}\"\"\")\n",
"    for edge in response['reactions']['edges']:\n",
"        reaction_systems[edge['node']['id']] = edge['node']\n",
"    # Book-keeping for pagination\n",
"    if not response['reactions']['pageInfo']['hasNextPage']:\n",
"        sys.stdout.write(' Done!\\n')\n",
"        break\n",
"\n",
"    end_cursor = response['reactions']['pageInfo']['endCursor']\n",
"    sys.stdout.write('.')"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"74"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(list(reaction_systems.keys()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can do further analysis with this combined data set. Note that some reaction energies do not contains geometries (especially older ones). For purely technical reasons they have a placeholder geometry with only one Hydrogen from and a `1x1x1` Angstrom unit cell."
]
},
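{
"cell_type": "markdown",
"metadata": {},
"source": [
"As one possible starting point, the sketch below ranks the collected reactions by reaction energy and decodes the `products` field, which in the earlier output looks like a JSON-encoded dictionary. Skipping entries without a `reactionEnergy` is a precaution on our part; we have not checked whether such entries occur."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Keep only entries that carry an energy, then sort from most to least exothermic.\n",
"ranked = sorted(\n",
"    (node for node in reaction_systems.values() if node['reactionEnergy'] is not None),\n",
"    key=lambda node: node['reactionEnergy'],\n",
")\n",
"for node in ranked[:5]:\n",
"    # Assumption: `products` is a JSON-encoded dict keyed by species name.\n",
"    products = json.loads(node['products'])\n",
"    print('{:8.3f} eV  {}  on {}  products: {}'.format(\n",
"        node['reactionEnergy'], node['Equation'],\n",
"        node['chemicalComposition'], sorted(products)))"
]
},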
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### More Resources"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order quickly test what are possible queries, we have a <a href=\"http://api.catalysis-hub.org/graphql\" target=\"_blank\">GraphiQL Interface</a>. You can write your own queries and GraphiQL will try to complete your keywords. Once your are happy with the results, you can copy the query back into e.g. Jupyter Notebook for further analysis. Also check out our <a href=\"http://docs.catalysis-hub.org/\" target=\"_blank\">Documentation</a> for complete reference of the database schema and more tutorials and examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}