Skip to content

Instantly share code, notes, and snippets.

@wtbarnes
Last active October 31, 2017 07:23
Show Gist options
  • Save wtbarnes/1a9acc43381a33d016843c666150e8e9 to your computer and use it in GitHub Desktop.
Save wtbarnes/1a9acc43381a33d016843c666150e8e9 to your computer and use it in GitHub Desktop.
Tutorial for parsing CHIANTI atomic data, building it into an HDF5 file, and accessing it
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# *fiasco* Tutorial\n",
"This notebook shows how to parse the CHIANTI atomic database, build the database into an HDF5 file, and then access the data by streaming it from the HDF5 file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import h5py\n",
"import astropy.units as u\n",
"import fiasco\n",
"import fiasco.io\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"See the install instructions [here](https://github.com/wtbarnes/fiasco). If you'd like to use an already installed version of the CHIANTI database, set the path to the root of the CHIANTI file tree in `$HOME/.fiasco/fiascorc`,\n",
"```yaml\n",
"[database]\n",
"ascii_dbase_root=/my/path/to/chianti/dbase\n",
"``` \n",
"To check the default location of the ASCII files and the HDF5 database,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.defaults"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to install the database (if you don't already have it downloaded) and build the HDF5 database. To do this,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.util.download_dbase(fiasco.defaults['ascii_dbase_root'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.util.build_hdf5_dbase(fiasco.defaults['ascii_dbase_root'],fiasco.defaults['hdf5_dbase_root'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**This may take 15-20 minutes, especially if you need to download the database. However, this only needs to be done once (or each time we want to update the raw data in the database)**. Note that this is also done automatically when an `IonBase` object is declared. We're doing it explicitly here because we want to play around with the raw data first. But in general, this step does not need to be done manually."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parsing the Raw CHIANTI Database\n",
"This package provides a convenient parser interface for any CHIANTI filetype. The parser uses a factory pattern to create a different subclass based on the file extension. This allows the interface to multiple different files to be the same and provides the needed metadata (title/description and units) to each column in each file.\n",
"\n",
"First, implement a parser for a specific CHIANTI file, say `h_1.elvlc` which contains the energy levels for H I."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parser = fiasco.io.Parser('h_1.wgfa')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that this returns a parser specifically for the `.elvlc` filetype. Now, call `.parse()` on the parser to return the data in this file as an [Astropy quantity table](http://docs.astropy.org/en/stable/table/mixin_columns.html#quantity-and-qtable) with associated metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table = parser.parse()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table['wavelength']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that each column is an astropy quantity so we can easily convert between different units and the data itself is *units aware*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table['wavelength'].to(u.imperial.furlong)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `.meta` dictionary holds additional information, including the original file footer. This ensures that we preserve all of the original sources for the data that went into the CHIANTI database. This metadata is propagated through to the HDF5 database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table.meta"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(table.meta['footer'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that this same interface persists for every filetype and for every ion..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"fiasco.io.Parser('h_1.elvlc').parse()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.io.Parser('fe_11.scups').parse()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same interface can be used to read the abundance, ionization equilibrium, and ionization potential files too."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.io.Parser('sun_coronal_1992_feldman.abund').parse()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.io.Parser('chianti.ioneq').parse()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.io.Parser('chianti.ip').parse()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"fiasco uses a subclass registry in the parser factory to determine the right subclass for the provided extension. This means you can write your own parser for a specific filetype by just subclassing the `fiasco.io.GenericParser` base class and adding a `filetype` attribute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class JpegParser(fiasco.io.GenericParser):\n",
" filetype='jpeg'\n",
" def __init__(self,*args,**kwargs):\n",
" print('A custom registered parser')\n",
" def parse(self):\n",
" print('Nothing to parse here')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.io.Parser('hello.jpeg').parse()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `IonBase`-- Base Data Access Layer to CHIANTI HDF5 Data\n",
"Now that we've parsed the data and built it into an HDF5 file, we need a way to access it. `fiasco` implements a class `IonBase` for accessing all of the data associated with a particular ion.\n",
"\n",
"The idea behind this object is to provide a general base layer for accessing CHIANTI data. More specific objects can then be built on top of this data access layer. Additionally, this provides a nice base layer for users/other applications to build on top if they need to access CHIANTI data but don't want the overhead or clutter of the specific CHIANTI methods, e.g. all the different needed calculations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion = fiasco.IonBase('h_1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's some additional metadata related to the name of the ion attached to this object as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(h1_ion.atomic_number)\n",
"print(h1_ion.atomic_symbol)\n",
"print(h1_ion.element_name)\n",
"print(h1_ion.charge_state)\n",
"print(h1_ion.ionization_stage)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This class has an attribute for each filetype."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.elvlc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.wgfa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `IonBase` class provides some basic metadata about what atomic data is attached to the given attribute. The fields can be accessed just like a dictionary,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.elvlc['E_obs']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.wgfa['wavelength']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For convenience, the filetypes can also be returned as a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.elvlc.as_table()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a particular filetype does not exist for that ion, `None` is returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.easplom is None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note: Each `__getitem__` is a read on the HDF5 file. While this may result in some additional overhead, it means that we don't pay in having to store information we don't need. This is especially advantagious for particularly large files. For internal routines, **\n",
"\n",
"For the abundance, ionization potential, and ionization equilibrium data, the keys are the different available datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.abundance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.ip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h1_ion.ioneq"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## `Ion` Object\n",
"The `Ion` object is the base unit for the fiasco library. It subclasses `IonBase` (unsurprisingly) and adds methods for calculating common derived quantities, including ionization and recombination rates, the contribution function, emissivity, and spectrum. **Thus far, only the ionization and recombination rates have been implemented.**\n",
"\n",
"Additionally, `Ion` overrides the `ip`, `ioneq`, and `abundance` attributes to select only specific datasets. The default datasets can be overridden as kwargs when the object is instantiated. In the future, such defaults could also be set in `fiascorc`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"temperature = np.logspace(4,9,200)*u.K"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fe_ion = fiasco.Ion('fe_5',temperature)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fe_ion.ip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fe_ion.abundance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ionization equilibrium is interpolated to the given temperature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(fe_ion.temperature,fe_ion.ioneq)\n",
"plt.xlabel(r'$T$ [{:latex}]'.format(fe_ion.temperature.unit))\n",
"plt.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The typical data attributes are still available as well and are needed for computing derived quantities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fe_ion.elvlc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fe_ion.scups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can compute the ionization and recombination rates. **Implementing the many derived quantities that CHIANTI IDL and ChiantiPy provide is ongoing work.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ionization_rate_ea = fe_ion.excitation_autoionization_rate()\n",
"ionization_rate_direct = fe_ion.direct_ionization_rate()\n",
"ionization_rate = fe_ion.ionization_rate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(fe_ion.temperature,ionization_rate,label='total')\n",
"plt.plot(fe_ion.temperature,ionization_rate_ea,'^',markevery=10,label='excitation-autoionization')\n",
"plt.plot(fe_ion.temperature,ionization_rate_direct,'o',markevery=10,label='direct')\n",
"plt.xlabel(r'$T$ [{:latex}]'.format(fe_ion.temperature.unit))\n",
"plt.ylabel(r'Ionization Rate [{:latex}]'.format(ionization_rate.unit))\n",
"plt.xscale('log')\n",
"plt.yscale('log')\n",
"plt.ylim([1e-15,5e-8])\n",
"plt.legend(loc='best')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"recombination_rate_rad = fe_ion.radiative_recombination_rate()\n",
"recombination_rate_di = fe_ion.dielectronic_recombination_rate()\n",
"recombination_rate = fe_ion.recombination_rate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(fe_ion.temperature,recombination_rate,label='total')\n",
"plt.plot(fe_ion.temperature,recombination_rate_di,'^',markevery=10,label='dielectronic')\n",
"plt.plot(fe_ion.temperature,recombination_rate_rad,'o',markevery=10,label='radiative')\n",
"plt.xlabel(r'$T$ [{:latex}]'.format(fe_ion.temperature.unit))\n",
"plt.ylabel(r'Recombination Rate [{:latex}]'.format(recombination_rate.unit))\n",
"plt.xscale('log')\n",
"plt.yscale('log')\n",
"plt.ylim([1e-15,1e-10])\n",
"plt.legend(loc='best')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-ion Objects\n",
"There are some quantities which logically require multiple ions to calculate. To this end, fiasco provides a few different ways to group `Ion` objects."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `Element`\n",
"The first of these is the `Element` class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"el = fiasco.Element('iron',temperature)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Elements can be created using the element name, the atomic symbol, or the atomic number, e.g. all of these create the same object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fiasco.Element('fe',temperature)\n",
"fiasco.Element('iron',temperature)\n",
"fiasco.Element('Fe',temperature)\n",
"fiasco.Element(26,temperature)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ions within the element can then be indexed either by integer index,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"el[10].ion_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or by the ion name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"el[5].ion_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also test whether the element contains a particular ion."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'fe_11' in el"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'fe_28' in el"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also easily iterate through the element. For example, we might want to plot the ionization equilibrium data for all ions of iron."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12,8))\n",
"for ion in el:\n",
" plt.plot(ion.temperature,ion.ioneq,label=r'{}'.format(ion.ion_name))\n",
"plt.xscale('log')\n",
"plt.legend(loc='upper right',ncol=2)\n",
"plt.xlabel(r'$T$ [{:latex}]'.format(ion.temperature.unit))\n",
"plt.ylabel(r'Ionization Fraction')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `IonCollection`\n",
"fiasco also provides a more general multi-ion container called `IonCollection`. In the future, we will implement the spectrum and radiative loss calculations (and any other multi-ion quantities) as methods on `IonCollection`.\n",
"\n",
"`IonCollection` objects are created very intuitively by adding `Ion` objects together,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ion_1 = fiasco.Ion('ca_10',temperature)\n",
"ion_2 = fiasco.Ion('li_2',temperature)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_1 = ion_1 + ion_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Elements can also be added"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"el_1 = fiasco.Element('carbon',temperature)\n",
"el_2 = fiasco.Element('fe',temperature)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_2 = el_1 + el_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And of course collections can be added together as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_3 = collection_1 + collection_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These collections can then be indexed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_3[0].ion_name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_3[-1].atomic_symbol"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`IonCollection` objects can also be instantiated directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"collection_4 = fiasco.IonCollection(el_1,el_2,ion_1,ion_2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This object will make computing multi-ion spectra easy and intuitive as the user can easily \"compose\" the spectra by adding base units (the `Ion` objects) together to build a composite object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:fiasco]",
"language": "python",
"name": "conda-env-fiasco-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@kdere
Copy link

kdere commented Sep 5, 2017

I had started building HDF5 files of the CHIANTI database using Python Tables which I had some experience with. Also, because it allowed me to insert the references as a table. The main reason for doing this was to increase the speed of reading in some of the data files. It turned out that some of the biggest files, such as some .wgfa files, kept every single calculated transition, even some with an A value of 0.0, making them very large. If these were reduced by enforcing a minimum branching ratio of the A values to something like 1.e-4 or 1.e-5, a lot of the problem went away.
Also, the plan was to keep the ascii files as our basic file structure.

@wtbarnes
Copy link
Author

Sorry for not responding @kdere. I just saw this comment. For some reason GitHub didn't send me a notification or I missed it somehow...

PyTables may be a better way to go. It seems more full featured, but a bit of a steeper learning curve. I've just used h5py because I have experience with it and it is simpler. If some of the functionality I've built can be replaced or improved by PyTables, I'm all for switching over.

The increase in read speeds is one of my main motivations for doing this as well. Better to read the ASCII data once and then all subsequent reads are just from the HDF5 "database". Additionally, this means that you can easily slice into portions of a large file (e.g. some of the Fe .wgfa files) without having to load the whole thing into memory. I've found that ChiantiPy can be a bit sluggish for some of these ions with many, many transitions.

Though the format is HDF5, the format is essentially exactly the same as the ASCII database. There are groups for each element, subgroups for each ion, and then datasets for the individual files (.elvlc, .wgfa, etc.) within those subgroups.

If you have some time, play around with the package and let me know what you think (here or in an issue on the main package page).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment