Skip to content

Instantly share code, notes, and snippets.

@pfernique
Last active November 3, 2016 14:04
Show Gist options
  • Save pfernique/b4d9222c60ecb67d011425c800980723 to your computer and use it in GitHub Desktop.
Save pfernique/b4d9222c60ecb67d011425c800980723 to your computer and use it in GitHub Desktop.
JSS autowig examples
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Wrapping dependent libraries\n",
"\n",
"**StructureAnalysis** is a set of libraries that include statistical models for the analysis of structured data (mainly sequences and tree-structured data):\n",
"\n",
"* **StatTool** is a library containing classes concerning parametric modeling of univariate and multivariate data.\n",
"\n",
"* **SequenceAnalysis** is a library containing statistical functions and classes for markovian models (e.g., hidden variable-order Markov and hidden semi-Markov models) and multiple change-point models for sequences.\n",
" The **SequenceAnalysis** library depends on the **StatTool** library.\n",
"\n",
"These libraries have been extensively used for the identification and characterization of developmental patterns in plants from the tissular to the whole plant scale.\n",
"Previously interfaced with *AML* (a home-made, domain-specific programming language), some work has been done to switch to *Python*.\n",
"Nevertheless, the complexity of writing wrappers with **Boost.Python** limited the number of available components in *Python* in comparison to *AML*.\n",
"One advantage of having a statistical library written in *C++* available in *Python* is that developers can benefit from all *Python* packages.\n",
"As illustrated with the following figures this is particularly useful for providing visualizations for model quality assessment using -- for example -- the **Matplotlib** *Python* package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib\n",
"%matplotlib nbagg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preamble\n",
"\n",
"The code source is available in the `StructureAnalysis` directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!pygmentize StructureAnalysis/stat_tool/src/cpp/stat_tools.h"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This directory already contains wrappers, we therefore need to remove them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from path import path\n",
"srcdir = dict()\n",
"for library in ['stat_tool', 'sequence_analysis']:\n",
" srcdir[library] = path('StructureAnalysis')/library/'src'/'py'\n",
" for wrapper in srcdir[library].walkfiles('*.cpp'):\n",
" wrapper.unlink()\n",
" for wrapper in srcdir[library].walkfiles('*.h'):\n",
" wrapper.unlink()\n",
" wrapper = srcdir[library]/library/'_' + library + '.py'\n",
" if wrapper.exists():\n",
" wrapper.unlink()\n",
" srcdir[library] = srcdir[library].parent.parent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we need to install and compile the *C++* libraries.\n",
"This is done using available **Conda** recipes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!conda build StructureAnalysis/stat_tool/conda/libstat_tool -c statiskit -c conda-forge\n",
"!conda install libstat_tool --use-local -c statiskit -c conda-forge\n",
"!conda build StructureAnalysis/sequence_analysis/conda/libsequence_analysis -c statiskit -c conda-forge\n",
"!conda install libsequence_analysis --use-local -c statiskit -c conda-forge"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The **StatTool** library\n",
"\n",
"### Wrapping the *C++* library\n",
"Once these preliminaries done, we can proceed to the actual generation of wrappers for the **StatTool** *C++* library.\n",
"For this, we import **AutoWIG** and create an empty Abstract Semantic Graph (ASG)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import autowig\n",
"asg = autowig.AbstractSemanticGraph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we parse headers with relevant compilation flags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"import sys\n",
"prefix = path(sys.prefix)\n",
"autowig.parser.plugin = 'pyclanglite'\n",
"asg = autowig.parser(asg, (prefix/'include'/'stat_tool').walkfiles('*.h*'),\n",
" ['-x', 'c++', '-std=c++11', '-I' + str((prefix/'include').abspath())],\n",
" silent = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since most of **AutoWIG** guidelines are respected, the `default` `controller` implementation could be suitable.\n",
"Nevertheless some **AutoWIG** limitations (**AutoWIG** doesn't have a complete knowledge concering copyable classes) and the requirement of classes defined in the standard *C++* library lead us to implement a new `controller`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def stat_tool_controller(asg):\n",
" for noncopyable in ['class ::std::basic_streambuf< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::codecvt< char, char, __mbstate_t >',\n",
" 'class ::std::basic_filebuf< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::locale::facet',\n",
" 'class ::std::locale::id',\n",
" 'class ::std::ctype< char >',\n",
" 'class ::std::ios_base',\n",
" 'class ::std::basic_istream< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::basic_ifstream< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::basic_ostream< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::basic_ostringstream< char, struct ::std::char_traits< char >, class ::std::allocator< char > >',\n",
" 'class ::std::num_get< char, class ::std::istreambuf_iterator< char, struct ::std::char_traits< char > > >',\n",
" 'class ::std::basic_ios< char, struct ::std::char_traits< char > >',\n",
" 'class ::std::basic_stringbuf< char, struct ::std::char_traits< char >, class ::std::allocator< char > >',\n",
" 'class ::std::num_put< char, class ::std::ostreambuf_iterator< char, struct ::std::char_traits< char > > >']:\n",
" asg[noncopyable].is_copyable = False\n",
" for cls in asg.classes():\n",
" for fld in cls.fields(access='public'):\n",
" if fld.qualified_type.unqualified_type.globalname == 'class ::std::locale::id':\n",
" fld.boost_python_export = False\n",
" for specialization in asg['class ::std::reverse_iterator'].specializations():\n",
" specialization.boost_python_export = False\n",
" asg['::std::ios_base::openmode'].qualified_type.boost_python_export = True\n",
" return asg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This `controller` is then dynamically registered and used on the ASG."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%time\n",
"autowig.controller['stat_tool'] = stat_tool_controller\n",
"autowig.controller.plugin = 'stat_tool'\n",
"asg = autowig.controller(asg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In order to wrap the library we need to select the `boost_python_internal` `generator` implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"autowig.generator.plugin = 'boost_python_internal'\n",
"wrappers = autowig.generator(asg,\n",
" module = 'StructureAnalysis/stat_tool/src/py/_stat_tool.cpp',\n",
" decorator = 'StructureAnalysis/stat_tool/src/py/structure_analysis/stat_tool/_stat_tool.py',\n",
" prefix = 'wrapper_')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But we also require to the use the default `boost_python` `generator` implementation in order to wrap the `std::ostringstream` typedef."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#%%time\n",
"autowig.generator.plugin = 'boost_python'\n",
"wrappers = autowig.generator(asg,\n",
" [asg['::std::ostringstream'],\n",
" asg['::std::ios_base::openmode']],\n",
" module = wrappers.globalname,\n",
" prefix = 'wrapper_')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The wrappers are only generated in-memory.\n",
"It is therefore needed to write them on the disk to complete the process."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"wrappers.write()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of the generated wrappers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!pygmentize StructureAnalysis/stat_tool/src/py/_stat_tool.cpp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to wrap a *C++* library, that will be used as a dependency by other libraries, the user needs to save the ASG resulting from the wrapping process.\n",
"We therefore use the **pickle** *Python* package for serializing the **StatTool** ASG in the `'StructureAnalysis/stat_tool/ASG.pkl'` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pickle\n",
"with open('StructureAnalysis/stat_tool/ASG.pkl', 'w') as filehandler:\n",
" pickle.dump(asg, filehandler)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the wrappers are written on disk, we need to compile and install the *Python* bindings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"!conda build StructureAnalysis/stat_tool/conda/python-stat_tool -c statiskit -c conda-forge\n",
"!conda install python-stat_tool --use-local -c statiskit -c conda-forge"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the *C++* library *Python* bindings\n",
"\n",
"Finally, we can hereafter use the *C++* library in the *Python* interpreter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from structure_analysis import stat_tool\n",
"%reload_ext structure_analysis.stat_tool.mplotlib\n",
"%reload_ext structure_analysis.stat_tool.aml"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data (`meri0`) consists of the number of elongated organs of $424$ wild cherry tree shoots (*Prunus avium*).\n",
"These shoots were sampled in different architectural positions (from the trunk to peripheral positions of the trees) and were representative of the full range of growth potential.\n",
"The proximal part of a shoot always consists of preformed organs i.e. organs contained in the winter bud (preformed organs are differentiated at the end of the preceding growing season and elongated at the beginning of the current growing season).\n",
"This preformed part may be followed by a neoformed part consisting of organs differentiated and elongated during the current growing season."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"meri1 = stat_tool.Histogram(\"StructureAnalysis/stat_tool/share/data/meri1.his\")\n",
"meri2 = stat_tool.Histogram(\"StructureAnalysis/stat_tool/share/data/meri2.his\")\n",
"meri3 = stat_tool.Histogram(\"StructureAnalysis/stat_tool/share/data/meri3.his\")\n",
"meri4 = stat_tool.Histogram(\"StructureAnalysis/stat_tool/share/data/meri4.his\")\n",
"meri5 = stat_tool.Histogram(\"StructureAnalysis/stat_tool/share/data/meri5.his\")\n",
"meri0 = stat_tool.Merge(meri1, meri2, meri3, meri4, meri5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We estimated binomial mixture models on the basis of this data.\n",
"The number of components ($2$) was selected using the bayesian information criterion with models ranging from $1$ up to $4$ components."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mixt = stat_tool.MixtureEstimation(meri0, 1, 4, \"BINOMIAL\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Further investigations can be performed in order to asses the quality of the mixture model with $2$ components.\n",
"For instance, we considered here the visualization of various probability functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mix.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated herabove the overall fitness of the model is good and:\n",
"\n",
"* The first component corresponds to entirely preformed shoots.\n",
"* The second component to mixed shoots consisting of a preformed part followed by a neoformed part."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The **SequenceAnalysis** library\n",
"\n",
"### Wrapping the *C++* library\n",
"Once the wrapping of the **StatTool** library is performed, we can proceed to the actual generation of wrappers for the **SequenceAnalysis** *C++* library.\n",
"In order to wrap a *C++* library that has dependencies, the user need to combine the ASGs resulting from the wrapping of its dependencies before performing its own wrapping.\n",
"For this, we create an empty Abstract Semantic Graph (ASG)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"asg = autowig.AbstractSemanticGraph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we use the **pickle** *Python* package for de-serializing the **StatTool** ASG and merge it in the current ASG."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"%%time\n",
"with open('StructureAnalysis/stat_tool/ASG.pkl', 'r') as filehandler:\n",
" asg.merge(pickle.load(filehandler))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we parse headers with relevant compilation flags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%time\n",
"asg = autowig.parser(asg, (prefix/'include'/'sequence_analysis').walkfiles('*.h*'),\n",
" ['-x', 'c++', '-std=c++11', '-I' + str((prefix/'include').abspath())],\n",
" silent = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since most of **AutoWIG** guidelines are respected, the `default` `controller` implementation is suitable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%time\n",
"autowig.controller.plugin = 'default'\n",
"asg = autowig.controller(asg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to wrap the library we need to select the `boost_python_internal` `generator` implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%time\n",
"autowig.generator.plugin = 'boost_python_internal'\n",
"wrappers = autowig.generator(asg,\n",
" module = 'StructureAnalysis/sequence_analysis/src/py/_sequence_analysis.cpp',\n",
" decorator = 'StructureAnalysis/sequence_analysis/src/py/structure_analysis/sequence_analysis/_sequence_analysis.py',\n",
" prefix = 'wrapper_')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The wrappers are only generated in-memory.\n",
"It is therefore needed to write them on the disk to complete the process."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%time\n",
"wrappers.write()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of the generated wrappers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!pygmentize StructureAnalysis/sequence_analysis/src/py/_sequence_analysis.cpp'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the wrappers are written on disk, we need to compile and install the *Python* bindings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!conda build StructureAnalysis/sequence_analysis/conda/python-sequence_analysis -c statiskit -c conda-forge\n",
"!conda install python-sequence_analysis --use-local -c statiskit -c conda-forge"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the *C++* library *Python* bindings\n",
"\n",
"Finally, we can hereafter use the *C++* library in the *Python* interpreter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from structure_analysis import sequence_analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data (`seq`) consist of $4050$ measurements of the nuclear-magnetic response of underground rocks.\n",
"The data were obtained by lowering a probe into a bore-hole.\n",
"Measurements were taken at discrete time points by the probe as it was lowered through the hole.\n",
"The underlying signal is roughly piecewise constant, with each constant segment relating to a single rock type that has constant physical properties.\n",
"The change points in the signal occur each time a new rock type is encountered.\n",
"Outliers were removed before the data were analyzed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"seq = sequence_analysis.Sequences(\"StructureAnalysis/sequence_analysis/share/data/well_log_filtered_indexed.seq\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We estimated Gaussian change in the mean and variance models on the basis of the well-log filtered data.\n",
"The number of segments ($16$) was selected using the slope heuristic with a slope estimated using log-likelihood of overparametrized models ranging from $30$ up to $80$ change points."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"seq.segmentation(0, 80, \"Gaussian\", min_nb_segment=30)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
echo OFF
echo "AutoWIG Examples"
echo "================"
echo ""
echo "This script will run the four examples presented in the"
echo "accompanying paper. Corresponding Jupyter notebooks can"
echo "also be run individually, echo using, the following"
echo "commands,"
echo ""
echo ON
jupyter nbconvert --to notebook --execute basic.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
jupyter nbconvert --to notebook --execute subset.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
jupyter nbconvert --to notebook --execute template.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
jupyter nbconvert --to notebook --execute dependent.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
echo OFF
echo "In order to visualize the Jupyter notebooks execute the "
echo "folowing command"
echo ""
echo "jupyter notebook index.ipynb"
set +x
set +e
echo "AutoWIG Examples"
echo "================"
echo ""
echo "This script will run the four examples presented in the"
echo "accompanying paper. Corresponding Jupyter notebooks can"
echo "also be run individually, echo using, the following"
echo "commands,"
echo ""
set -x
# git clone http://github.com/StatisKit/AutoWIG.git --depth=1
# mv AutoWIG/doc/examples/basic basic
# rm -rf AutoWIG
# jupyter nbconvert --to notebook --execute basic.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
# git clone http://github.com/StatisKit/PyClangLite.git --depth=1
# jupyter nbconvert --to notebook --execute subset.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
# rm -rf PySTL
# git clone http://github.com/StatisKit/PySTL.git --depth=1
# jupyter nbconvert --to notebook --execute template.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
git clone http://github.com/pfernique/StructureAnalysis.git
cd StructureAnalysis
git checkout -b remove_tool origin/remove_tool
cd ..
jupyter nbconvert --to notebook --execute dependent.ipynb --allow-errors --inplace --ExecutePreprocessor.timeout=3600 --NotebookApp.kernel_spec_manager_class='environment_kernels.EnvironmentKernelSpecManager'
set -e
set +x
echo "In order to visualize the Jupyter notebooks execute the "
echo "folowing command"
echo ""
echo "jupyter notebook index.ipynb"
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment