For those who are unfamiliar with the project, PyPy is an implementation of the Python language that features a JIT compiler. I have noticed a huge performance benefit in some personal projects by switching to PyPy. I have always been curious how it would perform on a large and complex project like OpenStack, but my early experiments ran into massive roadblocks around broken dependencies.
It has been six months since I last looked, so I figured it was time to try it again. Support has come a long way and, now that lxml is working, we are close enough to get a proof of concept running. Read on for instructions on running nova with PyPy.
Start out with a base Ubuntu 12.04 (precise) install and run devstack. I won't go through the details of getting devstack running here, because there are already instructions on the devstack site.
First we need to download PyPy and unpack it:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.0-beta1-linux64-libc2.15.tar.bz2
tar -jxvf pypy-2.0-beta1-linux64-libc2.15.tar.bz2
For convenience, we put PyPy in the path:
sudo ln -s $PWD/pypy-2.0-beta1/bin/pypy /usr/bin/pypy
Next we need distribute and pip so we can install dependencies:
curl -O http://python-distribute.org/distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
pypy distribute_setup.py
pypy get-pip.py
A few modifications to nova are needed to run all of the binaries with PyPy. I had to make four changes:
- Updated binaries to run with pypy instead of python
- Removed dependency on psycopg2
- Removed dependency on websockify
- Worked around lack of support for os.statvfs
Note that this is just a proof of concept, so I'm not overly worried about the lack of novnc support and improper disk usage reporting.
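To illustrate the kind of workaround the last item refers to, here is a minimal sketch of a disk-usage helper that degrades gracefully when os.statvfs is missing. The name disk_usage and its return shape are assumptions made for this example; this is not the code from the branch, and returning zeros is exactly the "improper disk usage reporting" trade-off mentioned above.

```python
import os


def disk_usage(path, statvfs=getattr(os, "statvfs", None)):
    """Return total/free/used bytes for path, or zeros if statvfs is missing.

    disk_usage is a hypothetical helper for illustration, not a nova function.
    """
    if statvfs is None:
        # The interpreter has no os.statvfs: report zeros instead of crashing.
        return {"total": 0, "free": 0, "used": 0}
    st = statvfs(path)
    total = st.f_frsize * st.f_blocks
    free = st.f_frsize * st.f_bavail  # blocks available to unprivileged users
    return {"total": total, "free": free, "used": total - free}
```

Passing statvfs=None explicitly simulates an interpreter without the call, which makes the fallback easy to exercise on CPython.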
The above fixes can be snagged from the pypy branch on my GitHub:
cd /opt/stack/nova
git remote add vishvananda https://github.com/vishvananda/nova.git
git fetch vishvananda
git checkout pypy
cd -
We can use pip to install nova's dependencies for PyPy:
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/pip-requires
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/test-requires
Eventlet has some issues with PyPy that need to be patched. First we will install a specific version so we can be sure that the patches will work:
./pypy-2.0-beta1/bin/pip install eventlet==0.12.1
There is an issue with corolocal. It appears that, due to some difference in the PyPy implementation of WeakRef or the object model, the check for the existence of a specific __init__ fails. Specifically, thrl.__init__ can be None, and the attempt to call it doesn't work. We work around it by just making sure it isn't None before calling it:
sed -i "s/if cls.__init__ is not object.__init__:/if cls.__init__ is not object.__init__ and thrl.__init__:/" pypy-2.0-beta1/site-packages/eventlet/corolocal.py
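The guard added by the sed command can be sketched in isolation. The helper name reinit_if_needed is hypothetical and the function only mirrors the shape of the check; it is not eventlet's code.

```python
def reinit_if_needed(obj, args=(), kwargs=None):
    """Re-run obj.__init__ only when it is a real, usable override.

    Hypothetical helper mirroring the patched condition. Returns True if
    __init__ was called, False otherwise.
    """
    kwargs = kwargs or {}
    cls = type(obj)
    init = obj.__init__  # per the post, this lookup can yield None on PyPy
    if cls.__init__ is not object.__init__ and init:
        init(*args, **kwargs)
        return True
    return False
```

The second half of the condition is the only change: without it, a None value would be called and raise a TypeError.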
A more worrisome issue is that PyPy uses its own implementation of socket (it is similar to the Python 3 implementation). Eventlet's green socket implementation isn't compatible and breaks down. PyPy's sockets have a method called _decref_socketios that eventlet's green sockets don't support. This is supposed to be used to keep track of reference counts. Completely rewriting eventlet's socket class to be compatible with PyPy is more than I want to deal with for a POC, but the following minimal patch makes it work:
patch pypy-2.0-beta1/site-packages/eventlet/greenio.py << EOF
# fix issue with eventlet socket
335a336
>         self._closed = False
337a339,341
>     def _decref_socketios(self):
>         self.__del__()
>
374,378c378,384
<         try:
<             os.close(self._fileno)
<         except:
<             # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
<             pass
---
>         if not self._closed:
>             try:
>                 os.close(self._fileno)
>                 self._closed = True
>             except:
>                 # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
>                 pass
EOF
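The idea behind the patch is a close-once guard: because _decref_socketios now triggers the cleanup explicitly, the later garbage-collection pass must not close the same descriptor number again (by then it may belong to something else). Here is a self-contained sketch of that pattern. FdWrapper is a hypothetical stand-in, not the eventlet class, and unlike the patch it marks the object closed even when os.close fails.

```python
import os


class FdWrapper(object):
    """Owns a file descriptor and closes it at most once.

    Hypothetical stand-in for the patched eventlet class.
    """

    def __init__(self, fileno):
        self._fileno = fileno
        self._closed = False

    def close(self):
        if self._closed:
            return
        try:
            os.close(self._fileno)
        except OSError:
            # The descriptor was already invalid; nothing left to release.
            pass
        self._closed = True

    def _decref_socketios(self):
        # Called by PyPy-style socket file objects when they are closed.
        self.close()

    def __del__(self):
        # Safe even after an explicit close, thanks to the _closed flag.
        self.close()
```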
Libvirt can't be installed with pip, so you will have to install it manually.
The first step is to create a short shell script called pypy-config. This tells the libvirt configure script where to find the PyPy include directory:
echo '#!/usr/bin/env bash' > $PWD/pypy-2.0-beta1/bin/pypy-config
echo "echo '-I$PWD/pypy-2.0-beta1/include/'" >> $PWD/pypy-2.0-beta1/bin/pypy-config
chmod 755 $PWD/pypy-2.0-beta1/bin/pypy-config
Next we download the libvirt dependencies and source:
sudo apt-get build-dep libvirt
apt-get source libvirt
Configure libvirt to build with PyPy:
cd libvirt-0.9.8
./configure --libdir=/usr/lib --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-python=`dirname $PWD`/pypy-2.0-beta1/bin/pypy
The lxc code seems to have trouble finding libnl. I suspect this could be fixed by passing --without-lxc to configure above, but I manually fixed it in the Makefile:
sed -i 's/libvirt_lxc_CFLAGS = \\/libvirt_lxc_CFLAGS = $(LIBNL_CFLAGS) \\/' src/Makefile
The current version of PyPy doesn't support PyInstance_New. It looks like this patch adds support but it isn't in the beta, so we need to work around it by replacing the call with a call to PyEval_CallObject:
sed -i '1N;$!N;s/.*PyInstance_New.*\n.*\n.*/PyEval_CallObject(dom_class, pyobj_dom_args);/;P;D' python/libvirt-override.c
Now we should be able to build libvirt:
make
Devstack has already installed the libvirt binaries, so we just need the python libraries. Note that PyPy wants its .so files to have a .pypy-20.so extension, so we copy them in with the proper name and make them executable:
for a in python/.libs/*.so; do b=../pypy-2.0-beta1/site-packages/$(basename "$a" .so).pypy-20.so; cp "$a" "$b"; chmod 755 "$b"; done
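The renaming rule itself can be written out explicitly. The sketch below only computes the target file name; pypy_ext_name is a hypothetical helper, and the pypy-20 tag is taken from the extension described above.

```python
import os


def pypy_ext_name(path, tag="pypy-20"):
    """Map 'dir/libvirtmod.so' to 'libvirtmod.pypy-20.so'.

    Hypothetical helper for illustration; only the final '.so' suffix is
    rewritten, so an 'so' elsewhere in the name is left untouched.
    """
    base = os.path.basename(path)
    stem, ext = os.path.splitext(base)
    if ext != ".so":
        raise ValueError("not a shared object: %r" % path)
    return "%s.%s%s" % (stem, tag, ext)
```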
Next we copy the .py files:
cp python/libvirt.py ../pypy-2.0-beta1/site-packages/
cp python/libvirt_*.py ../pypy-2.0-beta1/site-packages/
Now that we've done all of the prep work, we can rerun the binaries using the PyPy versions:
killall screen
cd devstack
./rejoin_stack.sh
All of the normal nova commands should work with our shiny new PyPy install of nova.
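If you want to confirm that a service is really running under PyPy rather than silently falling back to CPython, a small check like the following can help. This is a sketch using standard-library calls; running_on_pypy is a name I made up, and nothing like it exists in nova.

```python
import platform
import sys


def running_on_pypy():
    """Return True when the current interpreter is PyPy."""
    return (platform.python_implementation() == "PyPy"
            or "__pypy__" in sys.builtin_module_names)
```

Logging the result once at startup is enough to catch a binary that still points at the wrong interpreter.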
Overall nova runs pretty well. Startup time is a bit slow, but that is expected with a JIT. I did get one crash in nova-conductor after leaving it running for a while:
RPython traceback:
File "pypy_jit_metainterp_compile.c", line 133, in force_now_1
File "pypy_jit_metainterp_compile.c", line 219, in ResumeGuardForcedDescr_save_data
Fatal RPython error: AssertionError
Aborted
This seems to be a crash in the PyPy compiler, which might be because we are running a beta version. I suspect that my hacky eventlet patching has exposed a bug, but perhaps some other library just isn't totally happy with PyPy yet.
UPDATE: Stackless support has issues in beta1 as mentioned on the PyPy site, so this crash should be gone by the time 2.0 final is out.
UPDATE: The current version of PyPy disables the JIT when eventlet is enabled. This is actively being worked on in a branch. I will detail my attempt to get the branch running in a future post.
Despite my experience with smaller projects, nova runs a bit more slowly with PyPy than with CPython. Specifically, simple API requests take about twice as long and complex operations like instance launch can take 33% longer (8 seconds in PyPy vs 6 seconds in CPython). I'm surprised; I expected a big speedup since a lot of our time is spent in Python code (SQLAlchemy, I'm looking at you).
It is possible that the JIT benefits are being offset by slowdowns in C extensions. Or, perhaps eventlet's greenlet switches are preventing the JIT from optimizing effectively. It's hard to say without doing some in-depth profiling.
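As a starting point for that kind of measurement, here is a rough timing sketch. It is not how the numbers above were collected; median_runtime and slowdown are hypothetical helpers. The warmup runs matter on PyPy because the JIT needs a few iterations before it kicks in.

```python
from timeit import default_timer


def median_runtime(func, warmup=5, repeats=20):
    """Call func repeatedly and return the median wall-clock time in seconds.

    Warmup calls are discarded so JIT compilation does not skew the result.
    repeats must be at least 1.
    """
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(repeats):
        start = default_timer()
        func()
        samples.append(default_timer() - start)
    samples.sort()
    mid = len(samples) // 2
    if len(samples) % 2:
        return samples[mid]
    return (samples[mid - 1] + samples[mid]) / 2.0


def slowdown(pypy_seconds, cpython_seconds):
    """Relative slowdown of PyPy versus CPython, e.g. 8s vs 6s is about 0.33."""
    return (pypy_seconds - cpython_seconds) / float(cpython_seconds)
```

Running the same callable under both interpreters and feeding the two medians to slowdown gives a figure comparable to the 33% quoted above.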
PyPy is on the cusp of being ready for real production applications like this. Based on my experience with other projects, I suspect that with some optimization effort we could see a performance benefit over the CPython version.
Some upstream fixes are clearly needed in eventlet, and PyPy might need a little longer to bake for stability reasons, but we should definitely keep our eye on PyPy as an option in the future.
hello, thank you very much for the write up! any updates on using recent pypy?