pmav99/overhead.md Secret

## overhead.md

      
    Raw
  

              overhead.md
            
          
    Introduction

I was trying to write a decorator/contextmanager that would temporary change the
computational region, but while using it I noticed that there was some overhead the root
of which seemed to be the usage of the GRASS modules. So in order to quantify this
I wrote a small benchmark that tries to measure the overhead of calling a GRASS Module.
This is what I found.
tl;dr

Calling a GRASS module incurs a constant but measurable overhead which, in certain
cases, e.g. when writing a module that uses a lot of the other modules, can quickly
add up to a significant quantity.
Disclaimer

If you try to run the benchmark on your own PC, the actual timings you will get will
probably be different. The differences might be caused by having:

a stronger/weaker CPU
faster/slower hard disk.
using different Python version
using different compilation flags

Still, I think that some of the findings are reproducible.
For reference, I used:

OS: Linux 5.0.9
CPU: Intel i7 6700HQ
Disk type: SSD
Python: 3.7
GCC: 8.3.0
CFLAGS='-O2 -fPIC -march=native -std=gnu99'

Demonstration

The easiest way to demonstrate the performance difference between using a GRASS module
vs using the GRASS API is to run the following snippets.
Both of them do exactly the same thing, i.e. retrieve the current region settings 10
times in a row.  The performance difference is huge though. On my laptop, the first one
needs 0.36 seconds while the second one needs just 0.00038 seconds. That's almost
a 1000x difference...
import time
import grass.script as gscript

start = time.time()
for i in range(10):
    region = gscript.parse_command("g.region", flags="g")
end = time.time()
total = end - start

print("Total time: %s" % total)
vs
import time
from grass.pygrass.gis.region import Region

start = time.time()
for i in range(10):
    region = Region()
end = time.time()
total = end - start

print("Total time: %s" % total)
How much is the overhead exactly?

In order to measure the actual overhead of calling a GRASS module, I created two new
GRASS modules that all they do is parse the command line arguments and measured how much
time is needed for their execution. The first module is r.simple and is
implemented in Python while the other one is r.simple.c and is implemented in C.
The timings are in msec and the benchmark was executed using Python 3.7


call method
r.simple
r.simple.c


pygrass.module.Module
85.9
66.5


pygrass.module.Module.shortcut
85.5
66.9


grass.script.run_command
41.3
30.5


subprocess.check_call
41.8
30.3


As we can see, gsrcipt.run_command and subprocess give more or less a identical
results, which is to be expected since run_command + friends are just a thin wrapper
around subprocess.  Similarly shortcuts has the same overhead as "vanila"
pygrass.Module.  Nevertheless, it is obvious that pygrass is roughly 2x times slower
than grass.script (but more about that later).
As far as C vs Python goes, on my computer modules implemented in C seem to be 25% faster than their
Python counterparts.  Nevertheless, a 40 msec startup time doesn't seem extraordinary
for a Python script, while 30 msec feels rather large for a CLI application implemented
in C.
Where is all that time being spent?

C Modules

Unfortunately, I am not familiar enough with the GRASS internals to easily check what is
going on and I didn't have the time to try to profile the code. I suspect that
G_gisinit or something similar is causing the overhead, but someone more familiar with
the C API should be able to enlighten us.
Python Modules

In order to gain a better understanding of the overhead we have when calling python
modules, we also need to measure the following quantities:

The time that python needs to spawn a new process
The startup time for the python interpreter
The time that python needs to import grass

These are the results:


msec


subprocess spawn
1.2


python 2 startup
9.0


python 2 + import grass
24.5


python 3 startup
18.2


python 3 + import grass
39.3


As we can see:

the overhead of spawning a new process from within python is more or less negligible
(at least compared to the other quantities).
The overhead of spawning a python 3 interpreter is 2x bigger than Python 2; i.e.  the
transition to Python 3 will have some performance impact, no matter what.
The overhead of spawning a python 3 interpreter accounts for roughly 50% of the total
overhead (18 msec out of 41 msec).
The other 50% of the overhead is pretty much caused just by importing the grass
library.

Why is Pygrass 2x times slower?

I haven't carefully looked into this (i.e. I haven't profiled the code), but it seems
that the culprit is this
line.
In other words, pygrass calls a module twice, once to get the interface's description
and once to actually run the module. That's why the overhead is double both for
C modules and Python ones.
Is this truly a problem?

Well, it depends :)
The most important factor is probably the size of the computational region.  If you are
dealing with really large regions, then the overhead is probably miniscule. If you are
dealing with much smaller ones though, then it can be significant.
That being said, there are at least two cases where I think that this overhead is
important:

interactive sessions
running tests

Just to give an example, i.pansharpen calls ~50 other GRASS modules. On my laptop, the
overhead for these calls is almost 2 seconds. Is this a lot? Well, if you are
pansharpening Landsat Tiles, it probably is not that much, but you will have this
overhead even if you are pansharpening a 4 pixel map (e.g. when running tests).
I am pretty sure that there are numerous other modules that are just like that, too.
What is to be done?

I think that there are two different areas where work can be done:

Reduce the actual overhead; i.e. make calling GRASS modules faster.
Convert calls to GRASS modules with API calls which, as shown earlier, are orders of
magnitude faster.

Reduce the overhead

The overhead of Python and C modules has different causes.
As I said, I haven't looked into what causes C modules to take so long so i can't make
any suggestions. This should be further looked into though. The reason is that modules
like e.g. g.region are being used practically everywhere, so speeding these up should
give a measureable performance improvement.
As far as Python modules go, unfortunately, the overhead seems to be relatively
inelastic.  Speeding up the startup time of the Python interpreter is not something that
GRASS can do or rely upon. So, the only thing that can be actually be done is to try to
reduce the time needed for importing the grass library. That being said, this should
probably not skim more than a few ms at best, but at least the gains should be there for
all python modules.
Luckily there is at least one relatively low hanging fruit. Speeding pygrass should not
be that difficult. I haven't looked into this but pre-generating the modules' XML
descriptions seems feasible and it should remove the need to call each module twice,
thus making pygrass performance comparable to grass.script.run_command
Convert module calls to API calls

Converting the module calls to API calls is what can potentially give the bigger
benefits, but at the same time is what needs the most work.
There are some low hanging fruits here too. E.g. functions like use_temp_region() and
raster_info() can probably be refactored to use the pygrass API, while calls to e.g.
g.region can probably be replaced with pygrass.gis.Region objects etc.
Nevertheless, this does not really touch the root of the problem which IMHV is the tight
coupling between the GRASS modules functionality and the CLI. In layman terms, at the
moment, if you want to use module A from module B you are forced to spawn a new process
and suffer the overhead that this entails.
TBH, I am not sure if this a problem that can even be tackled at this stage, but if each
module had one or more functions that could be imported/called by other modules,
everything would be much easier and performance would be significantly better.
How to run the benchmark?

If you want to run this benchmark on your own you need to:

git remote add panos https://github.com/pmav99/grass-ci.git
git fetch --all
git checkout overhead
cd scripts/r.simple && make && cd ../../
cd raster/r.simple.c && make && cd ../../
Start a grass session
python benchmark.py

I haven't run the benchmark under Python 2, but even if there are incompatibilities,
they should be trivial to fix.
call method	r.simple	r.simple.c
pygrass.module.Module	85.9	66.5
pygrass.module.Module.shortcut	85.5	66.9
grass.script.run_command	41.3	30.5
subprocess.check_call	41.8	30.3
	msec
subprocess spawn	1.2
python 2 startup	9.0
python 2 + import grass	24.5
python 3 startup	18.2
python 3 + import grass	39.3