Start with cProfile to get a good overview of what's slow.
$ python -m cProfile -s cumulative your_code.py
-s to order by cumulative time
Example output:
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.034 0.034 19.664 19.664 julia1_nopil.py:1(<module>)
1 0.843 0.843 19.630 19.630 julia1_nopil.py:23(calc_pure_python)
1 14.121 14.121 18.627 18.627 julia1_nopil.py:9
(calculate_z_serial_purepython)
34219980 4.487 0.000 4.487 0.000 {abs}
2002000 0.150 0.000 0.150 0.000 {method 'append' of 'list' objects}
1 0.019 0.019 0.019 0.019 {range}
1 0.010 0.010 0.010 0.010 {sum}
2 0.000 0.000 0.000 0.000 {time.time}
4 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Where tottime
is the amount of time spent on the function's lines of code, time not spent in other functions, and cumtime
is both.
Now that we have a rough idea that calculate_z_serial_purepython
is slow we can dig in by using line_profiler
.
$ pip install line_profiler
This requires you to wrap the offending function in the decorator @profile
. Example code:
@profile
def calculate_z_serial_pure_python(max_iter, zs, cs):
"""Calculate output list using Julia update rules"""
output = [0] * len(zs)
for i in range(len(zs)): # range used for demo purposes
n = 0
z = zs[i]
c = cs[i]
while abs(z) < 2 and n < max_iter:
z = z * z + c
n += 1
output[i] = n
return output
To run:
$ kernprof -l -v your_code.py
The arguments are -l
for line-by-line profiling and -v
for verbose output.
Here's the output:
Wrote profile results to your_code.py.lprof
Timer unit: 1e-06 s
File: julia1_lineprofiler.py
Function: calculate_z_serial_purepython at line 9
Total time: 100.81 s
Line # Hits Per Hit % Time Line Contents
==================================================
9 @profile
10 def calculate_z_serial_purepython(maxiter, zs, cs):
11 """Calculate output list using
Julia update rule"""
12 1 6870.0 0.0 output = [0] * len(zs)
13 1000001 0.8 0.8 for i in range(len(zs)):
14 1000000 0.8 0.8 n = 0
15 1000000 0.8 0.8 z = zs[i]
16 1000000 0.8 0.8 c = cs[i]
17 34219980 1.1 36.2 while abs(z) < 2 and n < maxiter:
18 33219980 1.0 32.6 z = z * z + c
19 33219980 0.8 27.2 n += 1
20 1000000 0.9 0.9 output[i] = n
21 1 4.0 0.0 return output
We can see that 36.2% of the time is spent in the conditional on line 17. We can use iPython's %timeit
to see if we can shortcircuit things.
%timeit abs(z) < 2
%timeit n < max_iter