Profile of a _cross_score run in revscoring using a multilabel random forest on the WikiProjects-labeled dataset.
Wed Dec 13 04:56:17 2017    stats

         3844350339 function calls (3843797676 primitive calls) in 30453.783 seconds

   Ordered by: cumulative time
   List reduced from 260 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 30453.783 30453.783 /home/codezee/ai/venv/lib/python3.4/site-packages/revscoring-2.0.11-py3.4.egg/revscoring/scoring/models/model.py:209(cross_validate)
        1    1.160    1.160 30453.733 30453.733 /home/codezee/ai/venv/lib/python3.4/site-packages/revscoring-2.0.11-py3.4.egg/revscoring/scoring/models/model.py:242(_cross_score)
        1    0.912    0.912 29308.131 29308.131 /home/codezee/ai/venv/lib/python3.4/site-packages/revscoring-2.0.11-py3.4.egg/revscoring/scoring/models/model.py:249(<listcomp>)
    11148  171.564    0.015 29307.219    2.629 /home/codezee/ai/venv/lib/python3.4/site-packages/revscoring-2.0.11-py3.4.egg/revscoring/scoring/models/sklearn.py:159(score)
    33442 1665.965    0.050 29019.662    0.868 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:514(predict_proba)
    33443   58.356    0.002 28459.225    0.851 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:759(__call__)
 16754760  195.438    0.000 28307.637    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:596(dispatch_one_batch)
 16721318  133.835    0.000 27057.859    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:554(_dispatch)
 16721318   55.127    0.000 26841.160    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:177(__init__)
 16721318   43.096    0.000 26786.033    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:71(__call__)
 16721318   74.759    0.000 26742.937    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:72(<listcomp>)
 16720818   74.964    0.000 25531.124    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:123(_parallel_helper)
 16720818 17306.384   0.001 25434.040    0.002 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/tree/tree.py:648(predict_proba)
    11148   92.672    0.008  9775.910    0.877 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:478(predict)
769191039 1151.734    0.000  7108.682    0.000 {method 'sum' of 'numpy.ndarray' objects}
769191039  655.695    0.000  5956.948    0.000 /home/codezee/ai/venv/lib/python3.4/site-packages/numpy/core/_methods.py:31(_sum)
769192039 5301.307    0.000  5301.307    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.038    0.038  1144.438 1144.438 /home/codezee/ai/venv/lib/python3.4/site-packages/revscoring-2.0.11-py3.4.egg/revscoring/scoring/models/sklearn.py:87(train)
        1    0.006    0.006  1141.016 1141.016 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:185(fit)
      500    0.177    0.000  1137.054    2.274 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:92(_parallel_build_trees)
      500   16.404    0.033  1136.043    2.272 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/tree/tree.py:113(fit)
 16754760   96.994    0.000  1046.251    0.000 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:67(__init__)
      500  994.727    1.989   994.729    1.989 {method 'build' of 'sklearn.tree._tree.DepthFirstTreeBuilder' objects}
 16754259   88.192    0.000   944.274    0.000 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/ensemble/forest.py:545(<genexpr>)
 16721318  139.342    0.000   848.303    0.000 /home/codezee/ai/venv/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:144(delayed)
 16720818  615.495    0.000   673.860    0.000 {method 'predict' of 'sklearn.tree._tree.Tree' objects}
The line that jumped out on our call is https://gist.github.com/codez266/1bf7ca71442071c3f290e05b3a4f23ba#file-cross_score_profile-L25, and it's a mystery what actually takes the time in there. Note that the cumulative time of the numpy subroutines is less than half of the total time spent in tree.predict_proba.
However, I think the biggest culprit is simply the number of times we call dispatch_one_batch. 16M is a lot of calls to anything; is that expected? Or are we accidentally turning something into NxN complexity?
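A back-of-the-envelope check suggests the 16.7M figure falls straight out of per-revision scoring: each forest.predict_proba call dispatches roughly one joblib task per tree. This is a sketch; n_estimators=500 is an assumption read off the 500 _parallel_build_trees calls in the profile above.

```python
# Numbers taken from the profile above.
n_estimators = 500               # inferred from the 500 tree-build calls
predict_proba_calls = 33442      # ncalls for forest.py:514(predict_proba)

# Scoring revisions one at a time: one joblib dispatch per tree per call.
per_row_dispatches = predict_proba_calls * n_estimators

# A single batched predict_proba over all rows would dispatch only
# ~n_estimators tasks in total.
batched_dispatches = n_estimators

print(per_row_dispatches)   # ~16.7M, matching the _dispatch call count above
print(batched_dispatches)
```

If that arithmetic is right, the dispatch count isn't NxN; it's (calls x trees), and batching the score() calls would collapse it by four orders of magnitude.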
Try printing n_jobs here; it seems to be enormous: https://github.com/scikit-learn/scikit-learn/blob/0.17.1/sklearn/ensemble/forest.py#L540
How big is n_outputs_ here? https://github.com/scikit-learn/scikit-learn/blob/0.17.1/sklearn/tree/tree.py#L686
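Both attributes are easy to inspect on a small fitted forest. A minimal sketch with synthetic multilabel data (the attribute names n_outputs_ and n_jobs are real sklearn attributes; the dataset shape here is made up). For a multilabel WikiProjects model, n_outputs_ equals the number of label columns, and tree.predict_proba normalizes each output separately on every call, which would multiply the numpy sum calls seen in the profile.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic multilabel dataset: 20 samples, 4 features, 3 binary labels.
X = np.random.RandomState(0).rand(20, 4)
y = np.random.RandomState(1).randint(0, 2, size=(20, 3))

clf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# n_outputs_: number of label columns the forest predicts per sample.
print(clf.n_outputs_)   # 3 for this toy data
# n_jobs: the joblib parallelism setting the Parallel calls inherit.
print(clf.n_jobs)
```

Printing the same two attributes on the real WikiProjects model should confirm whether n_outputs_ (and hence the per-output normalization loop) is what blows up tree.predict_proba.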