Last active
December 15, 2018 23:33
Spin H2O 🚀
```
ith@ith-ThinkPad-W520:~$ pip install h2o
Collecting h2o
  Downloading https://files.pythonhosted.org/packages/6e/e4/1b34202b4887f8187f72acaa178eb4ff87982a9583008c78e1929d8a5e23/h2o-3.22.0.2.tar.gz (120.6MB)
    100% |████████████████████████████████| 120.6MB 344kB/s
Requirement already satisfied: requests in ./anaconda3/lib/python3.6/site-packages (from h2o) (2.11.1)
Collecting tabulate (from h2o)
  Downloading https://files.pythonhosted.org/packages/12/c2/11d6845db5edf1295bc08b2f488cf5937806586afe42936c3f34c097ebdc/tabulate-0.8.2.tar.gz (45kB)
    100% |████████████████████████████████| 51kB 5.3MB/s
Requirement already satisfied: future in ./anaconda3/lib/python3.6/site-packages (from h2o) (0.16.0)
Requirement already satisfied: colorama>=0.3.8 in ./anaconda3/lib/python3.6/site-packages (from h2o) (0.3.9)
Building wheels for collected packages: h2o, tabulate
  Running setup.py bdist_wheel for h2o ... done
  Stored in directory: /home/ith/.cache/pip/wheels/0d/17/52/9ea300738f719aca7b88a790ce94b8c928e7c6098e72627c7f
  Running setup.py bdist_wheel for tabulate ... done
  Stored in directory: /home/ith/.cache/pip/wheels/2a/85/33/2f6da85d5f10614cbe5a625eab3b3aebfdf43e7b857f25f829
Successfully built h2o tabulate
Installing collected packages: tabulate, h2o
Successfully installed h2o-3.22.0.2 tabulate-0.8.2
ith@ith-ThinkPad-W520:~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h2o
>>> h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_191"; OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12); OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
  Starting server from /home/ith/anaconda3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpj_ti7qqm
  JVM stdout: /tmp/tmpj_ti7qqm/h2o_ith_started_from_python.out
  JVM stderr: /tmp/tmpj_ti7qqm/h2o_ith_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
--------------------------  ----------------------------------------
H2O cluster uptime:         01 secs
H2O cluster timezone:       Asia/Kolkata
H2O data parsing timezone:  UTC
H2O cluster version:        3.22.0.2
H2O cluster version age:    23 days
H2O cluster name:           H2O_from_python_ith_m9y0r2
H2O cluster total nodes:    1
H2O cluster free memory:    1.707 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         accepting new members, healthy
H2O connection url:         http://127.0.0.1:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.6.4 final
--------------------------  ----------------------------------------
>>> h2o.demo("glm")
-------------------------------------------------------------------------------
Demo of H2O's Generalized Linear Estimator.
This demo uploads a dataset to h2o, parses it, and shows a description.
Then it divides the dataset into training and test sets, builds a GLM
from the training set, and makes predictions for the test set.
Finally, default performance metrics are displayed.
-------------------------------------------------------------------------------
>>> # Connect to H2O
>>> h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321. connected.
--------------------------  ----------------------------------------
H2O cluster uptime:         1 min 13 secs
H2O cluster timezone:       Asia/Kolkata
H2O data parsing timezone:  UTC
H2O cluster version:        3.22.0.2
H2O cluster version age:    23 days
H2O cluster name:           H2O_from_python_ith_m9y0r2
H2O cluster total nodes:    1
H2O cluster free memory:    1.699 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.6.4 final
--------------------------  ----------------------------------------
>>> # Upload the prostate dataset that comes included in the h2o python package
>>> prostate = h2o.load_dataset("prostate")
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
>>> # Print a description of the prostate data
>>> prostate.describe()
Rows:380
Cols:9
ID CAPSULE AGE RACE DPROS DCAPS PSA VOL GLEASON
------- ------------------ ------------------ ----------------- ------------------ ------------------ ------------------ ------------------ ------------------ ------------------
type int int int int int int real real int
mins 1.0 0.0 43.0 0.0 1.0 1.0 0.3 0.0 0.0
mean 190.5 0.4026315789473684 66.03947368421049 1.0868421052631572 2.2710526315789488 1.1078947368421048 15.408631578947375 15.812921052631573 6.3842105263157904
maxs 380.0 1.0 79.0 2.0 4.0 2.0 139.7 97.6 9.0
sigma 109.84079387914127 0.4910743389630552 6.527071269173311 0.3087732580252793 1.0001076181502861 0.3106564493514939 19.99757266856046 18.347619967271175 1.0919533744261092
zeros 0 227 0 3 0 0 0 167 2
missing 0 0 0 0 0 0 0 0 0
0 1.0 0.0 65.0 1.0 2.0 1.0 1.4 0.0 6.0
1 2.0 0.0 72.0 1.0 3.0 2.0 6.7 0.0 7.0
2 3.0 0.0 70.0 1.0 1.0 2.0 4.9 0.0 6.0
3 4.0 0.0 76.0 2.0 2.0 1.0 51.2 20.0 7.0
4 5.0 0.0 69.0 1.0 1.0 1.0 12.3 55.9 6.0
5 6.0 1.0 71.0 1.0 3.0 2.0 3.3 0.0 8.0
6 7.0 0.0 68.0 2.0 4.0 2.0 31.9 0.0 7.0
7 8.0 0.0 61.0 2.0 4.0 2.0 66.7 27.2 7.0
8 9.0 0.0 69.0 1.0 1.0 1.0 3.9 24.0 7.0
9 10.0 0.0 68.0 2.0 1.0 2.0 13.0 0.0 6.0
>>> # Randomly split the dataset into ~70/30, training/test sets
>>> train, test = prostate.split_frame(ratios=[0.70])
>>> # Convert the response columns to factors (for binary classification problems)
>>> train["CAPSULE"] = train["CAPSULE"].asfactor()
>>> test["CAPSULE"] = test["CAPSULE"].asfactor()
>>> # Build a (classification) GLM
>>> from h2o.estimators import H2OGeneralizedLinearEstimator
>>> prostate_glm = H2OGeneralizedLinearEstimator(family="binomial", alpha=[0.5])
>>> prostate_glm.train(x=["AGE", "RACE", "PSA", "VOL", "GLEASON"],
...                    y="CAPSULE", training_frame=train)
glm Model Build progress: |███████████████████████████████████████████████████████████████████| 100%
>>> # Show the model
>>> prostate_glm.show()
Model Details
=============
H2OGeneralizedLinearEstimator : Generalized Linear Modeling
Model Key: GLM_model_python_1544916549296_1
ModelMetricsBinomialGLM: glm
** Reported on train data. **
MSE: 0.17549790843172788
RMSE: 0.4189247049670476
LogLoss: 0.5203121074548108
Null degrees of freedom: 256
Residual degrees of freedom: 251
Null deviance: 346.08953451744003
Residual deviance: 267.44042323177274
AIC: 279.44042323177274
AUC: 0.7985752111965704
pr_auc: 0.743468386439608
Gini: 0.5971504223931408
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2834351554069239:
       0    1    Error    Rate
-----  ---  ---  -------  ------------
0      102  52   0.3377   (52.0/154.0)
1      20   83   0.1942   (20.0/103.0)
Total  122  135  0.2802   (72.0/257.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.283435     0.697479  134
max f2                       0.104783     0.798122  226
max f0point5                 0.548671     0.691906  69
max accuracy                 0.498722     0.743191  88
max precision                0.998717     1         0
max recall                   0.0981563    1         233
max specificity              0.998717     1         0
max absolute_mcc             0.283435     0.45944   134
max min_per_class_accuracy   0.420984     0.718447  116
max mean_per_class_accuracy  0.283435     0.734081  134
Gains/Lift Table: Avg response rate: 40.08 %, avg score: 40.08 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain
-- ------- -------------------------- ----------------- --------- ----------------- --------------- --------- -------------------------- ------------------ -------------- ------------------------- -------- -----------------
1 0.0116732 0.985441 2.49515 2.49515 1 0.991876 1 0.991876 0.0291262 0.0291262 149.515 149.515
2 0.0233463 0.958871 2.49515 2.49515 1 0.968557 1 0.980217 0.0291262 0.0582524 149.515 149.515
3 0.0311284 0.938309 2.49515 2.49515 1 0.947873 1 0.972131 0.0194175 0.0776699 149.515 149.515
4 0.0428016 0.925298 2.49515 2.49515 1 0.929792 1 0.960584 0.0291262 0.106796 149.515 149.515
5 0.0505837 0.923301 2.49515 2.49515 1 0.924608 1 0.955049 0.0194175 0.126214 149.515 149.515
6 0.101167 0.778681 2.30321 2.39918 0.923077 0.875317 0.961538 0.915183 0.116505 0.242718 130.321 139.918
7 0.151751 0.70117 1.53547 2.11128 0.615385 0.733698 0.846154 0.854688 0.0776699 0.320388 53.5474 111.128
8 0.202335 0.617197 1.53547 1.96733 0.615385 0.654417 0.788462 0.80462 0.0776699 0.398058 53.5474 96.7326
9 0.299611 0.533276 1.39728 1.78225 0.56 0.565165 0.714286 0.726875 0.135922 0.533981 39.7282 78.2247
10 0.400778 0.46393 1.24757 1.64728 0.5 0.498154 0.660194 0.66914 0.126214 0.660194 24.7573 64.7281
11 0.501946 0.305072 1.15161 1.54738 0.461538 0.401419 0.620155 0.615181 0.116505 0.776699 15.1606 54.7377
12 0.599222 0.247644 0.598835 1.39339 0.24 0.26779 0.558442 0.558786 0.0582524 0.834951 -40.1165 39.3393
13 0.700389 0.228719 0.67177 1.28916 0.269231 0.23765 0.516667 0.5124 0.0679612 0.902913 -32.823 28.9159
14 0.797665 0.190922 0.299417 1.16846 0.12 0.211326 0.468293 0.475683 0.0291262 0.932039 -70.0583 16.8458
15 0.898833 0.0992135 0.575803 1.10175 0.230769 0.136606 0.441558 0.437519 0.0582524 0.990291 -42.4197 10.1753
16 1 0.000538306 0.0959671 1 0.0384615 0.0743495 0.400778 0.400778 0.00970874 1 -90.4033 0
Scoring History:
    timestamp            duration    iterations    negative_log_likelihood    objective
--  -------------------  ----------  ------------  -------------------------  -----------
    2018-12-16 05:01:29  0.000 sec   0             173.045                    0.673326
    2018-12-16 05:01:29  0.021 sec   1             137.684                    0.536127
    2018-12-16 05:01:29  0.024 sec   2             133.889                    0.521584
    2018-12-16 05:01:29  0.026 sec   3             133.722                    0.521001
    2018-12-16 05:01:29  0.029 sec   4             133.72                     0.520999
>>> # Predict on the test set and show the first ten predictions
>>> predictions = prostate_glm.predict(test)
>>> predictions.show()
glm prediction progress: |████████████████████████████████████████████████████████████████████| 100%
predict    p0        p1
---------  --------  --------
1          0.495329  0.504671
1          0.35433   0.64567
1          0.16257   0.83743
1          0.570585  0.429415
1          0.368692  0.631308
0          0.742835  0.257165
1          0.505277  0.494723
1          0.198034  0.801966
0          0.767926  0.232074
0          0.73679   0.26321
[123 rows x 3 columns]
>>> # Show default performance metrics
>>> performance = prostate_glm.model_performance(test)
>>> performance.show()
ModelMetricsBinomialGLM: glm
** Reported on test data. **
MSE: 0.1845816027455374
RMSE: 0.4296296111135002
LogLoss: 0.5399893841613397
Null degrees of freedom: 122
Residual degrees of freedom: 117
Null deviance: 166.2047381088705
Residual deviance: 132.83738850368957
AIC: 144.83738850368957
AUC: 0.7964383561643835
pr_auc: 0.7001994986434553
Gini: 0.592876712328767
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.23779643521579233:
       0    1    Error    Rate
-----  ---  ---  -------  ------------
0      40   33   0.4521   (33.0/73.0)
1      4    46   0.08     (4.0/50.0)
Total  44   79   0.3008   (37.0/123.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.237796     0.713178  78
max f2                       0.180305     0.830508  94
max f0point5                 0.476899     0.68      49
max accuracy                 0.476899     0.739837  49
max precision                0.992726     1         0
max recall                   0.0958891    1         108
max specificity              0.992726     1         0
max absolute_mcc             0.237796     0.479514  78
max min_per_class_accuracy   0.436354     0.739726  55
max mean_per_class_accuracy  0.436354     0.739863  55
Gains/Lift Table: Avg response rate: 40.65 %, avg score: 40.55 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain
-- ------- -------------------------- ----------------- ------- ----------------- --------------- --------- -------------------------- ------------------ -------------- ------------------------- ------- -----------------
1 0.0162602 0.989402 2.46 2.46 1 0.992642 1 0.992642 0.04 0.04 146 146
2 0.0243902 0.972377 2.46 2.46 1 0.978209 1 0.987831 0.02 0.06 146 146
3 0.0325203 0.956311 2.46 2.46 1 0.964953 1 0.982112 0.02 0.08 146 146
4 0.0406504 0.950283 2.46 2.46 1 0.951859 1 0.976061 0.02 0.1 146 146
5 0.0569106 0.943331 2.46 2.46 1 0.947674 1 0.967951 0.04 0.14 146 146
6 0.105691 0.833227 1.64 2.08154 0.666667 0.880921 0.846154 0.927783 0.08 0.22 64 108.154
7 0.154472 0.75037 1.23 1.81263 0.5 0.782443 0.736842 0.881886 0.06 0.28 23 81.2632
8 0.203252 0.631417 0.82 1.5744 0.333333 0.688214 0.64 0.835405 0.04 0.32 -18 57.44
9 0.300813 0.528942 1.845 1.66216 0.75 0.571819 0.675676 0.749918 0.18 0.5 84.5 66.2162
10 0.398374 0.476904 1.64 1.65673 0.666667 0.501955 0.673469 0.689192 0.16 0.66 64 65.6735
11 0.504065 0.293807 1.13538 1.54742 0.461538 0.406299 0.629032 0.629876 0.12 0.78 13.5385 54.7419
12 0.601626 0.253514 0.82 1.42946 0.333333 0.266497 0.581081 0.570949 0.08 0.86 -18 42.9459
13 0.699187 0.227448 0.82 1.34442 0.333333 0.237395 0.546512 0.524407 0.08 0.94 -18 34.4419
14 0.796748 0.161407 0.41 1.23 0.166667 0.192631 0.5 0.483781 0.04 0.98 -59 23
15 0.894309 0.0951997 0.205 1.11818 0.0833333 0.119999 0.454545 0.444096 0.02 1 -79.5 11.8182
16 1 0.0550237 0 1 0 0.0787136 0.406504 0.405478 0 1 -100 0
---- End of Demo ----
```
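
The training metrics printed by `prostate_glm.show()` in the log above can be cross-checked with a few lines of plain Python. The sketch below copies the confusion matrix and scalar metrics from that output and recomputes F1, the error rate, RMSE, Gini, LogLoss, and AIC. It uses the standard textbook definitions rather than anything taken from H2O's source, the variable names are illustrative, and the parameter count is an assumption inferred from the gap between the row count and the residual degrees of freedom.

```python
import math

# Training-set confusion matrix copied from the log (max-F1 threshold 0.2834).
tn, fp = 102, 52   # actual 0: predicted 0, predicted 1
fn, tp = 20, 83    # actual 1: predicted 0, predicted 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
error_rate = (fp + fn) / (tn + fp + fn + tp)

# Scalar metrics copied from the same training report.
mse = 0.17549790843172788
auc = 0.7985752111965704
residual_deviance = 267.44042323177274
null_df, residual_df = 256, 251

n_rows = null_df + 1               # null model fits only an intercept
n_params = n_rows - residual_df    # assumption: 5 predictors + intercept

rmse = math.sqrt(mse)                         # RMSE is the square root of MSE
gini = 2 * auc - 1                            # Gini coefficient from AUC
logloss = residual_deviance / (2 * n_rows)    # binomial deviance = 2 * n * logloss
aic = residual_deviance + 2 * n_params        # AIC = deviance + 2k

print(f"F1={f1:.6f}  error={error_rate:.4f}  RMSE={rmse:.6f}")
print(f"Gini={gini:.6f}  LogLoss={logloss:.6f}  AIC={aic:.4f}")
```

Each recomputed value should agree with the corresponding figure in the log to the precision shown there. The same check can be repeated with the test-set confusion matrix and deviance.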