@mathemage
Created September 4, 2017 11:00
Failure of PCAWideDataSets when modularizing PCA implementations
09-04 12:36:17.107 10.76.97.34:54321 23078 main INFO: pcaParameters._PCAImplementation: JAMA
09-04 12:36:17.109 10.76.97.34:54321 23078 main INFO: ParseSetup heuristic: cloudSize: 1, cores: 4, numCols: 13, maxLineLength: 74, totalSize: 3074, localParseSize: 3074, chunkSize: 4194304, numChunks: 1, numChunks * cols: 13
09-04 12:36:17.109 10.76.97.34:54321 23078 main INFO: Total file size: 3,0 KB
09-04 12:36:17.109 10.76.97.34:54321 23078 main INFO: Parse chunk size 4194304
09-04 12:36:17.111 10.76.97.34:54321 23078 FJ-1-3 INFO: Parse result for smalldata/pca_test/decathlon.csv (41 rows):
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: ColV2 type min max mean sigma NAs constant cardinality
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: 100m: numeric 10,4400 11,6400 10,9980 0,263023
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Long.jump: numeric 6,61000 7,96000 7,26000 0,316402
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Shot.put: numeric 12,6800 16,3600 14,4771 0,824428
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: High.jump: numeric 1,85000 2,15000 1,97683 0,0889505
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: 400m: numeric 46,8100 53,2000 49,6163 1,15345
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: 110m.hurdle: numeric 13,9700 15,6700 14,6059 0,471789
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Discus: numeric 37,9200 51,6500 44,3256 3,37784
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Pole.vault: numeric 4,20000 5,40000 4,76244 0,278000
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Javeline: numeric 50,3100 70,5200 58,3166 4,82682
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: 1500m: numeric 262,100 317,000 279,025 11,6732
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Rank: numeric 1,00000 28,0000 12,1220 7,91895
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Points: numeric 7313,00 8893,00 8005,37 342,385
09-04 12:36:17.113 10.76.97.34:54321 23078 FJ-1-3 INFO: Competition: factor Decastar OlympicG 2
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: Chunk compression summary:
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: Chunk Type Chunk Name Count Count Percentage Size Size Percentage
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: CBS Binary 1 7,692 % 76 B 4,246 %
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: C1N 1-Byte Integers (w/o NAs) 1 7,692 % 109 B 6,089 %
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: C1S 1-Byte Fractions 5 38,462 % 625 B 34,916 %
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: C2 2-Byte Integers 1 7,692 % 150 B 8,380 %
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: C2S 2-Byte Fractions 5 38,462 % 830 B 46,369 %
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: Frame distribution summary:
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: Size Number of Rows Number of Chunks per Column Number of Chunks
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: 10.76.97.34:54321 1,7 KB 41 1 13
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: mean 1,7 KB 41,000000 1,000000 13,000000
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: min 1,7 KB 41,000000 1,000000 13,000000
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: max 1,7 KB 41,000000 1,000000 13,000000
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: stddev 0 B 0,000000 0,000000 0,000000
09-04 12:36:17.114 10.76.97.34:54321 23078 FJ-1-3 INFO: total 1,7 KB 41 1 13
09-04 12:36:17.115 10.76.97.34:54321 23078 main INFO: Data transformation applied is DESCALE
09-04 12:36:17.115 10.76.97.34:54321 23078 FJ-1-1 INFO: Building H2O PCA model with these parameters:
09-04 12:36:17.115 10.76.97.34:54321 23078 FJ-1-1 INFO: {"_train":{"name":"smalldata/pca_test/decathlon.csv","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":12345,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":null,"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":0,"_stopping_metric":"AUTO","_stopping_tolerance":0.001,"_response_column":null,"_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_transform":"DESCALE","_pca_method":"GramSVD","_k":3,"_max_iterations":1000,"_use_all_factor_levels":true,"_compute_metrics":true,"_impute_missing":false,"_PCAImplementation":"JAMA"}
09-04 12:36:17.116 10.76.97.34:54321 23078 FJ-1-1 WARN: _train: Dataset used may contain fewer number of rows due to removal of rows with NA/missing values. If this is not desirable, set impute_missing argument in pca call to TRUE/True/true/... depending on the client language.
09-04 12:36:17.117 10.76.97.34:54321 23078 FJ-1-1 INFO: Building H2O PCA model with these parameters:
09-04 12:36:17.117 10.76.97.34:54321 23078 FJ-1-1 INFO: {"_train":{"name":"smalldata/pca_test/decathlon.csv","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":12345,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":null,"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":0,"_stopping_metric":"AUTO","_stopping_tolerance":0.001,"_response_column":null,"_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_transform":"DESCALE","_pca_method":"GramSVD","_k":3,"_max_iterations":1000,"_use_all_factor_levels":true,"_compute_metrics":true,"_impute_missing":false,"_PCAImplementation":"JAMA"}
09-04 12:36:17.117 10.76.97.34:54321 23078 FJ-1-1 WARN: _train: Dataset used may contain fewer number of rows due to removal of rows with NA/missing values. If this is not desirable, set impute_missing argument in pca call to TRUE/True/true/... depending on the client language.
09-04 12:36:17.141 10.76.97.34:54321 23078 FJ-1-1 ERRR: rowNames.length == 10, pca._output._eigenvectors_raw.length == 41
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: java.lang.AssertionError
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at hex.pca.PCA$PCADriver.buildTables(PCA.java:146)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at hex.pca.PCA$PCADriver.computeStatsFillModel(PCA.java:248)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at hex.pca.PCA$PCADriver.computeImpl(PCA.java:359)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:173)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
09-04 12:36:17.142 10.76.97.34:54321 23078 FJ-1-1 ERRR: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
09-04 12:36:17.143 10.76.97.34:54321 23078 main INFO: #### TEST hex.pca.PCAWideDataSetsTests#testWideDataSetGramSVD[3] EXECUTION TIME: 00:00:00.036 (Wall: 04-Sep 12:36:17.143)
java.lang.AssertionError
at hex.pca.PCA$PCADriver.buildTables(PCA.java:146)
at hex.pca.PCA$PCADriver.computeStatsFillModel(PCA.java:248)
at hex.pca.PCA$PCADriver.computeImpl(PCA.java:359)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:173)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)