View gist:e4e636ab60c0c631d8d0761433ce795c
16/11/21 14:21:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/11/21 14:21:39 INFO MemoryStore: MemoryStore cleared
16/11/21 14:21:39 INFO BlockManager: BlockManager stopped
16/11/21 14:21:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/11/21 14:21:39 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/11/21 14:21:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/11/21 14:21:39 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Could not parse Master URL: '10.0.2.12
10.0.2.10
10.0.2.9'
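
The exception above shows a master value made of three newline-separated addresses with no scheme. Spark expects a single string such as `local`, `local[N]`, or `spark://HOST:PORT`, so it cannot parse this value. A minimal Python sketch of normalizing such a host list before handing it to `--master` follows. The helper name `build_master_url`, the choice of the first address as the master, and port 7077 (the standalone default) are assumptions for illustration, not details taken from the original job.

```python
def build_master_url(raw_hosts: str, port: int = 7077) -> str:
    """Turn newline-separated host output into one spark:// master URL.

    Assumption: the first non-empty line is the standalone master and it
    listens on `port` (7077 is the standalone default).
    """
    # Drop blank lines and surrounding whitespace left over from shell output.
    hosts = [line.strip() for line in raw_hosts.splitlines() if line.strip()]
    if not hosts:
        raise ValueError("no host found in input")
    return f"spark://{hosts[0]}:{port}"


if __name__ == "__main__":
    # The three addresses are the ones shown in the exception message.
    print(build_master_url("10.0.2.12\n10.0.2.10\n10.0.2.9"))
```

If the list really holds workers rather than masters, the right fix is to pass the actual master's address instead; the sketch only illustrates the single-URL shape that the parser accepts.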
View gist:dc58fd991ac9043017af08dd8a17c512
16/11/21 14:22:01 INFO DAGScheduler: Job 2 finished: take at Wiki.scala:90, took 32.736133 s
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1089)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
View gist:743c71ffe411064a90e2ae14fccfce4b
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * com.google.guava:guava:(11.0.2, 16.0.1) -> 18.0
[warn] Run 'evicted' to see detailed eviction warnings
[info] Compiling 5 Scala sources to /private/tmp/Archive/code/target/scala-2.11/classes...
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /private/tmp/Archive/code/target/scala-2.11/project_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 32 s, completed Nov 21, 2016 1:43:12 PM
View gist:2460185a5fb8498a77b7e742c2352fb1
ChristophersMBP:Spark-Wiki cmeiklejohn$ ~/Downloads/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class WikiCount --master local --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 target/scala-2.11/wikicount_2.11-1.0.jar s3://cmeiklejohn-test-bucket $AWS_ACCESS_KEY_ID $AWS_SECRET_ACCESS_KEY
Ivy Default Cache set to: /Users/cmeiklejohn/.ivy2/cache
The jars for the packages stored in: /Users/cmeiklejohn/.ivy2/jars
:: loading settings :: url = jar:file:/Users/cmeiklejohn/Downloads/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.amazonaws#aws-java-sdk-pom added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.amazonaws#aws-java-sdk-pom;1.10.34 in central
found org.apache.hadoop#hadoop-aws;2.6.0 in local-m2-cache
View gist:6aeee878be045f4f05c58d98d42bbb32
16/11/19 17:50:21 INFO DAGScheduler: Job 0 finished: foreach at SimpleApp.scala:112, took 149.144955 s
1) Average: 2.996151184061866
2) Averages by languages:
en: 4.745498054726365
sp: 0.0
fr: 1.2194657511446376
pr: 0.0
it: 1.3628844266317683
3) Number of pages viewed more than 5: 359946
4) Percentage of pages for each language in languages
View gist:fccbefdbc949c874b0c9a8729121844b
16/11/19 17:39:01 INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.scala:49
Creating pairs
Exception in thread "main" java.io.IOException: No FileSystem for scheme: s3a
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
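
`No FileSystem for scheme: s3a` means Hadoop found no FileSystem class registered for `s3a://` paths. This typically happens when the `hadoop-aws` module is missing from the classpath or `fs.s3a.impl` is not set. The sketch below assembles a `spark-submit` argument list that supplies both. The helper name `s3a_submit_args` is hypothetical, and the default package simply reuses the `hadoop-aws:2.6.0` coordinate seen elsewhere in these logs; whether that version matches a given Spark build is an assumption to verify.

```python
from typing import List

# Real Hadoop class that implements the s3a:// scheme.
S3A_IMPL = "org.apache.hadoop.fs.s3a.S3AFileSystem"


def s3a_submit_args(
    main_class: str,
    jar: str,
    app_args: List[str],
    hadoop_aws: str = "org.apache.hadoop:hadoop-aws:2.6.0",  # assumed version
) -> List[str]:
    """Build a spark-submit argv that pulls in hadoop-aws and maps s3a."""
    return [
        "spark-submit",
        "--class", main_class,
        # Fetch the module that contains the s3a FileSystem implementation.
        "--packages", hadoop_aws,
        # The spark.hadoop.* prefix forwards the setting to the Hadoop config.
        "--conf", f"spark.hadoop.fs.s3a.impl={S3A_IMPL}",
        jar,
        *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical class, jar, and bucket names, for illustration only.
    print(" ".join(s3a_submit_args("SimpleApp", "app.jar", ["s3a://example-bucket/in"])))
```

Returning a list rather than a shell string keeps each argument intact when it is passed to `subprocess.run`, so paths or credentials with special characters need no extra quoting.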
View gist:8ce53d916662b387bc66706afb8d3754
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.commons#commons-math3;3.1.1: org.apache.commons#commons-math3;3.1.1!commons-math3.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.m2/repository/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.pom
[warn] :: commons-net#commons-net;3.1: commons-net#commons-net;3.1!commons-net.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.pom
[warn] :: javax.servlet#servlet-api;2.5: javax.servlet#servlet-api;2.5!servlet-api.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.m2/repository/javax/servlet/servlet-api/2.5/servlet-api-2.5.pom
[warn] :: org.mortbay.jetty#jetty;6.1.26: org.mortbay.jetty#jetty;6.1.26!jetty.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.
View gist:cc3bf353593240c7ab16965e7d0548fd
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.amazonaws#aws-java-sdk-pom;1.10.34 in central
found org.apache.hadoop#hadoop-aws;2.6.0 in local-m2-cache
found org.apache.hadoop#hadoop-common;2.6.0 in local-m2-cache
found org.apache.hadoop#hadoop-annotations;2.6.0 in local-m2-cache
found com.google.guava#guava;11.0.2 in list
found com.google.code.findbugs#jsr305;1.3.9 in list
found commons-cli#commons-cli;1.2 in list
found org.apache.commons#commons-math3;3.1.1 in local-m2-cache
View gist:5447720783753b2b0dc754b723750a85
Ivy Default Cache set to: /Users/cmeiklejohn/.ivy2/cache
The jars for the packages stored in: /Users/cmeiklejohn/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark16/1.6.2/libexec/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.amazonaws#aws-java-sdk-pom added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.amazonaws#aws-java-sdk-pom;1.10.34 in central
found org.apache.hadoop#hadoop-aws;2.6.0 in local-m2-cache
found org.apache.hadoop#hadoop-common;2.6.0 in local-m2-cache
View gist:7fcfc1a94e0936b619d696f081875b8e
[info] Resolving jline#jline;2.12.1 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: com.fasterxml.jackson.core#jackson-databind;2.2.3: com.fasterxml.jackson.core#jackson-databind;2.2.3!jackson-databind.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.2.3/jackson-databind-2.2.3.pom
[warn] :: com.fasterxml.jackson.core#jackson-annotations;2.2.3: com.fasterxml.jackson.core#jackson-annotations;2.2.3!jackson-annotations.pom(pom.original) origin location must be absolute: file:/Users/cmeiklejohn/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.2.3/jackson-annotations-2.2.3.pom
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] com.fasterxml.jackson.core:jackson-databind:2.2.3