Lyuben Todorov (lyubent)
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
%python
from pyspark.sql import Row

# Mixed types in one column ("c" pairs with a string where the others pair with ints)
# break DataFrame schema inference.
data = [("a", 318), ("b", 32), ("c", "im_gona_break_everything!")]
rddWorks = spark.sparkContext.parallelize(data)  # an RDD holds the mixed types fine
# dfFail = rddWorks.toDF()  # fails: can not merge LongType and StringType

def func(a):
    pass  # the function body is cut off in the original snippet
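One way around the failure is to make the column homogeneous before conversion; a minimal sketch, assuming a live `spark` session (the column names are invented for illustration):

```python
from pyspark.sql.types import StructType, StructField, StringType

# Cast every value to a string and declare the schema explicitly,
# so inference never sees conflicting types.
schema = StructType([
    StructField("key", StringType(), True),
    StructField("value", StringType(), True),
])
dfWorks = spark.createDataFrame([(k, str(v)) for (k, v) in data], schema)
dfWorks.show()
```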
spark_broadcasting.scala (created December 7, 2017)
scala> val df1 = sc.parallelize(List((1, "Jack"), (2, "Link"))).toDF("id", "name")
df1: org.apache.spark.sql.DataFrame = [id: int, name: string]
scala> val df2 = sc.parallelize(List((1, 41), (2, 29))).toDF("id", "age")
df2: org.apache.spark.sql.DataFrame = [id: int, age: int]
scala> import org.apache.spark.sql.functions.broadcast;
import org.apache.spark.sql.functions.broadcast
scala> val df3 = df1.join(broadcast(df2), Seq("id"))
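The same hint exists in PySpark; a minimal sketch of the equivalent join, assuming a live `spark` session:

```python
from pyspark.sql.functions import broadcast

df1 = spark.createDataFrame([(1, "Jack"), (2, "Link")], ["id", "name"])
df2 = spark.createDataFrame([(1, 41), (2, 29)], ["id", "age"])

# broadcast() marks df2 as small enough to copy to every executor,
# so Spark can plan a broadcast hash join and skip the shuffle.
df3 = df1.join(broadcast(df2), ["id"])
df3.explain()  # the physical plan should list a BroadcastHashJoin
```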
vi /etc/profile
. /etc/profile
yum remove python27
yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel
yum install gcc
cd /usr/src
wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
# the gist stops here; a source build would normally continue with:
tar -xvzf Python-2.7.10.tgz
cd Python-2.7.10
./configure
make && make altinstall
{
"ClusterId": "j-EQPER71I4AQYL"
}
for CLUST_ID in `cat stuff.txt`;
do
  # raka=${INSTANCE:18:15}
  # echo $raka
  # keep only the value token, skipping the braces and the "ClusterId": key
  if [ "$CLUST_ID" != "{" ] && [ "$CLUST_ID" != "}" ] && [ "$CLUST_ID" != "\"ClusterId\":" ]; then
    echo "$CLUST_ID"  # the loop body is cut off in the gist; echo stands in
  fi
done
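Word-splitting JSON in the shell is brittle; a sketch of the same extraction with Python's json module, assuming stuff.txt holds the document shown above:

```python
import json

# Parse the file as JSON rather than splitting it on whitespace.
with open("stuff.txt") as f:
    doc = json.load(f)

print(doc["ClusterId"])  # j-EQPER71I4AQYL
```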
Modify index.html so that when you click the "display stuff" button it uses JavaScript to:
- get the value of the first text input field and store it in a variable
- get the value of the second text input field and store it in a variable
- add the two variables together and display the result somewhere
- the result can be shown with alert(resultGoesHere), or you can add a new field to the HTML that will contain the value
Example solution available here: http://bit.ly/2qi4d6L
Review the solution carefully and notice the new id attributes (e.g. id="input1") added to the input fields.
An id identifies a single element within the page, so each value must be unique.
Classes, by contrast, may be shared by several elements; we will use those later.
## python 2.7.x with gzip
yum remove python
yum groupinstall "Development tools"
cd /usr/src
wget http://python.org/ftp/python/2.7.6/Python-2.7.6.tgz
tar -xvzf Python-2.7.6.tgz
cd Python-2.7.6
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
# with zlib-devel in place the gzip module gets built; the usual steps follow:
./configure
make && make altinstall
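A quick sanity check that the rebuilt interpreter really has zlib (and therefore gzip) support:

```python
# run this under the freshly built python2.7; an ImportError means
# zlib-devel was missing when ./configure ran
import zlib
import gzip

print(zlib.ZLIB_VERSION)  # prints the linked zlib version
```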
scala> auctionsDF.groupBy("itemtype", "aucid")
res14: org.apache.spark.sql.GroupedData = org.apache.spark.sql.GroupedData@4c155287
scala> auctionsDF.groupBy("itemtype", "aucid").show()
<console>:38: error: value show is not a member of org.apache.spark.sql.GroupedData
auctionsDF.groupBy("itemtype", "aucid").show()
^
scala> auctionsDF.groupBy("itemtype", "aucid").count.show(5)
+--------+----------+-----+
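The error above is the instructive part: groupBy returns a GroupedData, which has no show method, so an aggregation such as count must come first. The same rule applies in PySpark, sketched here against the auctionsDF from the transcript (assuming it is also available in a Python session):

```python
# groupBy alone yields a GroupedData object, not a DataFrame
grouped = auctionsDF.groupBy("itemtype", "aucid")

# an aggregation turns it back into a DataFrame, which can be shown
grouped.count().show(5)
```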

# Eclipse Scala:

  1. Installed Scala IDE 4.4.1 (http://scala-ide.org/download/sdk.html)

  2. File > New Project > Maven Project

  3. Tick "Skip Archetype Selection" (a simple project with no archetype)

  4. Add maven project info

  5. Change to the Scala nature (right-click on the project > Configure > Add Scala Nature)

/* Simple app to inspect Auction data */
/* The following imports bring in SparkContext, its implicit members, and SparkConf */
package exercises
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
// Will use max, min from java.lang.Math
import java.lang.Math
import scala.util.control.NonFatal