This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql import Row
from pyspark.sql.types import IntegerType
# Create the Spark session
spark = SparkSession.builder \
    .master("local") \
    .config("spark.sql.autoBroadcastJoinThreshold", -1) \
    .config("spark.executor.memory", "500mb") \
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
package com.diorsding.spark.ml;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.yarn.webapp.hamlet.HamletSpec.P;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
object DataFrameWithFileNameApp extends App {
  val spark: SparkSession =
    SparkSession
      .builder()
      .appName("DataFrameApp")
      .config("spark.master", "local[*]")
# Custom history configuration
# Run script using:
# chmod u+x better_history.sh
# sudo su
# ./better_history.sh
echo ">>> Starting"
echo ">>> Loading configuration into /etc/bash.bashrc"
echo "HISTTIMEFORMAT='%F %T '" >> /etc/bash.bashrc
echo 'HISTFILESIZE=-1' >> /etc/bash.bashrc
[alias]
co = checkout
cob = checkout -b
coo = !git fetch && git checkout
br = branch
brd = branch -d
brD = branch -D
merged = branch --merged
# Leading "!" runs the value as a shell command; without it git would try to run "git git branch ..." and the pipes would not work
dmerged = "!git branch --merged | grep -v '\\*' | xargs -n 1 git branch -d"
st = status
#!/bin/bash
set -e
##########################################################
# Install script for Docker-CE on ElementaryOS 0.4.1 Loki
# Had to update the repository to point to xenial instead
# of using 'lsb_release -cs' because there's no loki
# repository at download.docker.com.
##########################################################
This gist includes HiveQL scripts to create an external partitioned table for
Syslog-generated log files using the regex SerDe.
Use case: Count the number of occurrences of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
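The use case above (counting logged processes by year, month, day and process) can be illustrated outside Hive with a small Python sketch. This is not the gist's HiveQL: the sample lines, the host names, the regular expression, and the `count_processes` helper are all hypothetical, and the year is passed in as a parameter on the assumption that classic syslog lines do not carry one.

```python
import re
from collections import Counter

# Hypothetical syslog-style lines; the layout
# "<month> <day> <time> <host> <process>[pid]: <message>" is an assumption.
SAMPLE_LINES = [
    "May  3 11:52:54 host-a init: tty main process killed by TERM signal",
    "May  3 11:53:31 host-a kernel: registered taskstats version 1",
    "May  3 11:53:31 host-a kernel: sr0: drive detected",
    "Apr 23 16:14:10 host-b sshd[2104]: Accepted password for example-user",
    "this line does not look like syslog and is skipped",
]

# Illustrative pattern, not the one used by the gist's regex SerDe.
SYSLOG_PATTERN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<process>[^\s:\[]+)(?:\[\d+\])?:\s*(?P<message>.*)$"
)


def count_processes(lines, year):
    """Count log entries per (year, month, day, process); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_PATTERN.match(line)
        if match is None:
            continue
        key = (year, match.group("month"), int(match.group("day")), match.group("process"))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, total in sorted(count_processes(SAMPLE_LINES, 2013).items()):
        print(key, total)
```

The grouping key plays the role that the partition columns plus the process column would play in a `GROUP BY` over the partitioned table.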
package com.databricks.spark.jira
import scala.io.Source
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.sources.{TableScan, BaseRelation, RelationProvider}