
package rnd
import kafka.serializer.StringDecoder
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
object KafkaSparkStreamingToES {
# Assumes a PySpark shell, where `sc` (SparkContext) and an active SparkSession already exist
df = sc.parallelize([(1, 'Y', 'F', "Giri", 'Y'), (2, 'N', 'V', "Databricks", 'N'), (3, 'Y', 'B', "SparkEdge", 'Y'), (4, 'N', 'X', "Spark", 'N')]).toDF(["id", "flag1", "flag2", "name", "flag3"])
print('Show Dataframe')
df.show()
print('Actual Schema of the df')
df.printSchema()
# df.dtypes is a list of (column_name, type_string) tuples
for a_dftype in df.dtypes:
    col_name = a_dftype[0]
    col_type = a_dftype[1]
    # print(df.select(col_name).collect()[0][0])
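The loop above walks `df.dtypes`, which PySpark returns as a list of `(column_name, type_string)` pairs. A minimal sketch of one common use of such a loop, grouping column names by their Spark type string, follows. The helper name `columns_by_type` is hypothetical, and the sample list assumes the types PySpark would infer for this DataFrame (Python `int` as `bigint`, `str` as `string`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def columns_by_type(dtypes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group column names by type string (hypothetical helper, not part of PySpark)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for col_name, col_type in dtypes:
        groups[col_type].append(col_name)
    return dict(groups)


# Assumed value of df.dtypes for the DataFrame above (Python ints inferred as bigint).
sample_dtypes = [("id", "bigint"), ("flag1", "string"), ("flag2", "string"),
                 ("name", "string"), ("flag3", "string")]
print(columns_by_type(sample_dtypes))
```

With a live DataFrame, `df.dtypes` can be passed in directly instead of the sample list.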
//
// Spark 2.0 to SQL Server via External Data Source API and SQL JDBC
//
// References:
// - https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
// - https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/
// - https://docs.microsoft.com/en-us/sql/connect/jdbc/using-the-jdbc-driver
// Run spark-shell
// - Get the SQL Server JDBC JAR from the above "Using the JDBC driver" link
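Spark's JDBC data source is configured through options such as `url`, `dbtable`, `user`, `password`, and `driver`. The sketch below, written in Python rather than the Scala of the original gist, only assembles those options for SQL Server. The host, database, table, and credentials are placeholders, and the helper name `sqlserver_jdbc_options` is hypothetical. The URL follows the Microsoft JDBC driver's `jdbc:sqlserver://host:port;databaseName=...` form, and `com.microsoft.sqlserver.jdbc.SQLServerDriver` is that driver's class name.

```python
from typing import Dict


def sqlserver_jdbc_options(host: str, database: str, table: str,
                           user: str, password: str, port: int = 1433) -> Dict[str, str]:
    """Build option keys for Spark's JDBC source (hypothetical helper)."""
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # Class name of Microsoft's SQL Server JDBC driver; its JAR must be on the classpath.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Placeholder connection details, for illustration only.
opts = sqlserver_jdbc_options("sqlhost.example.com", "mydb", "dbo.mytable", "spark_user", "secret")
print(opts["url"])  # jdbc:sqlserver://sqlhost.example.com:1433;databaseName=mydb
```

In a PySpark session started with the driver JAR (for example via `--jars`), these options could then be passed as `spark.read.format("jdbc").options(**opts).load()`.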
@CsBigDataHub
CsBigDataHub / Installation.md
Created April 25, 2018 14:52 — forked from albertbori/Installation.md
Automatically disable Wifi when an Ethernet connection (cable) is plugged in on a Mac

Overview

This is a bash script that automatically turns your Wi-Fi off when you connect your computer to an Ethernet connection and turns it back on when you unplug your Ethernet cable or adapter. If you decide to turn Wi-Fi on anyway, the script remembers that choice. It was adapted from this Mac hint to work with Yosemite, without hard-coding the adapter names. It is supposed to support Growl, but I didn't check that part; I did, however, add OS X Notification Center support. Feel free to fork and fix any issues you encounter.

Most of the credit for these changes goes to Dave Holland.

Requirements

  • Mac OS X 10+
  • Administrator privileges
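The script's key idea is discovering the Wi-Fi and Ethernet devices at run time instead of hard-coding names such as `en0`. The following Python sketch is not the gist's bash script. It assumes the output format of macOS's `networksetup -listallhardwareports` (`Hardware Port:` and `Device:` pairs separated by blank lines) and that Wi-Fi is toggled with `networksetup -setairportpower <device> on|off`. The helper names, the port-name matching rules, and the boolean model of "remember the user's choice" are all assumptions.

```python
import subprocess
from typing import List, Optional, Tuple

# Assumed sample of `networksetup -listallhardwareports` output.
SAMPLE = """\
Hardware Port: Wi-Fi
Device: en0
Ethernet Address: aa:bb:cc:dd:ee:ff

Hardware Port: Thunderbolt Ethernet
Device: en3
Ethernet Address: 11:22:33:44:55:66
"""


def parse_hardware_ports(text: str) -> List[Tuple[str, str]]:
    """Return (port_name, device) pairs from the listing text."""
    ports = []
    port_name = None
    for line in text.splitlines():
        if line.startswith("Hardware Port:"):
            port_name = line.split(":", 1)[1].strip()
        elif line.startswith("Device:") and port_name is not None:
            ports.append((port_name, line.split(":", 1)[1].strip()))
            port_name = None
    return ports


def find_devices(ports: List[Tuple[str, str]]) -> Tuple[Optional[str], List[str]]:
    """Pick the Wi-Fi device and wired devices; the name matching is an assumption."""
    wifi = None
    wired = []
    for name, device in ports:
        if name in ("Wi-Fi", "AirPort"):  # older OS X releases called it AirPort
            wifi = device
        elif "Ethernet" in name or "LAN" in name:
            wired.append(device)
    return wifi, wired


def desired_wifi_state(ethernet_up: bool, user_forced_wifi_on: bool) -> str:
    """Wi-Fi off while Ethernet is up, unless the user explicitly turned it back on."""
    if ethernet_up and not user_forced_wifi_on:
        return "off"
    return "on"


def set_wifi(device: str, state: str, dry_run: bool = True) -> List[str]:
    """Build (and, if dry_run is False, run) the networksetup command."""
    cmd = ["networksetup", "-setairportpower", device, state]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


wifi, wired = find_devices(parse_hardware_ports(SAMPLE))
print(wifi, wired)  # en0 ['en3']
print(set_wifi(wifi, desired_wifi_state(ethernet_up=True, user_forced_wifi_on=False)))
```

Detecting whether a wired device is actually connected (for example, `ifconfig <device>` reporting `status: active`) is left out of this sketch.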
CsBigDataHub / .gitconfig
Created June 13, 2018 16:37 — forked from rambabusaravanan/.gitconfig
Git Diff and Merge Tool - IntelliJ IDEA
# Linux
# add the following to "~/.gitconfig" file
[merge]
tool = intellij
[mergetool "intellij"]
cmd = /usr/local/bin/idea merge $(cd $(dirname "$LOCAL") && pwd)/$(basename "$LOCAL") $(cd $(dirname "$REMOTE") && pwd)/$(basename "$REMOTE") $(cd $(dirname "$BASE") && pwd)/$(basename "$BASE") $(cd $(dirname "$MERGED") && pwd)/$(basename "$MERGED")
trustExitCode = true
[diff]
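Each `$(cd $(dirname "$X") && pwd)/$(basename "$X")` expression in the `cmd` line above turns a path that Git passes (often relative) into an absolute one before handing it to IntelliJ's `idea merge` command, in the order local, remote, base, merged. The Python sketch below mirrors that path handling with `os.path.abspath`, which is a rough equivalent rather than an exact one. The function name is hypothetical; the launcher path `/usr/local/bin/idea` is taken from the config.

```python
import os
from typing import List


def idea_merge_command(local: str, remote: str, base: str, merged: str,
                       idea: str = "/usr/local/bin/idea") -> List[str]:
    """Build the argv the gitconfig `cmd` runs, with each path made absolute."""
    # os.path.abspath joins a relative path onto the current directory, roughly what
    # `$(cd $(dirname "$X") && pwd)/$(basename "$X")` does in the shell.
    paths = [os.path.abspath(p) for p in (local, remote, base, merged)]
    return [idea, "merge", *paths]


print(idea_merge_command("a_LOCAL.txt", "a_REMOTE.txt", "a_BASE.txt", "a.txt"))
```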
CsBigDataHub / .gitconfig
Created June 13, 2018 16:38 — forked from samsalisbury/.gitconfig
Git diff and merge with p4merge (OSX)
[merge]
tool = p4merge
# keepBackup and keepTemporaries are read from the [mergetool] section, not [merge] or [mergetool "p4merge"]
[mergetool]
keepBackup = false
keepTemporaries = false
[mergetool "p4merge"]
cmd = /Applications/p4merge.app/Contents/Resources/launchp4merge "\"$PWD/$BASE\"" "\"$PWD/$REMOTE\"" "\"$PWD/$LOCAL\"" "\"$PWD/$MERGED\""
trustExitCode = false
[diff]
tool = p4merge
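The same settings can also be written with `git config --global <key> <value>` instead of editing `~/.gitconfig` by hand. The sketch below only builds those command lines and does not run them; the helper name is hypothetical. It uses `mergetool.keepBackup` and `mergetool.keepTemporaries`, which git documents as `[mergetool]`-level keys, and copies the p4merge launcher path from the config. Note that `shlex.join` requires Python 3.8 or later.

```python
import shlex
from typing import Dict, List

P4MERGE = "/Applications/p4merge.app/Contents/Resources/launchp4merge"

SETTINGS: Dict[str, str] = {
    "merge.tool": "p4merge",
    "mergetool.keepBackup": "false",
    "mergetool.keepTemporaries": "false",
    # The literal quotes become the \" escapes that git writes into ~/.gitconfig.
    "mergetool.p4merge.cmd": f'{P4MERGE} "$PWD/$BASE" "$PWD/$REMOTE" "$PWD/$LOCAL" "$PWD/$MERGED"',
    "mergetool.p4merge.trustExitCode": "false",
    "diff.tool": "p4merge",
}


def git_config_commands(settings: Dict[str, str]) -> List[str]:
    """Return shell-quoted `git config --global` lines, one per setting."""
    return [shlex.join(["git", "config", "--global", key, value])
            for key, value in settings.items()]


for line in git_config_commands(SETTINGS):
    print(line)
```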
CsBigDataHub / p4merge4git.md
Created June 13, 2018 16:38 — forked from tony4d/p4merge4git.md
Set up p4merge as a visual diff and merge tool for git

Meld for OS X

This README should help you build Meld for OS X.

💡 Tip: Many people ask how to use this package as a git difftool. Once it is installed, edit your ~/.gitconfig and add the following lines:

[diff]
tool = meld

CsBigDataHub / sed cheatsheet
Created July 31, 2018 12:52 — forked from un33k/sed cheatsheet
magic of sed -- find and replace "text" in a string or a file
FILE SPACING:
# double space a file
sed G
# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
sed '/^$/d;G'
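`G` appends a newline plus the hold space to the pattern space; because the hold space is empty here, each line is printed followed by one blank line. In the second one-liner, `/^$/d` first deletes lines that are completely empty, so the output has exactly one blank line after each line of text. A Python emulation of both, as a sketch that assumes newline-terminated input (sed's handling of a missing final newline is not modeled):

```python
from typing import List


def _lines(text: str) -> List[str]:
    """Split on newlines the way sed reads lines."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline ends the last line; it is not an extra empty line
    return lines


def double_space(text: str) -> str:
    """Emulate `sed G`: print each line followed by an empty line."""
    return "".join(line + "\n\n" for line in _lines(text))


def double_space_squeezed(text: str) -> str:
    """Emulate `sed '/^$/d;G'`: drop empty lines, then double-space the rest."""
    kept = [line for line in _lines(text) if line != ""]  # /^$/ matches only truly empty lines
    return "".join(line + "\n\n" for line in kept)


print(repr(double_space("a\nb\n")))               # 'a\n\nb\n\n'
print(repr(double_space_squeezed("a\n\n\nb\n")))  # 'a\n\nb\n\n'
```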
CsBigDataHub / am_script2.py
Created September 3, 2018 13:09 — forked from mhulse/am_script2.py
Simple Python accessors (@getter/@setter and @deleter) example/test using a mixin and decorators. I'm using Python 2.6 for testing.
import pprint
# pprint.pprint(dir(obj))
# pprint.pprint(list)
# $ python manage.py runscript am_script2
class BaseMixin(object):
    #----------------------------------
    # Init:
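The title refers to Python's `property` decorators: `@property` defines the getter, and `@<name>.setter` and `@<name>.deleter` (available since Python 2.6) attach the other two accessors. The gist is truncated at this point, so the sketch below is not its code. It assumes a mixin that keeps values in a private dict; the attribute `_data` and the class `Person` are hypothetical, and the example targets Python 3.

```python
class BaseMixin(object):
    """Hypothetical mixin: stores attribute values in a per-instance dict."""

    def __init__(self):
        self._data = {}


class Person(BaseMixin):
    @property
    def name(self):
        """Getter: return the stored name, or None if unset."""
        return self._data.get("name")

    @name.setter
    def name(self, value):
        """Setter: strip surrounding whitespace before storing."""
        self._data["name"] = value.strip()

    @name.deleter
    def name(self):
        """Deleter: forget the stored name."""
        self._data.pop("name", None)


p = Person()
p.name = "  Ada  "
print(p.name)  # Ada
del p.name
print(p.name)  # None
```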