Suvro Banerjee SuvroBaner

SuvroBaner /
Last active June 21, 2018 08:34
This code counts the number of words in a text file using Spark.
# Creating Spark Configuration and Spark Context-
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Word Counter")
sc = SparkContext(conf = conf)
# Reading the file-
myTextFile = sc.textFile("/Users/bsuvro/spark-2.3.0-bin-hadoop2.7/")
# Removing the empty lines-
non_emptyLines = myTextFile.filter(lambda line: len(line) > 0)
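The preview stops after the empty-line filter, so the counting step itself is not shown. As a rough local sketch of the same logic in plain Python (this is not the gist's Spark code; `count_words` and the sample lines are hypothetical names and data made up for illustration), assuming words are separated by whitespace:

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words, skipping empty lines."""
    # Same test as the filter above: keep only lines with at least one character.
    non_empty = (line for line in lines if len(line) > 0)
    counts = Counter()
    for line in non_empty:
        counts.update(line.split())
    return counts


# Hypothetical sample input standing in for the lines of a text file.
sample = ["spark makes counting easy", "", "counting words with spark"]
word_counts = count_words(sample)
total_words = sum(word_counts.values())
```

Here `total_words` is the overall word count and `word_counts` holds the per-word frequencies. A small local check like this can be handy for validating a Spark job's output on a sample of the data.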
# Creating Spark Configuration and Spark Context-
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("My Dataframe")
sc = SparkContext(conf = conf)
from pyspark.sql import SparkSession # To work with dataframe we need pyspark.sql
spark = SparkSession(sc) # passing Spark Context to SQL module
myRange = spark.range(1000).toDF("number")
alist = ['h', 'e', 'a', 'd']
def insertionSort(alist):
    for i in range(1, len(alist)):  # starts at position 1 (i.e. "e") and runs to the end of the list ("d")
        j = i
        current_value = alist[i]
        print("Iteration: ", i)
        while (j > 0 and current_value < alist[j-1]):
            alist[j] = alist[j-1]
            j = j - 1
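The preview is cut off inside the shifting loop, before the current value is written back into the gap. A minimal complete sketch of insertion sort follows, assuming the usual final step of placing `current_value` at index `j`. The name `insertion_sort` and the choice to return a sorted copy instead of sorting in place are illustrative, not taken from the gist:

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current_value = result[i]
        j = i
        # Shift larger elements one slot to the right to open a gap.
        while j > 0 and current_value < result[j - 1]:
            result[j] = result[j - 1]
            j -= 1
        # Drop the current value into the gap left by the shifts.
        result[j] = current_value
    return result


sorted_letters = insertion_sort(['h', 'e', 'a', 'd'])
```

Working on a copy leaves the caller's list untouched, which makes the function easier to test. An in-place variant would simply skip the `list(items)` copy.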
article = '''
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.'''
import nltk
import pandas as pd
url_df = pd.read_csv('Book2.csv')
def fn_split_url(x):
    list_of_tokens = x.split('/')
    bad_words = ['https:', 'http:', '', 'en_US']
    final_tokens = []
    s = ''
    for token in list_of_tokens:
        if token not in bad_words:
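The preview ends at the `if` test, so what happens to each surviving token is not shown. One plausible reading is that tokens not in `bad_words` are collected into a list. The sketch below assumes exactly that; `split_url` and the example URL are hypothetical, and the pandas step is left out so the snippet runs on its own:

```python
def split_url(url, bad_words=('https:', 'http:', '', 'en_US')):
    """Split a URL on '/' and drop scheme, empty, and locale tokens."""
    return [token for token in url.split('/') if token not in bad_words]


# Hypothetical URL used only to show the shape of the result.
tokens = split_url('https://example.com/en_US/products/shoes')
```

The empty string is in `bad_words` because splitting `https://...` on `/` produces an empty token between the two slashes.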
article = '''
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.'''
import spacy
books = ['The Lord of the Rings', 'The Hobbit', 'Harry Potter and the Chamber of Secrets',
         'Black Beauty', 'Kane and Abel', 'To Kill a Mocking Bird', 'Gitanjali',
         'Spring and Autumn Annals', 'Rumi Poems']  # defining a list of books
book_catalog = {}  # defining a dictionary
for book in books:
    book_index = book[0]  # extract the first letter (for catalog)
    if book_index not in book_catalog:
        book_catalog[book_index] = [book]
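The preview shows only the branch that creates a new catalog entry; the branch for a letter that already exists is cut off. A short sketch of the complete grouping follows, assuming later books are appended to the existing list for their first letter. `build_catalog` is a hypothetical name, and `defaultdict` is swapped in for the explicit membership test:

```python
from collections import defaultdict


def build_catalog(titles):
    """Group titles by their first letter."""
    catalog = defaultdict(list)
    for title in titles:
        # A missing key starts as an empty list, so no membership test is needed.
        catalog[title[0]].append(title)
    return dict(catalog)


books = ['The Lord of the Rings', 'The Hobbit', 'Harry Potter and the Chamber of Secrets',
         'Black Beauty', 'Kane and Abel', 'To Kill a Mocking Bird', 'Gitanjali',
         'Spring and Autumn Annals', 'Rumi Poems']
catalog = build_catalog(books)
```

With `defaultdict(list)` the "create or append" decision collapses into a single `append`, which gives the same result as the two-branch version under the assumption above.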
from keras.layers import Input, ZeroPadding2D, Conv2D, BatchNormalization

def model(input_shape):
    # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
    X_input = Input(input_shape)
    # Zero-Padding: pads the border of X_input with zeroes
    X = ZeroPadding2D((3, 3))(X_input)
    # CONV -> BN -> RELU Block applied to X
    X = Conv2D(32, (7, 7), strides = (1, 1), name = 'conv0')(X)
    X = BatchNormalization(axis = 3, name = 'bn0')(X)
def function1(n):
    a = 0
    for i in range(n):
        for j in range(n):
            a += 1
    return a
def function2(n):
    a = 0
    for i in range(n):
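The first function increments `a` once per inner-loop iteration, and both loops run `n` times, so it returns `n * n` and does quadratic work. The body of `function2` is cut off in the preview, so nothing is assumed about it here. A small sketch that checks the count for the fully nested case (`count_nested_iterations` is a hypothetical name for illustration):

```python
def count_nested_iterations(n):
    """Count how many times the innermost statement of two nested loops runs."""
    a = 0
    for i in range(n):
        for j in range(n):
            a += 1
    return a


# Iteration counts for a few input sizes; each should equal n squared.
counts = {n: count_nested_iterations(n) for n in (1, 2, 5, 10)}
```

Multiplying `n` by 10 multiplies the count by 100, which is the practical meaning of O(n^2) growth.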