


Vijay Anand Pandian vijayanandrp

@vijayanandrp
vijayanandrp / system_design_interview_notes_with_python.md
Created June 17, 2023 14:16
System Design - All-in-One Interview Reading Notes with Python Examples

Complete System Design Series

With examples and intelligible explanations…

Pic credits: GitHub

Welcome back, peeps. We are now starting a System Design Series (over weekends) in which we will cover how to design large (and great) systems, along with the techniques, tips and tricks you can use to scale them. As a senior software engineer, you are expected to know not just the breadth but also the depth of system design concepts.

vijayanandrp / kconnect.py
Last active January 7, 2023 01:01 — forked from rueedlinger/kconnect.py
Kafka Connect Python Script - https | user auth | Status | restart | pause | resume
# credits source : https://gist.github.com/rueedlinger/76af36d04a0798a8e1f43ed16595bd97
import sys
import os
import json
import argparse
from base64 import b64encode
PYTHON_MAJOR_VERSION = sys.version_info.major
DEFAULT_HOST = 'localhost'
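The script preview stops at its imports, but they hint at how it talks to Kafka Connect: HTTP requests with optional basic auth (hence `b64encode`). Below is a minimal stdlib sketch of the status, restart, pause and resume calls against the Kafka Connect REST API. It assumes the worker listens on the default REST port 8083; the names `ACTIONS`, `build_request` and `call` are hypothetical and not taken from the original script.

```python
import json
from base64 import b64encode
from urllib.request import Request, urlopen

# Assumption: Kafka Connect's REST interface on its default port.
DEFAULT_URL = "http://localhost:8083"

# Hypothetical mapping of action -> (HTTP method, Kafka Connect REST path).
ACTIONS = {
    "status": ("GET", "/connectors/{name}/status"),
    "restart": ("POST", "/connectors/{name}/restart"),
    "pause": ("PUT", "/connectors/{name}/pause"),
    "resume": ("PUT", "/connectors/{name}/resume"),
}


def build_request(action, connector, base_url=DEFAULT_URL, user=None, password=None):
    """Build (but do not send) the HTTP request for one connector action."""
    method, path = ACTIONS[action]
    req = Request(base_url.rstrip("/") + path.format(name=connector), method=method)
    req.add_header("Accept", "application/json")
    if user is not None:
        # HTTP basic auth: base64("user:password").
        token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req


def call(action, connector, **kwargs):
    """Send the request; pause/resume/restart may answer with an empty body."""
    with urlopen(build_request(action, connector, **kwargs)) as resp:
        body = resp.read()
    return json.loads(body) if body else None
```

For example, `call("status", "my-sink")` would return the connector and task states as a dict, while `call("pause", "my-sink", user="admin", password="secret")` sends an authenticated `PUT`. Separating request construction from sending keeps the URL and auth logic testable without a running Connect cluster.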
vijayanandrp / pyspark_explode_null.py
Created June 23, 2021 05:38
PySpark - explode without losing null values
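from pyspark.sql.functions import *


def flatten_df(nested_df):
    """Promote the fields of every top-level struct column to top-level columns."""
    # Columns that are not structs are kept unchanged.
    flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
    # Struct columns are expanded field by field as <struct>_<field>.
    nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == 'struct']
    flat_df = nested_df.select(flat_cols +
                               [col(nc + '.' + c).alias(nc + '_' + c)
                                for nc in nested_cols
                                for c in nested_df.select(nc + '.*').columns])
    print("flatten_df_count :", flat_df.count())
    # Return the flattened DataFrame so the caller can keep transforming it.
    return flat_df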
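The gist title refers to a known Spark behaviour: `explode` emits no output row when the array is null or empty, so those records silently disappear, whereas `explode_outer` (in `pyspark.sql.functions` since Spark 2.3) keeps them with a null element. The sketch below models those semantics in plain Python over lists of dicts, so it runs without a Spark session; the function names mirror Spark's, but this is an illustrative model, not PySpark code.

```python
def explode(rows, column):
    """Model of Spark's explode: null or empty arrays produce no output rows."""
    out = []
    for row in rows:
        for item in row[column] or []:
            out.append({**row, column: item})
    return out


def explode_outer(rows, column):
    """Model of Spark's explode_outer: null/empty arrays yield one row with None."""
    out = []
    for row in rows:
        for item in row[column] or [None]:
            out.append({**row, column: item})
    return out


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": None},  # would be dropped by explode
    {"id": 3, "tags": []},    # would also be dropped by explode
]
print(explode(rows, "tags"))
print(explode_outer(rows, "tags"))
```

In PySpark itself the null-preserving version would be written as something like `df.withColumn("tag", explode_outer("tags"))`, assuming a DataFrame `df` with an array column `tags`.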
# -*- coding: utf-8 -*-
import requests

try:
    from pymongo import MongoClient
except ImportError as err:
    raise ImportError('PyMongo is not installed') from err

# (the remainder of this gist is truncated in the preview)
vijayanandrp / Bigquery_util.py
Last active October 27, 2023 07:21
BigQuery to Google Cloud Storage
#!/usr/bin/env python
# Copyright 2016 Google Inc. All Rights Reserved.
import os
import sys
import time
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'BigQuery.json'
from google.cloud import bigquery
from google.cloud.bigquery.job import DestinationFormat, ExtractJobConfig, Compression
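The utility's preview ends at its imports. A minimal sketch of the export it describes might use `Client.extract_table` with an `ExtractJobConfig`. Because BigQuery writes at most 1 GB per exported file, the destination URI carries a `*` wildcard so larger tables are sharded. The helpers `build_destination_uri` and `export_table` are hypothetical names, and the sketch assumes a recent `google-cloud-bigquery` release that exposes `DestinationFormat` and `Compression` at the package top level; the library is imported lazily so the URI logic can be exercised without it.

```python
# Hypothetical mapping from BigQuery export format to a file extension.
_EXTENSIONS = {"CSV": "csv", "NEWLINE_DELIMITED_JSON": "json"}


def build_destination_uri(bucket, prefix, fmt="CSV", gzip=True):
    """Build a wildcard GCS URI so BigQuery can shard exports over 1 GB."""
    ext = _EXTENSIONS[fmt]
    if gzip:
        ext += ".gz"
    return f"gs://{bucket}/{prefix}-*.{ext}"


def export_table(table_id, bucket, prefix, fmt="CSV", gzip=True):
    """Export a table (e.g. "project.dataset.table") to GCS and wait for the job."""
    from google.cloud import bigquery  # lazy import: only needed for the real call

    # Credentials are read from GOOGLE_APPLICATION_CREDENTIALS, as in the gist.
    client = bigquery.Client()
    job_config = bigquery.ExtractJobConfig(
        destination_format=getattr(bigquery.DestinationFormat, fmt),
        compression=bigquery.Compression.GZIP if gzip else bigquery.Compression.NONE,
    )
    extract_job = client.extract_table(
        table_id, build_destination_uri(bucket, prefix, fmt, gzip), job_config=job_config
    )
    extract_job.result()  # block until the extract job finishes (raises on failure)
    return extract_job
```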
from google.cloud import bigquery
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'Google Analytics POC.json'
client = bigquery.Client()
query_job = client.query("""
SELECT
#!/usr/bin/env python3.5
# encoding: utf-8
import configparser
config = configparser.ConfigParser()
# I believe this config parser should use the perl autovivification method to create dynamic objects
config['DEFAULT'] = {
'Name': 'Vijay Anand',
[DEFAULT]
married = False
sex = M
name = Vijay Anand
age = 26
nationality = Indian
[www.facebook.com]
user_name = VjyAnnd
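The INI file above can be read back with the standard-library `configparser`. A small sketch follows; the variable names are illustrative. It shows two behaviours worth knowing: keys in `[DEFAULT]` are inherited by every other section, and typed getters such as `getboolean` and `getint` convert the stored strings.

```python
import configparser

# The INI content shown above, embedded as a string for a self-contained example.
INI_TEXT = """
[DEFAULT]
married = False
sex = M
name = Vijay Anand
age = 26
nationality = Indian

[www.facebook.com]
user_name = VjyAnnd
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

site = config["www.facebook.com"]
print(config.sections())              # DEFAULT is not listed as a section
print(site["user_name"])              # defined in the section itself
print(site["name"])                   # inherited from [DEFAULT]
print(site.getint("age"))             # "26" converted to int
print(site.getboolean("married"))     # "False" converted to bool
```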

Tutorial Exercise: Yelp reviews (Solution)

Introduction

This exercise uses a small subset of the data from Kaggle's Yelp Business Rating Prediction competition.

Description of the data:

  • yelp.csv contains the dataset. It is stored in the repository (in the data directory), so there is no need to download anything from the Kaggle website.

Text SMS - Spam Classification Model

The base requirement of this project is to analyse the SMS dataset and come up with a machine learning model to predict, or classify, whether an SMS text is spam. For my latest code and datasets, please visit my github.com account.

The following are the steps we will take to solve this problem:
  1. Reading a text-based dataset into pandas
  2. Vectorizing our dataset
  3. Building and evaluating a model
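The three steps can be sketched end to end without external dependencies. This is a stand-in, not the project's code: instead of pandas and scikit-learn it uses plain lists for the dataset, a simple bag-of-words tokenizer for vectorizing, and a hand-written multinomial Naive Bayes with Laplace smoothing as the model. The six training messages and two test messages are made up purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Step 2 (vectorizing): lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSMS:
    """Step 3: multinomial Naive Bayes over word counts, add-one smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.totals[c] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, messages):
        return [self.predict_one(m) for m in messages]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Step 1 (reading the dataset): toy (message, label) pairs in place of a CSV.
train = [
    ("WINNER! Claim your free prize now", "spam"),
    ("Free entry to win cash, text WIN now", "spam"),
    ("URGENT: claim your cash prize today", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("I'll call you when I get home", "ham"),
    ("Can you pick up milk on the way home", "ham"),
]
test_msgs = ["claim your free cash prize", "see you at home for lunch"]
test_labels = ["spam", "ham"]

model = NaiveBayesSMS().fit([m for m, _ in train], [l for _, l in train])
predictions = model.predict(test_msgs)
print(predictions, accuracy(test_labels, predictions))
```

With pandas and scikit-learn the same pipeline is usually written with `pd.read_csv`, `CountVectorizer` and `MultinomialNB`; the hand-rolled version simply makes each step visible.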