Renien John Joseph Renien

@Renien
Renien / Kubectl-Cheat-Sheet.md
Last active June 7, 2022 07:47
Kubectl Cheat Sheet

Cheat Sheet

Details of the pods

$ kubectl get pods -o wide

$ kubectl get pods

Get the replication controller

$ kubectl get rc

Renien / hashingTest.scala
Created June 18, 2018 04:31
Clarify doubts about the hashing function.
import scala.util.hashing.MurmurHash3
// Hash a string with MurmurHash3 and return the 32-bit result as a decimal string.
def md3(s: String) = {
  MurmurHash3.stringHash(s).toString
}
val hashed = List(md3("1"), md3("2"), md3("3"), md3("Product:12345"), md3("Product:12346"), md3("Product:12347"))
val unhashed = List("1", "2", "Product:12345", "Product:12346", "Product:12347", "3")
Renien / n-gram-scala.scala
Created May 25, 2016 15:52
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
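A minimal Python sketch of the definition above, not the gist's Scala implementation. The function name `ngrams` and the default whitespace splitter are assumptions made for illustration; the sketch treats whitespace-separated tokens as the "items" and slides a window of length n over them.

```python
import re


def ngrams(text, n=2, splitter=r"\s+"):
    """Return the contiguous n-token sequences of `text` as a list of tuples.

    `splitter` is a regular expression used to break the text into tokens
    (hypothetical default: runs of whitespace).
    """
    tokens = [t for t in re.split(splitter, text.strip()) if t]
    # Slide a window of width n over the token list; a text with fewer
    # than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = ngrams("the quick brown fox", n=2)
```

Here `bigrams` holds three pairs, one for each adjacent pair of words in the sentence.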
object NGram {
  /**
   * Split the sentence
   * @param data documents
   * @param splitter the delimiting regular expression
   * @return the array of strings computed by splitting this string
   *         around matches of the given regular expression
   */
  private def split(data: String, splitter: String): Seq[String] = {
Renien / a-priori-python.py
Created May 15, 2016 13:40
We can find all frequent pairs by making two passes over the baskets. On the first pass, we count the items themselves, and then determine which items are frequent. On the second pass, we count only the pairs of items both of which are found frequently on the first pass. Monotonicity justifies our ignoring other pairs.
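The two passes described above can be sketched as follows. This is an illustrative version under stated assumptions, not the code from the gist: the function name `frequent_pairs`, the `support` threshold parameter, and the in-memory list of baskets are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_pairs(baskets, support):
    """Return {pair: count} for item pairs appearing in at least `support` baskets."""
    # Pass 1: count individual items and keep only the frequent ones.
    item_counts = defaultdict(int)
    for basket in baskets:
        for item in set(basket):
            item_counts[item] += 1
    frequent_items = {item for item, c in item_counts.items() if c >= support}

    # Pass 2: count only pairs whose members are both frequent.
    # Monotonicity guarantees no frequent pair is missed by this filter.
    pair_counts = defaultdict(int)
    for basket in baskets:
        candidates = sorted(set(basket) & frequent_items)
        for pair in combinations(candidates, 2):
            pair_counts[pair] += 1

    return {pair: c for pair, c in pair_counts.items() if c >= support}


baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]
pairs = frequent_pairs(baskets, support=2)
```

Sorting the candidates before calling `combinations` gives each pair a canonical order, so `("bread", "milk")` and `("milk", "bread")` are counted as the same pair.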
__author__ = 'renienj'
import urllib
from itertools import combinations
import time
from collections import defaultdict
"""
Sample Data Sets
----------------
Renien / k-shingling-python.py
Created May 14, 2016 14:10
A k-shingle is any k characters that appear consecutively in a document. If we represent a document by its set of k-shingles, then the Jaccard similarity of the shingle sets measures the textual similarity of documents. Sometimes, it is useful to hash shingles to bit strings of shorter length, and use sets of hash values to represent documents.
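As a rough sketch of the description above, the following builds the set of character k-shingles and an optional hashed variant. The names `shingles` and `hashed_shingles` are hypothetical, and the use of `zlib.crc32` as the hash is an assumption chosen only because it is in the standard library; the gist may hash differently or not at all.

```python
import zlib


def shingles(doc, k=2):
    """Return the set of all k-character substrings of `doc`."""
    return {doc[i:i + k] for i in range(len(doc) - k + 1)}


def hashed_shingles(doc, k=2):
    """Map each shingle to a 32-bit integer (assumed hash: CRC-32)."""
    return {zlib.crc32(s.encode("utf-8")) for s in shingles(doc, k)}


s = shingles("abcab", k=2)
h = hashed_shingles("abcab", k=2)
```

The shingle sets of two documents can then be compared with Jaccard similarity to estimate their textual similarity.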
__author__ = 'renienj'
def compute_gram(doc_data, k=2):
    """
    In natural language processing a w-shingling is a set of unique "shingles"
    (n-grams, contiguous subsequences of tokens in a document)
    Very much similar to n-grams but here we consider characters
    """
Renien / tf-ids-python.py
Last active May 14, 2016 13:11
The measure called TF.IDF lets us identify words in a collection of documents that are useful for determining the topic of each document. A word has high TF.IDF score in a document if it appears in relatively few documents, but appears in this one, and when it appears in a document it tends to appear many times.
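To make the description above concrete, here is a small sketch of one common TF.IDF variant. The weighting choices are assumptions, not taken from the gist: term frequency is the word's count divided by the document length, and inverse document frequency is the natural log of (number of documents / number of documents containing the word). The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document; `docs` is a list of token lists."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = [["a", "b", "a"], ["a", "c"]]
scores = tf_idf(docs)
```

In this toy corpus the word `a` appears in every document, so its score is zero everywhere, while `b` and `c` each appear in only one document and score above zero there.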
__author__ = 'renienj'
import numpy as np
import pandas as pd
import math
def compute_tfidf(tf_list, idf_list):
    """
    tfidf = tf(w) * idf(w)
Renien / jaccard-similarity-python.py
Last active May 27, 2021 19:31
Jaccard Similarity: The Jaccard similarity of sets is the ratio of the size of the intersection of the sets to the size of the union. This measure of similarity is suitable for many applications, including textual similarity of documents and similarity of buying habits of customers.
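The ratio defined above can be sketched in a few lines of plain Python. The function name `jaccard_similarity` is hypothetical, and returning 1.0 for two empty inputs is an assumption made here to avoid dividing by zero; the definition itself leaves that case open.

```python
def jaccard_similarity(x, y):
    """Return |A ∩ B| / |A ∪ B| for the sets built from iterables x and y."""
    a, b = set(x), set(y)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


score = jaccard_similarity([1, 2, 3], [2, 3, 4])
```

The two inputs share 2 of 4 distinct elements, so `score` is 0.5.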
__author__ = 'renienj'
import numpy as np
def compute_jaccard_similarity_score(x, y):
    """
    Jaccard Similarity J (A,B) = | Intersection (A,B) | /
                                 | Union (A,B) |
    """
    intersection_cardinality = len(set(x).intersection(set(y)))
Renien / index.html
Created March 6, 2016 12:04
vis.js sample code for POS Tags visualization
<!doctype html>
<html>
<head>
<title>Network | Basic usage</title>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0-beta1/jquery.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.15.0/vis.js"></script>
<script type="text/javascript" src="pos-pattern.js"></script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.15.0/vis.css" rel="stylesheet" type="text/css" />
Renien / mapper.py
Last active March 6, 2016 12:05
POS Tags visualization MapReduce job
#!/usr/bin/env python
__author__ = 'renienj'
import sys
def read_input(file):
    for line in file:
        # split the line into words
        yield line.split()