Sachin Kumar S (skwolvie)

  • Indian School of Business
  • Sunnyvale, CA
@skwolvie
skwolvie / workex_noise.csv
Created April 19, 2022 17:57
Noisy work-experience responses collected when the Google Forms field is not restricted to a strict integer/range
yoe
15
10
20
12
8
16
14
5
11
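The preview above shows only clean integers, but since the point of this gist is free-form "years of experience" answers, here is a minimal sketch of coercing such a column to numbers with pandas (the file and column names match the gist; the noisy example values and the digit-extraction rule are assumptions):

import pandas as pd

df = pd.read_csv("workex_noise.csv")
# Pull the first run of digits out of answers like "10 years" or "8+",
# then coerce to numeric; anything unparseable becomes NaN for manual review.
df["yoe"] = pd.to_numeric(df["yoe"].astype(str).str.extract(r"(\d+)")[0], errors="coerce")
print(df["yoe"].describe())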
@skwolvie
skwolvie / ti_sample.csv
Created November 1, 2021 08:42
Sample 2019 traffic index
Date weekend_or_holiday DELHI MUMBAI
3/7/2019 0 11.05 6.88
3/8/2019 0 53.76 34.12
3/9/2019 1 23.82 38.49
3/10/2019 1 11.56 27.41
3/11/2019 0 34.58 36.21
3/12/2019 0 46.38 42.8
3/13/2019 0 45.34 41.87
3/14/2019 0 36.9 45.45
3/15/2019 0 43.16 45.56
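For reference, a minimal loading sketch for this sample (it assumes the file is whitespace-delimited as the preview shows and uses the reconstructed column names above; for a true comma-separated file, drop the sep argument):

import pandas as pd

ti = pd.read_csv("ti_sample.csv", sep=r"\s+", parse_dates=["Date"])
# Compare average congestion on working days (0) vs. weekends/holidays (1).
print(ti.groupby("weekend_or_holiday")[["DELHI", "MUMBAI"]].mean())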
@skwolvie
skwolvie / .py
Created July 19, 2021 00:50
YAKE keyword extractor
import yake
import swifter  # registers the .swifter accessor for parallel pandas .apply (used outside this preview)
from nltk.tokenize import RegexpTokenizer  # not used in this preview; presumably used further down the gist

def keywords_yake(text):
    # Take keywords for each post & turn them into a text string "sentence".
    simple_kwextractor = yake.KeywordExtractor(lan='ja',
                                               n=1,
                                               dedupLim=.99,
                                               dedupFunc='seqm',
                                               windowsSize=100,
                                               top=20)  # the preview truncates here; top=20 is an assumed value
    keywords = simple_kwextractor.extract_keywords(text)
    # extract_keywords returns (keyword, score) pairs; join the keywords into one string.
    return " ".join(kw for kw, _ in keywords)
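A minimal usage sketch for the snippet above (the DataFrame and its job_text column are assumptions, not part of the gist; swifter parallelizes the row-wise apply):

import pandas as pd

df = pd.DataFrame({"job_text": ["アプリ インフラ Web サイト等 幅広く 手掛けます"]})
df["keywords"] = df["job_text"].swifter.apply(keywords_yake)
print(df["keywords"].iloc[0])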
@skwolvie
skwolvie / .py
Created July 18, 2021 20:58
spaCy noun chunks for Japanese
import spacy
from spacy import displacy
from IPython.display import HTML, display

# change the model to en_core_web_sm/lg for english
nlp = spacy.load('ja_core_news_lg')
# Sample job posting, roughly: "work broadly on apps/infra/websites; be involved in RPA/AI system development; gain knowledge of HTML, JavaScript, PHP, Python, Java"
text = "アプリインフラWebサイト等幅広く手掛けます RPAAIシステム開発にも携わって頂きます HTML Javascript PHP Python JAVAの知識が身につきます "

def extract_noun_phrase_experience(doc):
    a = []
    # Preview truncates here; a typical completion collects noun-like tokens (spaCy's ja models lack doc.noun_chunks).
    for tok in doc:
        if tok.pos_ in ("NOUN", "PROPN"):
            a.append(tok.text)
    return a
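A minimal usage sketch for the extractor above (the output depends on the assumed completion of the truncated function):

doc = nlp(text)
print(extract_noun_phrase_experience(doc))  # noun/proper-noun tokens from the job posting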
@skwolvie
skwolvie / .py
Created July 18, 2021 20:45
Japanese jobs: text cleaning
import re
import string
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
# NLTK does not ship a Japanese stopword list, so stopwords.words('japanese')
# raises an error; substitute a custom Japanese stopword set here if needed.
st = set(stopwords.words('japanese')) if 'japanese' in stopwords.fileids() else set()
punc = string.punctuation
# Japanese bullets/decorations commonly found in job postings, stripped during cleaning.
SYMBOLS = r"[★・︾◯●:()【】、⇒_。①②./」「\-◆?~※!:><<>→〇×△→―]"
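The preview cuts off before the cleaning function itself; a minimal sketch of how these pieces are typically combined (the name clean_text and its exact steps are assumptions, not from the gist):

def clean_text(text):
    text = re.sub(SYMBOLS, " ", text)                         # strip Japanese bullets/decorations
    text = text.translate(str.maketrans("", "", punc))        # strip ASCII punctuation
    tokens = [tok for tok in text.split() if tok not in st]   # drop stopwords (no-op while st is empty)
    return " ".join(tokens)

print(clean_text("【必須】Python・SQL ★データ分析の経験"))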