Minsub Won MSWon
@MSWon
MSWon / KorQuAD_to_csv.py
Created January 2, 2019 01:40
Convert KorQuAD to CSV in Python
import numpy as np
import pandas as pd
train = pd.read_json('./KorQuAD_v1.0_train.json')
valid = pd.read_json('./KorQuAD_v1.0_dev.json')
valid.head(5)
valid['data'][0]['paragraphs'][0]
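The preview stops before the conversion itself. Below is a minimal sketch of the flattening step, assuming KorQuAD v1.0 follows the SQuAD v1.1 layout (`data`, then `paragraphs`, then `qas`, then `answers`). The function name `korquad_to_rows` and the tiny sample record are hypothetical, and the standard `csv` module stands in for pandas so the example runs on its own:

```python
import csv
import io

def korquad_to_rows(dataset):
    """Flatten a SQuAD-style dict into one row per (question, answer) pair."""
    rows = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    rows.append({
                        "id": qa["id"],
                        "title": article.get("title", ""),
                        "context": context,
                        "question": qa["question"],
                        "answer": answer["text"],
                        "answer_start": answer["answer_start"],
                    })
    return rows

# Hypothetical sample in the assumed layout (not real KorQuAD data).
sample = {
    "version": "sample",
    "data": [{
        "title": "T",
        "paragraphs": [{
            "context": "The sky is blue.",
            "qas": [{
                "id": "q1",
                "question": "What color is the sky?",
                "answers": [{"text": "blue", "answer_start": 11}],
            }],
        }],
    }],
}

rows = korquad_to_rows(sample)

# Write the rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

With the real files, the same rows could be passed to `pd.DataFrame(rows).to_csv(...)` instead of `csv.DictWriter`.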
MSWon / yelp_json2csv.py
Created January 1, 2019 02:08
Convert Yelp JSON to a CSV file
import json
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
filename = './yelp_academic_dataset_review.json'
def make_dataset(filename):
    data = []
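The body of `make_dataset` is cut off in this preview. The following is a sketch of one way such a loader could work, not the author's code. It assumes the review file holds one JSON object per line (JSON Lines) and that each record carries `text` and `stars` fields. The helper name `read_json_lines` and the in-memory sample are hypothetical:

```python
import io
import json

def read_json_lines(handle, fields=("text", "stars")):
    """Parse one JSON object per line, keeping only the requested fields."""
    data = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        data.append({field: record.get(field) for field in fields})
    return data

# Hypothetical two-line sample standing in for the real review file.
sample_file = io.StringIO(
    '{"review_id": "r1", "stars": 5, "text": "Great food"}\n'
    '{"review_id": "r2", "stars": 2, "text": "Slow service"}\n'
)
reviews = read_json_lines(sample_file)
```

For the real file, the handle would come from `open(filename, encoding="utf-8")`, and the resulting list of dicts can be handed to `pd.DataFrame(...)` and written out with `to_csv`.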
MSWon / Hangul_decompose.py
Created December 30, 2018 10:12
decompose Hangul
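from hgtk.text import decompose as decom
a = decom("감스트")  ## jamo string with "ᴥ" marking the end of each syllable
b = a.split("ᴥ")  ## one entry per syllable
del b[-1]  ## drop the empty string left after the trailing "ᴥ"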
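For reference, the decomposition can also be sketched without `hgtk`, using the arithmetic layout of the Unicode Hangul Syllables block (19 initials, 21 medials, 28 finals including "none", starting at U+AC00). The function name `decompose_syllable` is hypothetical. Unlike `hgtk`, this sketch returns conjoining jamo (U+1100 range) rather than compatibility jamo:

```python
import unicodedata

HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable, "힣"
N_MEDIAL = 21
N_FINAL = 28           # includes the "no final consonant" slot

def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into conjoining jamo."""
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return (ch,)  # not a Hangul syllable: return unchanged
    index = code - HANGUL_BASE
    initial = index // (N_MEDIAL * N_FINAL)
    medial = (index % (N_MEDIAL * N_FINAL)) // N_FINAL
    final = index % N_FINAL
    jamo = (chr(0x1100 + initial), chr(0x1161 + medial))
    if final:
        jamo += (chr(0x11A7 + final),)  # final index 0 means no final consonant
    return jamo

syllables = [decompose_syllable(ch) for ch in "감스트"]

# Cross-check against Unicode canonical decomposition (NFD).
matches_nfd = all(
    "".join(parts) == unicodedata.normalize("NFD", ch)
    for parts, ch in zip(syllables, "감스트")
)
```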
MSWon / Dataframe_from_list.py
Created December 11, 2018 03:32
Create pd.DataFrame from lists using a dict
import pandas as pd
months = ['Jan','Apr','Mar','June']
days = [31,30,31,30]
d = {'Month':months,'Day':days}
df = pd.DataFrame(d)
'''
Day Month
MSWon / glob.py
Created October 15, 2018 13:06
Get list of file names in current dir
from glob import glob
glob("./*") ## Get list of file names in current dir
MSWon / plot_loss_graph.py
Created October 5, 2018 07:56
plotting loss graph using pyplot
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
data = pd.read_csv("./loss_accuracy.csv")
plt.figure(figsize = (10,7))
plt.plot(range(1,11),data['train_loss'], label = 'train', marker = "D",linewidth = 2.5, markersize=8,
color = "C1")
plt.plot(range(1,11),data['test_loss'], label = 'test', marker = "D",linewidth = 2.5, markersize=8,
MSWon / split_dataframe.py
Created September 27, 2018 04:20
splitting dataframe in python by using groupby function
import pandas as pd
## df is pd.DataFrame
h = [g for _, g in df.groupby('v1')]
import re
test = "What do you need?"
re.sub("do", "", test)
## "What  you need?" (two spaces remain where "do" was removed) ##
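Note that `re.sub("do", "", test)` removes only the two letters, so the spaces on both sides survive, and it would also hit "do" inside longer words. A small sketch of a tighter pattern, using a word boundary plus the trailing whitespace (the variable names are illustrative):

```python
import re

test = "What do you need?"

# Plain substring removal leaves a double space behind.
naive = re.sub("do", "", test)

# Match "do" only as a whole word and consume the whitespace after it.
clean = re.sub(r"\bdo\b\s*", "", test)

# The word boundary keeps words such as "doing" intact.
untouched = re.sub(r"\bdo\b\s*", "", "What are you doing?")
```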
MSWon / pad_seq.py
Created July 10, 2018 07:07
zero padding sequences
from keras.preprocessing.sequence import pad_sequences
train_seq = [[1,2,3],[4,7,9,1]]
pad_sequences(train_seq, maxlen = 5, padding = "post")
## array([[1, 2, 3, 0, 0],
##        [4, 7, 9, 1, 0]])
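To make the behavior concrete, here is a pure-Python sketch of post-padding that does not depend on Keras. The function name `pad_post` is hypothetical. It returns nested lists rather than a NumPy array, and it mirrors Keras' default of trimming over-long sequences from the front:

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value` up to `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Keep the last `maxlen` items (front truncation).
            seq = seq[len(seq) - maxlen:]
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

train_seq = [[1, 2, 3], [4, 7, 9, 1]]
padded = pad_post(train_seq, maxlen=5)
truncated = pad_post([[1, 2, 3, 4, 5, 6]], maxlen=4)
```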