@jenjenjiang
jenjenjiang / logistic_regression.ipynb
Created August 20, 2019 09:26
Logistic regression library in Python
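The notebook cannot be rendered here. For orientation only, this is a minimal pure-Python sketch of binary logistic regression with one feature, trained by batch gradient descent on the log-loss. It is not the notebook's code. The toy data, learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping a real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b for a single feature by batch gradient descent
    on the mean log-loss (binary cross-entropy)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss with respect to the logit is (p - y)
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the label of x is 1."""
    return sigmoid(w * x + b)


# Hypothetical toy data: the label is 1 when x is large
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict_proba(w, b, 1.0), predict_proba(w, b, 4.0))
```

A library such as scikit-learn adds regularization and multi-feature support on top of the same idea.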
jenjenjiang / csv_linear_regression.ipynb
Last active August 20, 2019 09:16
Read a CSV file, build a regression model, and make an example prediction.
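A minimal sketch of the read-CSV, fit, predict workflow using only the standard library: `csv.DictReader` reads the rows and a one-feature ordinary least squares fit produces the model. The inline CSV text and its column names (`size`, `price`) are hypothetical stand-ins for the notebook's file.

```python
import csv
import io


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical stand-in for a CSV file on disk
csv_text = """size,price
50,150
60,180
80,240
100,300
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
xs = [float(r["size"]) for r in rows]
ys = [float(r["price"]) for r in rows]

slope, intercept = fit_simple_linear(xs, ys)
# Example prediction for an input not in the data
prediction = slope * 70 + intercept
print(slope, intercept, prediction)
```

To read a real file, replace `io.StringIO(csv_text)` with `open("data.csv", newline="")` (hypothetical path).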
jenjenjiang / google_map.ipynb
Last active August 20, 2019 09:15
Use the Google Places API to request data and store it as Python objects
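The notebook itself cannot be displayed. As a hedged sketch of the request-and-store step, the code builds a request URL for what I understand to be the legacy Places API Nearby Search endpoint and converts a JSON response body into a list of plain Python dicts. The endpoint, the parameter names (`location`, `radius`, `type`, `key`) and the response fields (`status`, `results`, `name`, `place_id`, `geometry.location`) should be checked against Google's current documentation, since Google also offers a newer Places API with a different request format. The sample response and the helper names are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Legacy Nearby Search endpoint (assumption: verify against current docs)
NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def build_nearby_url(lat, lng, radius_m, api_key, place_type=None):
    """Build a Nearby Search request URL; this function does not fetch it."""
    params = {"location": f"{lat},{lng}", "radius": radius_m, "key": api_key}
    if place_type:
        params["type"] = place_type
    return NEARBY_URL + "?" + urlencode(params)


def parse_places(response_text):
    """Turn a JSON response body into a list of plain Python dicts."""
    payload = json.loads(response_text)
    if payload.get("status") not in ("OK", "ZERO_RESULTS"):
        raise RuntimeError(f"Places API error: {payload.get('status')}")
    places = []
    for item in payload.get("results", []):
        loc = item.get("geometry", {}).get("location", {})
        places.append({
            "name": item.get("name"),
            "place_id": item.get("place_id"),
            "lat": loc.get("lat"),
            "lng": loc.get("lng"),
        })
    return places


# Hypothetical response body shaped like a Nearby Search result
sample = json.dumps({
    "status": "OK",
    "results": [
        {"name": "Example Cafe", "place_id": "abc123",
         "geometry": {"location": {"lat": 45.76, "lng": 4.84}}},
    ],
})

url = build_nearby_url(45.76, 4.84, 500, "YOUR_API_KEY", place_type="cafe")
print(url)
print(parse_places(sample))
```

To call the live service, fetch the URL with your own key (for example `urllib.request.urlopen(url).read().decode("utf-8")`) and pass the body to `parse_places`.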
jenjenjiang / dictionary.ipynb
Last active August 20, 2019 09:17
pandas examples
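# Install jieba first, from a shell (pip install jieba) or in a notebook (!pip install jieba)
from os import path
import jieba
import jieba.analyse as ja
from gensim.test.utils import common_texts, get_tmpfile

# Extract the top 10 keywords of each line of the lyrics file
with open('../JJ/lyric.txt', 'r', encoding='utf-8') as handle:
    print(handle)
    for line in handle:
        # withWeight=True returns (keyword, TF-IDF weight) pairs
        tags = ja.extract_tags(line, topK=10, withWeight=True)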
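`ja.extract_tags` ranks the words of each line by TF-IDF. A simplified pure-Python sketch of that ranking, working on tokens that are already segmented (jieba does the segmentation itself). As far as I understand jieba's implementation, TF is a word's count divided by the number of kept tokens, IDF comes from a bundled table, unknown words get the median IDF, and words shorter than two characters or in the stop list are dropped. The function name `extract_tags_sketch`, the tokens and the IDF values are hypothetical.

```python
import heapq
import statistics


def extract_tags_sketch(tokens, idf, top_k=10, with_weight=False, stop_words=()):
    """Rank already-segmented tokens by TF-IDF.
    tf = count / total kept tokens; unknown words use the median IDF.
    Tokens shorter than 2 characters and stop words are skipped."""
    median_idf = statistics.median(idf.values())
    counts = {}
    for tok in tokens:
        if len(tok.strip()) < 2 or tok in stop_words:
            continue
        counts[tok] = counts.get(tok, 0) + 1
    if not counts:
        return []
    total = sum(counts.values())
    scores = {w: (c / total) * idf.get(w, median_idf) for w, c in counts.items()}
    top = heapq.nlargest(top_k, scores.items(), key=lambda kv: kv[1])
    return top if with_weight else [w for w, _ in top]


# Hypothetical segmented line and IDF table (jieba ships its own IDF file)
tokens = ["月亮", "代表", "我", "的", "心", "月亮"]
idf = {"月亮": 8.0, "代表": 5.0, "我们": 2.0}
print(extract_tags_sketch(tokens, idf, top_k=10, with_weight=True))
```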
jenjenjiang / DecisionTreeClassifier predict
Created March 6, 2019 21:30
Lyon Olympics churn prediction
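# DecisionTreeClassifier predict
# Assumes `dev_test` (a DataFrame) and a fitted classifier `clf` from earlier cells
from sklearn.metrics import accuracy_score

# Features: the first 46 columns, minus the MYOL status column
x_test = dev_test.iloc[:, 0:46]
x_test_1 = x_test.drop(['Client_MYOL_Statut'], axis=1, inplace=False)
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
x_t1_m = x_test_1.to_numpy()
test_pred = clf.predict(x_t1_m)
y_test = dev_test.Client_Abo_1819.to_numpy()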
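The snippet imports `accuracy_score`, but the preview ends before it is called. For reference, a sketch of what accuracy means for these label arrays, plus confusion counts, which are worth checking on churn data because a model that always predicts the majority class can still score a high accuracy. The labels below are hypothetical; in the notebook the equivalent calls would be `accuracy_score(y_test, test_pred)` and `sklearn.metrics.confusion_matrix(y_test, test_pred)`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """True/false positive and negative counts for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical labels: 1 = subscribed, 0 = churned
y_test = [1, 0, 1, 1, 0, 1]
test_pred = [1, 0, 0, 1, 1, 1]
print(accuracy(y_test, test_pred))
print(confusion_counts(y_test, test_pred))
```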
jenjenjiang / Silhouette Analysis for the value of k
Last active March 6, 2019 21:21
Use k-means to cluster the customers_BMW sample data
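# Use silhouette analysis to choose the number of clusters k
# Assumes `data_std` is the standardized feature matrix from earlier cells
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

range_n_clusters = [2, 3, 4, 5, 6, 7, 8, 9]
for n_clusters in range_n_clusters:
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(data_std)
    labels = kmeans.labels_
    # Mean silhouette coefficient over all samples; higher is better
    silhouette_avg = silhouette_score(data_std, labels)
    print("For n_clusters =", n_clusters,
          "The average silhouette_score is :", silhouette_avg)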
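The loop keeps the k with the highest average silhouette. A pure-Python sketch of the quantity `silhouette_score` averages, using Euclidean distance (scikit-learn's default metric). The function name `silhouette_score_sketch` and the toy points are hypothetical, and this quadratic-time version is only meant for small inputs.

```python
import math


def silhouette_score_sketch(points, labels):
    """Mean silhouette coefficient with Euclidean distance.
    s(i) = (b - a) / max(a, b), where a is the mean distance from point i to
    the other members of its own cluster and b is the smallest mean distance
    from i to the members of any other cluster. Singleton clusters get 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least 2 clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The sum includes the zero distance to p itself, hence len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Two well-separated toy clusters, labelled correctly and then badly
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score_sketch(points, [0, 0, 1, 1]))
print(silhouette_score_sketch(points, [0, 1, 0, 1]))
```

A value near 1 means points sit well inside their own cluster; negative values mean many points are closer to another cluster.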
jenjenjiang / data clean.txt
Last active March 3, 2019 20:39
Clean data and build classification models
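import pandas as pd
import numpy as np
import sys

def get_size(total_size, percentage, mean):
    # Training-sample size: `percentage` percent of `total_size` rows
    size_train = int(percentage * total_size / 100)
    # Split it into [class 0, class 1] counts, with class 1 a fraction `mean`
    size_client_abo = [int(size_train * (1 - mean)), int(size_train * mean)]
    return size_client_abo

def populate(data, indexs, sizes):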
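A worked call of `get_size`, copied with comments so it runs on its own. With 1000 rows, an 80% training share and a class-1 rate of 0.25, the training sample holds 800 rows split 600/200. Because `int()` truncates, the two counts can add up to slightly less than `size_train` for other inputs.

```python
def get_size(total_size, percentage, mean):
    # Training-sample size: `percentage` percent of `total_size` rows
    size_train = int(percentage * total_size / 100)
    # [class 0 count, class 1 count], with class 1 a fraction `mean`
    size_client_abo = [int(size_train * (1 - mean)), int(size_train * mean)]
    return size_client_abo


# 80% of 1000 rows, with a class-1 rate of 0.25
print(get_size(1000, 80, 0.25))  # → [600, 200]
```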
jenjenjiang / Retrain the linear and polynomial models
Created February 15, 2019 21:35
Generate a data set, train on the whole data, and produce visualizations and statistics
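#1 Retrain the linear and polynomial models
# Assumes X (features) and Y (target) were generated in earlier cells
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hold out 30% of the data for testing; fixed random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0)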
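The snippet compares scikit-learn's `LinearRegression` with and without `PolynomialFeatures`. The same comparison, sketched without scikit-learn: least-squares polynomial fits of degree 1 and 2 obtained from the normal equations, scored by R² on held-out points. This swaps in a deterministic every-third-point split for `train_test_split`, and the quadratic data is hypothetical.

```python
def polyfit_ls(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] of a polynomial,
    from the normal equations (V^T V) c = V^T y with V[i][j] = xs[i] ** j."""
    m = degree + 1
    ata = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Augmented matrix [V^T V | V^T y], reduced by Gaussian elimination
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        # Partial pivoting: move the largest remaining entry into the pivot row
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = aug[r][m] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = acc / aug[r][r]
    return coeffs


def predict(coeffs, x):
    """Evaluate c0 + c1*x + ... + c_d*x**d."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical noise-free data from y = 1 + 2x + 0.5x^2 on [-5, 5]
xs = [i / 2 for i in range(-10, 11)]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]

# Deterministic split: every third point is held out (about 1/3 test)
pairs = list(zip(xs, ys))
x_tr, y_tr = zip(*[p for i, p in enumerate(pairs) if i % 3 != 0])
x_te, y_te = zip(*[p for i, p in enumerate(pairs) if i % 3 == 0])

scores = {}
for degree in (1, 2):
    coeffs = polyfit_ls(x_tr, y_tr, degree)
    preds = [predict(coeffs, x) for x in x_te]
    scores[degree] = r2(y_te, preds)
    print(f"degree {degree}: test R^2 = {scores[degree]:.4f}")
```

Solving the normal equations directly is fine for low degrees; for higher degrees a QR-based solver is numerically safer.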
jenjenjiang / plot.ipynb
Last active August 20, 2019 09:23
Plot the average and minimum distances with respect to the number of dimensions
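The notebook cannot be displayed. A hedged sketch of the kind of experiment the description points to, assuming points drawn uniformly from the unit hypercube: as the dimension grows, the average pairwise distance increases and the ratio of minimum to average distance moves toward 1, i.e. distances concentrate. The number of points, the dimensions and the seed are arbitrary; plotting (for example with matplotlib) is left out so the code runs with the standard library only.

```python
import math
import random


def distance_stats(n_points, dim, rng):
    """Average and minimum pairwise Euclidean distance among n_points drawn
    uniformly from the unit hypercube [0, 1]^dim."""
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n_points) for j in range(i + 1, n_points)]
    return sum(dists) / len(dists), min(dists)


rng = random.Random(0)  # fixed seed so runs are repeatable
results = {}
for dim in (2, 10, 100, 1000):
    avg_d, min_d = distance_stats(50, dim, rng)
    results[dim] = (avg_d, min_d)
    # min/avg creeping toward 1 shows the distances concentrating
    print(f"dim={dim:5d}  avg={avg_d:.3f}  min={min_d:.3f}  min/avg={min_d / avg_d:.3f}")
```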