Lim Tern Poh (optiflow)

optiflow / rfm_valyes.py
Created January 5, 2019 04:42
Calculate the RFM (recency, frequency, monetary) values of each customer
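import datetime as dt
import pandas as pd
# df: transaction data with a datetime 'Date' column (assumed to be loaded already)
# Calculate a 1-year date range ending at the latest transaction date
end_date = df['Date'].max()
start_date = end_date - pd.to_timedelta(364, unit='d')
# Filter the 1-year date range from the original df
df_rfm = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
# Create a hypothetical snapshot date one day after the last transaction,
# so that recency is measured relative to that day
snapshot_date = end_date + dt.timedelta(days=1)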
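The description promises RFM values per customer, but the preview stops at the snapshot date. The following is a minimal sketch of the aggregation step under stated assumptions: the column names `CustomerID`, `InvoiceNo` and `TotalSum` and the toy data are hypothetical, not taken from the original gist. Recency is the number of days from the last purchase to the snapshot date, frequency the number of distinct invoices, and monetary value the total spend.

```python
import datetime as dt

import pandas as pd

# Hypothetical transaction data; CustomerID, InvoiceNo and TotalSum are assumed column names.
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B0", "B1", "C1", "C2", "C3"],
    "Date": pd.to_datetime([
        "2018-03-01", "2018-11-20", "2017-11-01", "2018-06-15",
        "2018-12-01", "2018-12-05", "2018-12-10",
    ]),
    "TotalSum": [20.0, 35.0, 99.0, 50.0, 10.0, 15.0, 25.0],
})

# Same one-year window and snapshot date as the gist snippet.
end_date = df["Date"].max()
start_date = end_date - pd.to_timedelta(364, unit="d")
df_rfm = df[(df["Date"] >= start_date) & (df["Date"] <= end_date)]  # drops invoice B0
snapshot_date = end_date + dt.timedelta(days=1)

# Recency: days from the customer's last purchase to the snapshot date.
# Frequency: number of distinct invoices. MonetaryValue: total amount spent.
rfm = df_rfm.groupby("CustomerID").agg(
    Recency=("Date", lambda d: (snapshot_date - d.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    MonetaryValue=("TotalSum", "sum"),
)
print(rfm)
```

Customer 2's 2017 invoice falls outside the 364-day window, so it contributes to neither frequency nor monetary value.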
optiflow / kmeans_implementation.py
Last active August 8, 2019 02:12
K-means implementation
def kmeans(df, clusters_number):
    '''
    Implement k-means clustering on a dataset.
    INPUT:
        df : dataframe. Dataset for k-means to fit.
        clusters_number : int. Number of clusters to form.
    OUTPUT:
        Cluster results and t-SNE visualisation of clusters.
    '''
    ...  # remainder of the gist is not shown in this listing
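A minimal sketch of what the docstring describes, using scikit-learn's `KMeans`. The helper name `kmeans_sketch`, the `StandardScaler` step and the `Cluster` output column are assumptions for illustration, and the t-SNE visualisation mentioned in the docstring is left out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def kmeans_sketch(df, clusters_number, random_state=1):
    """Hypothetical helper: fit k-means on numeric df and return a labelled copy."""
    # Standardise so that no single feature dominates the Euclidean distance.
    scaled = StandardScaler().fit_transform(df)
    model = KMeans(n_clusters=clusters_number, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    # Attach the cluster label of each row without modifying the input frame.
    return df.assign(Cluster=labels)


# Usage: two well-separated synthetic groups of 20 points each.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    np.vstack([rng.normal(0, 0.5, size=(20, 2)), rng.normal(8, 0.5, size=(20, 2))]),
    columns=["x", "y"],
)
result = kmeans_sketch(data, 2)
print(result["Cluster"].value_counts())
```

Scaling first matters for RFM-style data, where monetary values can be orders of magnitude larger than recency or frequency.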
optiflow / optimal_kmeans_ss.py
Last active August 8, 2019 02:12
Calculate the optimal number of k-means clusters with the silhouette score
def optimal_kmeans(dataset, start=2, end=11):
    '''
    Calculate the optimal number of k-means clusters.
    INPUT:
        dataset : dataframe. Dataset for k-means to fit.
        start : int. Start of the range of cluster counts to test.
        end : int. End of the range of cluster counts to test.
    OUTPUT:
        Values and line plot of the silhouette score for each number of clusters.
    '''
    ...  # remainder of the gist is not shown in this listing
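A minimal sketch of the silhouette-based search, assuming scikit-learn's `KMeans` and `silhouette_score`. The name `silhouette_by_k` is hypothetical, `end` is treated as exclusive (as in Python's `range`), and the line plot is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(dataset, start=2, end=11, random_state=1):
    """Hypothetical helper: return {k: mean silhouette score} for k in range(start, end)."""
    scores = {}
    for k in range(start, end):
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(dataset)
        # Mean silhouette coefficient over all samples, in [-1, 1]; higher is better.
        scores[k] = silhouette_score(dataset, labels)
    return scores


# Usage: three compact, well-separated synthetic blobs of 30 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(30, 2)) for c in centers])
scores = silhouette_by_k(X)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Picking the k with the highest mean silhouette coefficient is a common heuristic; it is most reliable when clusters are compact and well separated, and less decisive on real customer data.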