import datetime as dt
import pandas as pd

# Use the latest date in the data as the end of the window
end_date = df['Date'].max()
# Start 364 days earlier so the window spans 365 days inclusive
start_date = end_date - pd.to_timedelta(364, unit='d')
# Filter the original df down to that 1-year range
df_rfm = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
# Create a hypothetical snapshot date one day after the latest data
snapshot_date = end_date + dt.timedelta(days=1)
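To check the window arithmetic, here is a small self-contained run of the same steps. The daily toy DataFrame and the `days_before_snapshot` name are assumptions made for illustration only; they are not part of the original dataset.

```python
import datetime as dt
import pandas as pd

# Toy data (hypothetical): one row per day from 2020-01-01 to 2021-06-30.
df = pd.DataFrame({'Date': pd.date_range('2020-01-01', '2021-06-30', freq='D')})

# Same steps as above: latest date, 364 days back, filter, snapshot date.
end_date = df['Date'].max()
start_date = end_date - pd.to_timedelta(364, unit='d')
df_rfm = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
snapshot_date = end_date + dt.timedelta(days=1)

# Whole days between the snapshot date and each row in the window.
days_before_snapshot = (snapshot_date - df_rfm['Date']).dt.days

print(start_date.date(), end_date.date(), snapshot_date.date())
print(len(df_rfm), days_before_snapshot.min(), days_before_snapshot.max())
```

Because both ends of the filter are inclusive, subtracting 364 days yields a 365-day window, and placing the snapshot one day after the latest record means every row is at least one day old relative to it.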
def kmeans(df, clusters_number):
    '''
    Implement k-means clustering on dataset
    INPUT:
        df : dataframe. Dataset for k-means to fit.
        clusters_number : int. Number of clusters to form.
    OUTPUT:
        Cluster results and t-SNE visualisation of clusters.
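The body of `kmeans` is not shown here. As a rough sketch of what a function with this docstring might do, the example below fits scikit-learn's `KMeans` and computes a 2-D `TSNE` embedding that could then be plotted. The name `kmeans_sketch`, the `random_state` parameter, the `Cluster` column, and the toy column names are all assumptions for illustration, not the author's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def kmeans_sketch(df, clusters_number, random_state=1):
    """Hypothetical sketch: cluster df and return labels plus a 2-D embedding."""
    # Fit k-means and attach the cluster label to a copy of the input.
    model = KMeans(n_clusters=clusters_number, n_init=10,
                   random_state=random_state)
    result = df.copy()
    result['Cluster'] = model.fit_predict(df)

    # Project the features to 2-D with t-SNE for plotting.
    # Perplexity must be smaller than the number of samples.
    perplexity = min(30, len(df) - 1)
    embedding = TSNE(n_components=2, perplexity=perplexity,
                     random_state=random_state).fit_transform(df.to_numpy())
    return result, embedding


# Toy data (assumed column names): two well-separated groups of 30 rows.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    np.vstack([rng.normal(0, 0.5, (30, 3)), rng.normal(5, 0.5, (30, 3))]),
    columns=['Recency', 'Frequency', 'MonetaryValue'],
)
clustered, embedding = kmeans_sketch(toy, clusters_number=2)
print(clustered['Cluster'].value_counts())
print(embedding.shape)
```

The sketch returns the embedding instead of drawing it, so the plotting library is left to the caller; a scatter plot of the two embedding columns coloured by `Cluster` would give the kind of visualisation the docstring mentions.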
def optimal_kmeans(dataset, start=2, end=11):
    '''
    Calculate the optimal number of clusters for k-means
    INPUT:
        dataset : dataframe. Dataset for k-means to fit
        start : int. Smallest number of clusters to test
        end : int. End of the range of cluster counts to test
    OUTPUT:
        Values and line plot of Silhouette Score.
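The body of `optimal_kmeans` is likewise not shown. One plausible reading of the docstring is a loop that fits k-means for each candidate cluster count and records the silhouette score. The sketch below does that with scikit-learn's `KMeans` and `silhouette_score`; the name `silhouette_by_k`, the `random_state` parameter, and the choice to treat `end` as exclusive (as in `range`) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(dataset, start=2, end=11, random_state=1):
    """Hypothetical sketch: map each cluster count k to its silhouette score."""
    scores = {}
    for k in range(start, end):  # assumption: `end` is exclusive
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(dataset)
        scores[k] = silhouette_score(dataset, labels)
    return scores


# Toy data: three tight, well-separated groups of 40 points each.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0, 4, 8)])

scores = silhouette_by_k(toy, start=2, end=7)
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print('best k:', best_k)
```

A higher silhouette score means points sit closer to their own cluster than to neighbouring ones, so the cluster count with the highest score is a reasonable candidate; plotting `scores` as a line chart would reproduce the output the docstring describes.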