Created Mar 19, 2020
How to identify recurring patterns in this set of transactional data
I'm working on a dataset of banking transactions and would like to find recurring transactions.
I've been mapping each merchant's transactions to a time series and tried `acf` from `statsmodels.tsa.stattools` to compute the autocorrelation function, but I'm not getting the expected results:
`r = acf(ts, fft=False)`
For example, this set of transactions (ASSURANCE DESJ) gets an acf score of 0.3159 even though it is obviously recurring (same amount, same frequency).
[![enter image description here][1]][1]
Another example of recurring transactions, with acf=0.22775:
[![enter image description here][2]][2]
But this one should not be detected as recurring, and it gets a score not far from the previous set (0.26919):
[![enter image description here][3]][3]
I've tried many different methods. I ended up combining autocorrelation on the regular time series, autocorrelation on a time series with amount=1, stationarity checks, and other rules, and the results are still far from perfect. I've also looked at ARIMA and other methods without luck.
**Is there a better way to detect recurring transactions from a time series?**
Dataset A ('ASSURANCE DESJ. ASS. GEN.'):
```
{
    '2019-07-15': 9831.0,
    '2019-08-15': 9818.0,
    '2019-09-16': 9818.0,
    '2019-10-15': 9818.0,
    '2019-11-15': 9818.0,
    '2019-12-16': 9818.0,
    '2020-01-15': 9818.0,
    '2020-02-17': 9818.0,
    '2020-03-16': 9818.0
}
```
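
The low score for Dataset A can be reproduced without statsmodels. The sketch below is not from the original post. It assumes `ts` was a daily series with a value only on transaction days, replaces the amounts with a 0/1 indicator, and uses a hand-rolled `sample_acf` helper (a hypothetical name) in place of the statsmodels `acf` call. The 40-lag window is also just an illustrative choice.

```python
from datetime import date

# Dataset A dates, copied from the post (amounts are ignored in this sketch).
DATES_A = [
    "2019-07-15", "2019-08-15", "2019-09-16", "2019-10-15", "2019-11-15",
    "2019-12-16", "2020-01-15", "2020-02-17", "2020-03-16",
]


def daily_indicator(dates):
    """Return a daily list: 1 on days with a transaction, 0 otherwise."""
    days = sorted(date.fromisoformat(d) for d in dates)
    offsets = {(d - days[0]).days for d in days}
    span = (days[-1] - days[0]).days + 1
    return [1 if i in offsets else 0 for i in range(span)]


def sample_acf(x, max_lag):
    """Plain sample autocorrelation for lags 0..max_lag (not statsmodels)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


series = daily_indicator(DATES_A)
r = sample_acf(series, max_lag=40)  # assumption: only lags up to 40 are inspected
best_lag = max(range(1, len(r)), key=lambda k: r[k])
print(best_lag, round(r[best_lag], 4))
```

Under those assumptions the best lag is 31 with a value of about 0.3159, the same figure as the score quoted above, which suggests (though the post does not confirm it) that the score is the peak of the autocorrelation on a daily grid. The gaps between payments range from 28 to 33 days and only three of the eight are exactly 31 days, so no single lag lines up with most of the payments and the peak stays low even though the series is clearly periodic.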
Dataset B ('STATI NEMENT VILLE MTL'):
```
{
    '2018-12-10': 447.0,
    '2019-02-11': 107.0,
    '2019-02-25': 82.0,
    '2019-03-12': 418.0,
    '2019-03-28': 142.0,
    '2019-04-01': 167.0,
    '2019-04-04': 261.0,
    '2019-04-17': 127.0,
    '2019-04-22': 223.0,
    '2019-04-29': 326.0,
    '2019-05-14': 657.0,
    '2019-06-20': 332.0,
    '2019-07-02': 332.0,
    '2019-07-17': 332.0,
    '2019-07-29': 69.0,
    '2019-09-09': 277.0,
    '2019-12-12': 332.0,
    '2019-12-31': 169.0,
    '2020-01-19': 169.0,
    '2020-02-21': 657.0,
    '2020-02-28': 657.0,
    '2020-02-29': 537.0,
    '2020-03-06': 575.0
}
```
Dataset C ('STATI NEMENT VILLE MTL'):
```
{
    '2018-12-10': 447.0,
    '2019-02-11': 107.0,
    '2019-02-25': 82.0,
    '2019-03-12': 418.0,
    '2019-03-28': 142.0,
    '2019-04-01': 167.0,
    '2019-04-04': 261.0,
    '2019-04-17': 127.0,
    '2019-04-22': 223.0,
    '2019-04-29': 326.0,
    '2019-05-14': 657.0,
    '2019-06-20': 332.0,
    '2019-07-02': 332.0,
    '2019-07-17': 332.0,
    '2019-07-29': 69.0,
    '2019-09-09': 277.0,
    '2019-12-12': 332.0,
    '2019-12-31': 169.0,
    '2020-01-19': 169.0,
    '2020-02-21': 657.0,
    '2020-02-28': 657.0,
    '2020-02-29': 537.0,
    '2020-03-06': 575.0
}
```
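
One possible heuristic, again not from the original post, is to skip the daily grid and look directly at the gaps between consecutive transactions: a recurring merchant should show a small coefficient of variation in both the gaps and the amounts. The function name `looks_recurring` and both thresholds are hypothetical and would need tuning on real data. `DATASET_B_HEAD` is only the first eight rows of the 'STATI NEMENT VILLE MTL' listing above, kept short for the sketch.

```python
from datetime import date
from statistics import mean, pstdev

# Dataset A, copied from the post.
DATASET_A = {
    "2019-07-15": 9831.0, "2019-08-15": 9818.0, "2019-09-16": 9818.0,
    "2019-10-15": 9818.0, "2019-11-15": 9818.0, "2019-12-16": 9818.0,
    "2020-01-15": 9818.0, "2020-02-17": 9818.0, "2020-03-16": 9818.0,
}

# First eight rows of the 'STATI NEMENT VILLE MTL' listing (subset only).
DATASET_B_HEAD = {
    "2018-12-10": 447.0, "2019-02-11": 107.0, "2019-02-25": 82.0,
    "2019-03-12": 418.0, "2019-03-28": 142.0, "2019-04-01": 167.0,
    "2019-04-04": 261.0, "2019-04-17": 127.0,
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def looks_recurring(transactions, max_gap_cv=0.2, max_amount_cv=0.1):
    """Hypothetical rule: regular spacing AND stable amounts.

    Both thresholds are illustrative guesses, not values from the post.
    """
    days = sorted(date.fromisoformat(d) for d in transactions)
    if len(days) < 3:
        return False  # too few points to judge regularity
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    amounts = list(transactions.values())
    return (
        coefficient_of_variation(gaps) <= max_gap_cv
        and coefficient_of_variation(amounts) <= max_amount_cv
    )


print(looks_recurring(DATASET_A))
print(looks_recurring(DATASET_B_HEAD))
```

With these thresholds the insurance payments pass (gaps of 28 to 33 days, nearly constant amount) and the parking subset does not (gaps from 3 to 63 days). Unlike a daily autocorrelation, the rule tolerates a payment that slips by a day or two around weekends.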
[1]: https://i.stack.imgur.com/hw8n1.png
[2]: https://i.stack.imgur.com/1XLNn.png
[3]: https://i.stack.imgur.com/tmw6J.png