In moving my email from Gmail to Outlook, it seems that I’ve ended up with multiple copies of emails. How many such emails there are, and how to remove them, I am still struggling to figure out. The problem seems to be that the copies have the same message ID but different X-TUID headers.
$ mu find msgid:CY1PR15MB0155CB9AAD45DA010FFFC2FCF15E0@CY1PR15MB0155.namprd15.prod.outlook.com -f 'l'
/home/tbutt/.mail/outlook/Sent/cur/1522002651.16857_33.knuckles,U=36:2,S
/home/tbutt/.mail/outlook/Archive/cur/1522002326.15658_3659.knuckles,U=236990:2,S
/home/tbutt/.mail/outlook/Archive/cur/1521978114.18576_998.knuckles,U=2593:2,S
/home/tbutt/.mail/gc/Archive/cur/1522211460.25821_33.knuckles,U=34:2,S
/home/tbutt/.mail/outlook/Archive/cur/1521979684.21093_9183.knuckles,U=20859:2,S

$ diff /home/tbutt/.mail/outlook/Archive/cur/1522002326.15658_3659.knuckles,U=236990:2,S /home/tbutt/.mail/outlook/Archive/cur/1521978114.18576_998.knuckles,U=2593:2,S
< X-TUID: AR/2lqM1OiYM
24a24
> X-TUID: Brt0VWOJb91f

$ md5sum /home/tbutt/.mail/outlook/Archive/cur/1522002326.15658_3659.knuckles,U=236990:2,S /home/tbutt/.mail/outlook/Archive/cur/1521978114.18576_998.knuckles,U=2593:2,S
ceafb53ef363ecde8fe77d270c7bef13  /home/tbutt/.mail/outlook/Archive/cur/1522002326.15658_3659.knuckles,U=236990:2,S
7caa6b1f87405f945966b1916b5eaf07  /home/tbutt/.mail/outlook/Archive/cur/1521978114.18576_998.knuckles,U=2593:2,S
Searching for X-TUID brings up a thread on mu-discuss about this specific issue. The solution is to wrap the md5sum command that find-dups.scm calls so that this header is replaced before the checksum is computed. Running the modified script, I find there are 16040 messages with duplicate md5sums in 16236 files (meaning a few have more than one duplicate).
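The idea behind such a wrapper can be sketched as follows. This is not the script from the mu-discuss thread, only a minimal illustration: the function name md5_without_tuid is hypothetical, it deletes the header rather than replacing it, and it assumes LF line endings, an unfolded X-TUID header, and sed plus md5sum from a typical Linux system.

```shell
#!/bin/sh
# Hypothetical helper: print the md5sum of a message with its X-TUID header ignored.
# The header block ends at the first empty line, so the sed range 1,/^$/ limits
# the deletion to headers; a body line that starts with "X-TUID:" is left alone.
md5_without_tuid() {
    sed '1,/^$/{/^X-TUID:/d;}' "$1" | md5sum | cut -d' ' -f1
}

# Usage: compare two files passed on the command line.
if [ "$#" -eq 2 ]; then
    if [ "$(md5_without_tuid "$1")" = "$(md5_without_tuid "$2")" ]; then
        echo "same message apart from X-TUID"
    else
        echo "messages differ"
    fi
fi
```

Called with the two Archive files compared above, this should report them as the same message if X-TUID really is their only difference, which is what the diff output suggests.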
$ mu index -m ~/.mail/
indexing messages under /home/tbutt/.mail [/home/tbutt/.mu/xapian]
- processing mail; processed: 237225; updated/new: 0, cleaned-up: 0
cleaning up messages [/home/tbutt/.mu/xapian]
- processing mail; processed: 245400; updated/new: 0, cleaned-up: 15617