I used these scripts with wordlist-dedup to deduplicate a whole collection of wordlists safely.
I set a `tempdir` variable in the `dedup` script to a temporary directory on another hard drive to speed up the sorting process. In the file below I changed it to `/tmp`; you can choose a different directory if you like.
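The README doesn't show how `tempdir` is used. Here is a minimal sketch, assuming the script passes it to `sort -T`, the option GNU and BSD `sort` provide for choosing where temporary chunk files go during a large sort. Putting those chunks on a different drive presumably reduces disk contention with reading the input and writing the output. The function name `sort_wordlist` is hypothetical, and `LC_ALL=C` (plain byte-order sorting, usually faster) is an added choice, not something the original states.

```sh
#!/bin/sh
# Sketch only: sort a wordlist with an explicit temporary directory.
# sort_wordlist is a hypothetical name, not taken from the dedup script.
set -eu

# Point this at a directory on another drive if you like.
tempdir="/tmp"

sort_wordlist() {
  # -T: where sort writes temporary files when the input doesn't fit in memory
  # -o: output file; LC_ALL=C sorts by raw bytes (an assumption, not required)
  LC_ALL=C sort -T "$tempdir" -o "$2" "$1"
}
```

Usage: `sort_wordlist filename.ext filename_sorted.ext`.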
- `dedup` does the whole thing. Let's assume you have a file `filename.ext`. The script first sorts it into a file `filename_sorted.ext` and then deduplicates that file into a third one, `filename_sorted_dedup.ext`.
- The `to_txt` script converts non-txt files to txt files.
- The `cleanup` script deletes the non-deduplicated files.

Move the three scripts into the folder you want to deduplicate; the wordlists are deduplicated file by file.
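The three scripts themselves aren't reproduced here, so the following POSIX shell functions are only a rough sketch of the workflow described above, mimicking its naming scheme (`filename.ext` to `filename_sorted.ext` to `filename_sorted_dedup.ext`). Several assumptions apply: `uniq` stands in for wordlist-dedup (its command-line interface is not assumed here; on an already sorted file, `uniq` also drops every duplicate line); "converting" to txt is taken to mean renaming a file to the `.txt` extension; and cleanup is taken to mean deleting the original and the `_sorted` intermediate for every `_sorted_dedup` file. The names `split_ext`, `dedup_file`, `to_txt` and `cleanup_dir` are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of the dedup / to_txt / cleanup workflow.
# `uniq` stands in for wordlist-dedup; all function names are illustrative.
set -eu

tempdir="/tmp"

# split_ext PATH: set STEM="dir/name" and EXT=".ext" (EXT is empty if the
# file name has no dot). Only the file name is inspected, not the directories.
split_ext() {
  case "${1##*/}" in
    *.*) STEM="${1%.*}"; EXT=".${1##*.}" ;;
    *)   STEM="$1";      EXT="" ;;
  esac
}

# dedup_file FILE: FILE.ext -> FILE_sorted.ext -> FILE_sorted_dedup.ext
dedup_file() {
  split_ext "$1"
  local sorted="${STEM}_sorted${EXT}"
  local deduped="${STEM}_sorted_dedup${EXT}"
  LC_ALL=C sort -T "$tempdir" -o "$sorted" "$1"
  # After sorting, duplicates are adjacent, so uniq removes all of them.
  LC_ALL=C uniq "$sorted" "$deduped"
}

# to_txt FILE...: rename each non-.txt file to NAME.txt (assumed meaning of
# "convert"); refuses to overwrite an existing .txt file.
to_txt() {
  local f target
  for f in "$@"; do
    split_ext "$f"
    if [ "$EXT" = ".txt" ]; then
      continue                      # already a txt file
    fi
    target="${STEM}.txt"
    if [ -e "$target" ]; then
      echo "to_txt: $target already exists, skipping $f" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}

# cleanup_dir DIR: for every *_sorted_dedup* file, delete its original and
# its _sorted intermediate. Files without a deduplicated counterpart stay.
cleanup_dir() {
  local d stem rest
  for d in "$1"/*_sorted_dedup*; do
    [ -e "$d" ] || continue         # the glob matched nothing
    stem="${d%_sorted_dedup*}"      # dir/filename
    rest="${d##*_sorted_dedup}"     # .ext (or empty)
    rm -f -- "${stem}${rest}" "${stem}_sorted${rest}"
  done
}
```

For example, `to_txt words.lst && dedup_file words.txt && cleanup_dir .` leaves `words_sorted_dedup.txt` as the only copy of that wordlist. Because `cleanup_dir` only deletes files that have a deduplicated counterpart, scripts sitting in the same folder are left alone.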