@pr0way
Last active December 27, 2019 17:17
Simple removeDuplicate script

How to use?

Simply copy and paste the code, or download and extract it, then:

  1. Make it executable: chmod +x removeDuplicate.py

  2. Run it: ./removeDuplicate.py <source-file> -o <target-file>

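| Parameter        | Required | Description                                                  |
|------------------|----------|--------------------------------------------------------------|
| <source-file>    | Yes      | The file whose duplicate lines you want to remove            |
| -o <target-file> | No       | Destination file for the script's results (default: result)  |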
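
To see how these two parameters behave, here is a minimal sketch of an equivalent argparse setup. The names `build_parser`, `source_file`, `target_file`, `fruits.txt` and `unique.txt` are made up for illustration and are not taken from the script, which uses `<source-file>` and `<target-file>` as its argument names.

```python
from argparse import ArgumentParser


def build_parser():
    # Hypothetical helper: same two arguments as the script,
    # but with plain identifiers as destination names.
    parser = ArgumentParser(description="Remove duplicate lines from a file")
    parser.add_argument("source_file", help="file to deduplicate")
    parser.add_argument("-o", dest="target_file", default="result",
                        help="output file (default: result)")
    return parser


# Parse explicit argument lists instead of sys.argv so the sketch runs anywhere.
with_default = build_parser().parse_args(["fruits.txt"])
with_target = build_parser().parse_args(["fruits.txt", "-o", "unique.txt"])

print(with_default.source_file, with_default.target_file)  # fruits.txt result
print(with_target.source_file, with_target.target_file)    # fruits.txt unique.txt
```

When -o is omitted, the output goes to a file named result in the current directory.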

Example:

You have a list like:

apple
banana
apple
strawberry
pineapple
banana

After running the script:

strawberry
apple
banana
pineapple

Pay attention! The script does not preserve the order of the list!
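
The order is lost because the script collects lines in a set, which is unordered. If the original order matters, one possible alternative (not what the script does) is to deduplicate with dict keys, which keep insertion order in Python 3.7+. The function name `unique_lines` below is hypothetical:

```python
def unique_lines(lines):
    """Return distinct non-empty lines in order of first appearance."""
    stripped = (line.strip() for line in lines)
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(s for s in stripped if s))


fruits = ["apple\n", "banana\n", "apple\n", "strawberry\n", "pineapple\n", "banana\n"]
print(unique_lines(fruits))
# ['apple', 'banana', 'strawberry', 'pineapple']
```

As in the script, surrounding whitespace is stripped and empty lines are dropped.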

#!/usr/bin/python3
from argparse import ArgumentParser

parser = ArgumentParser(description='Remove duplicate lines. The script reads the source file line by line and keeps a single copy of each distinct line.')
parser.add_argument('<source-file>', help="Source file with some sort of data")
parser.add_argument('-o', dest='<target-file>', default='result', help="File name or path of the destination where the output is saved (default: result)")
arguments = vars(parser.parse_args())

# A set keeps one copy of each line, but does not keep their order
data = set()

# Read the source file, stripping surrounding whitespace from every line
with open(arguments['<source-file>']) as source_file:
    for line in source_file:
        data.add(line.strip())

# Remove the empty string left behind by blank lines (strip doesn't do that)
data = list(filter(None, data))

# Write the target file; joining with "\n" leaves no newline after the last line
with open(arguments['<target-file>'], 'w') as target_file:
    target_file.write("\n".join(data))

print("Done! Unique lines from " + source_file.name + " were written to " + target_file.name)
mpds commented Dec 27, 2019

nice! maybe you could write a comment about command line options (although it’s kind of obvious)

pr0way commented Dec 27, 2019

You're right, thanks.
