@avangardistic
Created August 6, 2020 12:37
Remove duplicate URLs from a text file
# Requires Python 3.8 (for the walrus operator used below)
import sys

import pandas as pd


def remove_dup(path):
    # Read the file line by line and drop duplicate lines,
    # keeping the first occurrence of each URL.
    with open(path, "r") as f:
        urls = pd.Series(f.readlines())
    return urls.drop_duplicates()


def update_file(path, urls):
    # Overwrite the file with the deduplicated URLs.
    with open(path, "w") as f:
        for url in urls:
            f.write(url)
    print("file updated...")


if __name__ == "__main__":
    # On Python < 3.8, replace the walrus assignment with a plain
    # assignment before the if statement.
    if len(sys.argv) > 1 and (path := sys.argv[1]):
        update_file(path=path, urls=remove_dup(path))
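For a file that fits in memory, the same order-preserving deduplication can be done with the standard library alone, without pandas: `dict.fromkeys` keeps the first occurrence of each line, and dicts preserve insertion order in Python 3.7+. A minimal sketch (the function name `remove_dup_stdlib` is illustrative, not part of the gist):

    def remove_dup_stdlib(path):
        # Read all lines and drop duplicates while preserving order;
        # dict.fromkeys keeps the first occurrence of each line.
        with open(path, "r") as f:
            unique = list(dict.fromkeys(f))
        # Rewrite the file with only the unique lines.
        with open(path, "w") as f:
            f.writelines(unique)
        return unique

This avoids pulling in pandas for what is essentially a set-membership problem, at the cost of the extra conveniences a `Series` offers (e.g. normalizing or filtering URLs before deduplication).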