@infomaven
Last active January 25, 2021 05:51
Find difference between 2 CSV files & identify shared items. Diff report is printed to HTML.

CSV FILE DIFF SCRIPT

  • Uses only Python 3 standard-library modules
  • Finds the diff between 2 CSV files and writes the results to an HTML report
  • Finds and prints the rows found in both files
  • Does NOT find duplicates within a single file
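
Since the script does not report rows that repeat inside one file, here is a minimal sketch of how that check could look. It assumes rows are compared as whole tuples; `find_repeated_rows` and the sample text are hypothetical and not part of the script.

```python
import csv
import io
from collections import Counter


def find_repeated_rows(lines):
    """Return {row_tuple: count} for rows that occur more than once."""
    # csv.reader accepts any iterable of text lines, including an open file
    counts = Counter(tuple(row) for row in csv.reader(lines))
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical in-memory CSV standing in for a real file
sample = io.StringIO("a,b\nc,d\na,b\n")
repeated = find_repeated_rows(sample)
print(repeated)
```

To check a real file, pass an open file object (opened with `newline=''`) instead of the `StringIO` sample.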

USAGE:

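  1. Download the script file and sample CSVs to a directory on your computer
  2. Run the script with this command: python3 compare_csv_files.py
  3. Enter the two CSV file names when prompted
  • The script prints the rows found in both files and writes an HTML report named DIFF-<first file>_<second file>.html to the directory it is run from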
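
The HTML report is produced with the standard library's `difflib.HtmlDiff`. A minimal sketch of that step, run on two in-memory line lists so it needs no files; the sample rows and the labels `old.csv` and `new.csv` are made up for illustration.

```python
import difflib

# Hypothetical sample lines standing in for the contents of two CSV files
old_lines = ["id,name\n", "1,alpha\n", "2,beta\n"]
new_lines = ["id,name\n", "1,alpha\n", "2,gamma\n"]

# make_file returns a complete HTML page as a single string;
# the last two arguments become the column headers of the report
html = difflib.HtmlDiff().make_file(old_lines, new_lines, "old.csv", "new.csv")

print(len(html), "characters of HTML generated")
```

Writing `html` to a file with a `.html` extension gives a side-by-side report that opens in any browser.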
victory solution court tin nearest couple
plural comfortable grandfather came easily written
tight column begun softly go plate
sent too discover generally handsome habit
actually car wore create soil pick
independent warm party society chamber sweet
tent flame liquid faster cowboy circle
tent flame liquid faster cowboy circle
tent flame liquid faster cowboy circle
salt desert build win idea room
major scared young though contain beside
camera steel opportunity farm nodded right
baseball clock asleep grandmother charge fish
color frozen activity break stems sun
sent likely held visit warn fresh
since us stomach slide create opportunity
body additional jar hang tone football
still gentle brought atomic son silver
jack birthday cast canal gravity with
current strong rays drew beyond share
actually car wore create soil pick
independent warm party society chamber sweet
tent flame liquid faster cowboy circle
import csv
import difflib


def get_duplicates(list_a, list_b):
    # Items of list_a that also appear in list_b, in list_a order
    dups = [item for item in list_a if item in list_b]
    return dups


def generate_duplicates_report(first_file, second_file):
    data1 = []
    data2 = []
    # Read each CSV into a list of row tuples
    with open(first_file, newline='') as csvfile:
        csv_reader1 = csv.reader(csvfile, delimiter=',')
        for row in csv_reader1:
            data1.append(tuple(row))
    with open(second_file, newline='') as csvfile:
        csv_reader2 = csv.reader(csvfile, delimiter=',')
        for row in csv_reader2:
            data2.append(tuple(row))
    shared = get_duplicates(data1, data2)
    print("Found in both files: ", shared)


def generate_diff_report(first_file, second_file):
    # Compare the two files line by line as raw text
    with open(first_file, newline='') as csvfile:
        rubric = csvfile.readlines()
    with open(second_file, newline='') as csvfile:
        comparison = csvfile.readlines()
    reportname = f"DIFF-{first_file}_{second_file}.html"
    diff = difflib.HtmlDiff().make_file(rubric, comparison, first_file, second_file)
    # make_file returns the whole HTML page as one string
    with open(reportname, "w") as html_file:
        html_file.write(diff)


first_file = input("Enter first file name: ")
second_file = input("Enter second file name: ")
generate_duplicates_report(first_file, second_file)
generate_diff_report(first_file, second_file)
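
`get_duplicates` scans all of `list_b` for every item in `list_a`, which can get slow on large files. A possible alternative, sketched under the assumption that rows are hashable tuples (as they are after `tuple(row)`), is to look rows up in a set. `shared_rows` is a hypothetical name, and the sample rows are illustrative only.

```python
def shared_rows(list_a, list_b):
    """Rows of list_a that also occur in list_b, kept in list_a order."""
    lookup = set(list_b)  # requires hashable rows, e.g. tuples
    return [row for row in list_a if row in lookup]


# Illustrative rows; a repeated row in list_a is reported each time it occurs
rows_a = [("tent", "flame"), ("salt", "desert"), ("tent", "flame")]
rows_b = [("tent", "flame"), ("major", "scared")]
result = shared_rows(rows_a, rows_b)
print(result)
```

This keeps the same output as the list-based version, including repeats from the first list, while replacing the inner scan with a set lookup.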