@nh2
Created October 27, 2013 18:14
Counts total number of duplicate lines from a CPD output
# Counts total number of duplicate lines from a CPD output.
# Example:
# pmd-bin-5.0.5/bin/run.sh cpd --minimum-tokens 100 --files ../mysource --format text --language java | python cpd-sum.py
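import fileinput
import re

nlines = 0      # line count of the duplication block currently being read
places = 0      # number of locations where that block occurs
total_dups = 0  # running total of redundant lines

for line in fileinput.input():
    # A "Found a N line ..." header starts a new duplication block.
    m = re.match(r'Found a (\d+) line', line)
    if m:
        # Close out the previous block: every occurrence after the first is redundant.
        total_dups += nlines * (places - 1)
        nlines = int(m.group(1))
        places = 0
    # Each "Starting at line ..." entry is one occurrence of the current block.
    if re.match(r'Starting at line', line):
        places += 1

# Add the final block, which has no following header to trigger the sum.
total_dups += nlines * (places - 1)
print(total_dups)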
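
The counting rule used above (a block of N lines found in P places contributes N * (P - 1) redundant lines) can be checked without running CPD. The sketch below wraps the same logic in a function and feeds it a small hand-written sample. The function name `count_duplicate_lines`, the file paths, and the sample text are hypothetical; the sample only imitates the two line prefixes the script matches and is not real CPD output.

```python
import re


def count_duplicate_lines(lines):
    """Sum N * (P - 1) over every duplication block in CPD-style text output."""
    total = 0
    nlines = 0  # size of the block currently being read
    places = 0  # occurrences seen so far for that block
    for line in lines:
        m = re.match(r'Found a (\d+) line', line)
        if m:
            # A new header closes the previous block.
            total += nlines * max(places - 1, 0)
            nlines = int(m.group(1))
            places = 0
        elif re.match(r'Starting at line', line):
            places += 1
    # The last block has no header after it, so add it here.
    return total + nlines * max(places - 1, 0)


# Hypothetical sample: a 10-line block in 3 places and a 4-line block in 2 places.
sample = """\
Found a 10 line (120 tokens) duplication in the following files:
Starting at line 5 of /src/A.java
Starting at line 40 of /src/B.java
Starting at line 7 of /src/C.java
=====================================================================
Found a 4 line (101 tokens) duplication in the following files:
Starting at line 1 of /src/A.java
Starting at line 90 of /src/D.java
""".splitlines()

print(count_duplicate_lines(sample))  # 10 * 2 + 4 * 1 = 24
```

The `max(places - 1, 0)` guard is an addition for this sketch: it keeps a header with no "Starting at line" entries from subtracting from the total.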