@almostSouji
Created April 3, 2024 10:17
Simple Python script to find the longest and highest-entropy lines in a file read from standard input
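For reference, the score the script ranks by is the Shannon entropy of each line's character distribution, measured in bits per character (logarithm base 2). Writing $n_c$ for the number of times character $c$ occurs in line $s$:

```latex
H(s) = -\sum_{c \in s} p_c \log_2 p_c, \qquad p_c = \frac{n_c}{|s|}

% worked example for s = "aab": p_a = 2/3, p_b = 1/3
H(\texttt{aab}) = -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \approx 0.918 \text{ bits}
```

A line that repeats a single character scores 0, while a line made of $k$ distinct characters, each used equally often, scores $\log_2 k$.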
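#!/usr/bin/env python3
"""Report the 10 longest and the 10 highest-entropy lines read from stdin."""
from collections import Counter
from math import log2


def entropy(text):
    """Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0  # empty lines carry no information (and avoid dividing by 0)
    n = len(text)
    probs = (count / n for count in Counter(text).values())
    return sum(-p * log2(p) for p in probs)


lengths = {}    # line index -> line length, excluding the line terminator
entropies = {}  # line index -> Shannon entropy of the line
# open(0) opens file descriptor 0, i.e. standard input
for index, line in enumerate(open(0)):
    line = line.rstrip("\r\n")  # don't count the newline as a character
    lengths[index] = len(line)
    entropies[index] = entropy(line)

# renamed from `os` so the standard-library module name is not shadowed
by_length = sorted(lengths, key=lengths.get, reverse=True)
by_entropy = sorted(entropies, key=entropies.get, reverse=True)

print("Longest:")
for k in by_length[:10]:
    print(f"l.{k+1}:\t{lengths[k]}")
print()
print("Highest Shannon entropy:")
for k in by_entropy[:10]:
    print(f"l.{k+1}:\t{entropies[k]}")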
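If you want to reuse the ranking from other code instead of only printing it, one option is to wrap it in a function. This is a minimal sketch, not part of the original gist: `top_lines` is a hypothetical name, and it swaps the full `sorted` calls for `heapq.nlargest`, which picks the top `n` items with a heap. Like the script, it reports 1-based line numbers; it also strips the trailing newline before scoring.

```python
from collections import Counter
from heapq import nlargest
from math import log2


def entropy(text):
    """Shannon entropy of text's characters, in bits per character (0.0 if empty)."""
    if not text:
        return 0.0
    n = len(text)
    return sum(-(k / n) * log2(k / n) for k in Counter(text).values())


def top_lines(lines, n=10):
    """Hypothetical helper: return (longest, highest_entropy) for an iterable of lines.

    Each result is a list of (1-based line number, score) pairs, best first.
    nlargest behaves like sorted(..., reverse=True)[:n], so ties keep input order.
    """
    scored = [(i, line.rstrip("\r\n")) for i, line in enumerate(lines, start=1)]
    longest = nlargest(n, ((i, len(s)) for i, s in scored), key=lambda t: t[1])
    densest = nlargest(n, ((i, entropy(s)) for i, s in scored), key=lambda t: t[1])
    return longest, densest
```

For example, `top_lines(["aaaa\n", "abcd\n", "ab\n"], n=2)` returns `[(1, 4), (2, 4)]` for length and `[(2, 2.0), (3, 1.0)]` for entropy. To keep the script's command-line behaviour, pass `open(0)` (or `sys.stdin`) as `lines` and print the two lists.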