zawy12

zawy12 / author_compare.pl
Last active February 20, 2022 12:31
Finds the known author most similar to a baseline (unknown) author, using the difference in 'self-information' entropy between .txt files.
#!/usr/bin/perl
BEGIN { use CGI::Carp qw(carpout); open(LOG, ">>_error2.txt") or die("Unable to open _error2.txt: $!\n"); carpout(LOG); } # send CGI::Carp errors and warnings to _error2.txt
BEGIN { $SIG{"__DIE__"} = $SIG{"__WARN__"} = sub { my $error = shift; chomp $error; $error =~ s/[<&>]/"&#".ord($&).";"/ge; print "Content-type: text/html\n\n$error\n"; exit 0; } } # print any die or warn message to the browser (with <, & and > escaped as HTML entities), then exit
$|=1; # autoflush STDOUT so output appears immediately
$baselinefile='author_baseline.txt'; # text by the unknown author; must be in the same directory as this program
$baselinesize=-s $baselinefile; # size of the baseline file in bytes
$buffer=1.2; # helps ensure enough words are pulled in from the known files
$dir='authors'; # known-author files; each should be > 30% bigger than the baseline file so enough words are retrieved
$firstrun=1;
$lastrun=17;
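A minimal sketch of how $baselinesize and $buffer might fit together, assuming (the preview does not confirm this) that the program reads the first $baselinesize * $buffer bytes of each known-author file and only uses files more than 30% larger than the baseline. The subroutine names bytes_to_read, big_enough and read_prefix are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: bytes to pull from each known-author file,
# i.e. the baseline size scaled by the buffer factor (1.2 in the script).
sub bytes_to_read {
    my ($baselinesize, $buffer) = @_;
    return int($baselinesize * $buffer);
}

# Hypothetical check mirroring the comment on $dir: a known file should be
# more than 30% larger than the baseline, so a 1.2x read never runs short.
sub big_enough {
    my ($knownsize, $baselinesize) = @_;
    return $knownsize > 1.3 * $baselinesize;
}

# Hypothetical helper: read at most $n bytes from the start of a file.
sub read_prefix {
    my ($path, $n) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    my $got = read($fh, my $buf, $n);
    die "Read failed on $path: $!\n" unless defined $got;
    close($fh);
    return $buf;
}
```

With a 7KB baseline and a buffer of 1.2, this would read about 8.4KB from each known file; a 50KB known file easily passes the 30% size check.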
zawy12 / compare_authors.cgi
Last active May 20, 2016 16:12
Author Identification comparison de-anonymizing stylometry
#!/usr/bin/perl
BEGIN { use CGI::Carp qw(carpout); open(LOG, ">>_error2.txt") or die("Unable to open _error2.txt: $!\n"); carpout(LOG); } # send CGI::Carp errors and warnings to _error2.txt
BEGIN { $SIG{"__DIE__"} = $SIG{"__WARN__"} = sub { my $error = shift; chomp $error; $error =~ s/[<&>]/"&#".ord($&).";"/ge; print "Content-type: text/html\n\n$error\n"; exit 0; } } # print any die or warn message to the browser (with <, & and > escaped as HTML entities), then exit
$|=1; # autoflush STDOUT so output appears immediately
$baselinefile='author_baseline.txt'; # text by the unknown author; must be in the same directory as this program
$baselinesize=-s $baselinefile; # size of the baseline file in bytes
$buffer=1.2; # helps ensure enough words are pulled in from the known files
$dir='authors'; # known-author files; each should be > 30% bigger than the baseline file so enough words are retrieved
print "== Output and instructions are printed to author_compare_out.txt ==";
######## PRINT HTML HEADER #######
#!/usr/bin/perl
BEGIN { use CGI::Carp qw(carpout); open(LOG, ">>_error2.txt") or die("Unable to open _error2.txt: $!\n"); carpout(LOG); } # send CGI::Carp errors and warnings to _error2.txt
BEGIN { $SIG{"__DIE__"} = $SIG{"__WARN__"} = sub { my $error = shift; chomp $error; $error =~ s/[<&>]/"&#".ord($&).";"/ge; print "Content-type: text/html\n\n$error\n"; exit 0; } } # print any die or warn message to the browser (with <, & and > escaped as HTML entities), then exit
# This program ranks how similar the author of a baseline text file is to the authors of the target files. Intended for English text.
# Works on a 7KB baseline file (about 1,000 words) if the target files are 50KB; with both at 50KB, results are quite good.
# Approximate accuracy: the correct author ranks 1st about 50% of the time among 20 authors in the same genre, with 50KB files.
# For professional work, use the ranking capability of the open-source SVMLink software.
# Author: Scott Roberts, 2016.
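One way the ranking described above could be coded is sketched below. It assumes a scoring rule that the source does not spell out: a word's self-information is I(w) = -log2(count(w)/total), and two texts are compared by the mean absolute difference in I(w) over the baseline's distinct words, with an assumed floor count of 0.5 for words the known text lacks. Lower scores mean more similar, and authors are ranked by ascending score. The subroutine names (word_counts, self_info, entropy_distance, rank_authors) and the 0.5 floor are hypothetical; this is not the author's actual method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lowercase word tokens (letters and apostrophes) in a string.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{lc $1}++ while $text =~ /([A-Za-z']+)/g;
    return \%count;
}

# Self-information of a word in bits: I(w) = -log2(count / total).
sub self_info {
    my ($count, $total) = @_;
    return -log($count / $total) / log(2);
}

# Assumed scoring rule (hypothetical, not the original program's):
# mean |I_base(w) - I_known(w)| over the baseline's distinct words.
# Words absent from the known text get an assumed floor count of 0.5.
# Lower score = more similar.
sub entropy_distance {
    my ($base, $known) = @_;
    my ($nb, $nk) = (0, 0);
    $nb += $_ for values %$base;
    $nk += $_ for values %$known;
    die "Cannot compare an empty text\n" unless $nb && $nk;
    my $sum = 0;
    for my $w (keys %$base) {
        my $kc = $known->{$w} // 0.5;
        $sum += abs(self_info($base->{$w}, $nb) - self_info($kc, $nk));
    }
    return $sum / scalar(keys %$base);
}

# Rank known authors from most to least similar to the baseline text.
# %known maps an author name to that author's text.
sub rank_authors {
    my ($baseline_text, %known) = @_;
    my $base = word_counts($baseline_text);
    my %score;
    $score{$_} = entropy_distance($base, word_counts($known{$_})) for keys %known;
    return sort { $score{$a} <=> $score{$b} or $a cmp $b } keys %score;
}
```

For example, rank_authors($baseline_text, Austen => $austen_text, Twain => $twain_text) would return the author names ordered from closest to farthest; ties are broken alphabetically so the ranking is deterministic.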