Eric Lease Morgan ericleasemorgan
@ericleasemorgan
ericleasemorgan / natural language processing with shell
Last active Mar 15, 2018
some one-liners to extract URLs, email addresses, and a dictionary from a text file
# extract all urls from a text file
cat file.txt | grep -Eo 'https?://[^ ]+' | sed -e 's/^https/http/' -e 's/\W\+$//' | sort | uniq -c | sort -bnr
# extract domains from URLs found in text files
cat file.txt | grep -Eo 'https?://[^ ]+' | sed -e 's/^https/http/' -e 's/\W\+$//' -e 's/^http:\/\///' -e 's/\/.*$//' | sort | uniq -c | sort -bnr
# extract email addresses
cat file.txt | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,\}' | sort | uniq -c | sort -bnr
# list all words in a text file
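The preview above ends before the command for this step is shown. The following is only a sketch of one way to list words, written in the same sort | uniq -c | sort -bnr style as the other one-liners; it is not the author's command. The function name list_words is hypothetical, and a "word" is assumed to be any run of alphabetic characters, folded to lower case.

```shell
# list_words (hypothetical name): print each distinct word in a file,
# prefixed by its frequency, most frequent first
list_words() {
  tr -cs '[:alpha:]' '\n' < "$1" |   # turn every run of non-letters into one newline
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" count together
    sed '/^$/d' |                    # drop the empty line left by leading punctuation
    sort | uniq -c | sort -bnr       # count duplicates, then sort by count, descending
}
```

Calling list_words file.txt prints one line per distinct word, in the same count-then-value layout the URL and email one-liners produce.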
ericleasemorgan / tika2text.sh
Last active Mar 27, 2017
(brain-dead) shell script using Tika in server mode to convert a batch of files to plain text
#!/bin/bash
# tika2text.sh - given a directory, recursively extract text from files
# Eric Lease Morgan <emorgan@nd.edu>
# (c) University of Notre Dame, distributed under a GNU Public License
# March 27, 2017 - a second cut; works with a directory
ericleasemorgan / gist:8984187
Created Feb 13, 2014
given a (CrossRef) DOI, parse link header of HTTP request to get fulltext URLs
sub extracter {
# given a (CrossRef) DOI, parse link header of HTTP request to get fulltext URLs
# see also: https://prospect.crossref.org/splash/
# Eric Lease Morgan <emorgan@nd.edu>
# February 12, 2014 - first cut
# require
use HTTP::Request;
ericleasemorgan / gist:8438082
Created Jan 15, 2014
Perl subroutine to slurp up the contents of a text file
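sub slurp {
    # given a file name, return the entire contents of the file as one string
    my $f = shift;
    open( my $fh, '<', $f ) or die "Can't open $f: $!\n";
    my $r = do { local $/; <$fh> };    # undefined record separator reads the whole file at once
    close $fh;
    return $r;
}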