Eric Lease Morgan ericleasemorgan

@ericleasemorgan
ericleasemorgan / gist:8984187
Created February 13, 2014 21:27
given a (CrossRef) DOI, parse link header of HTTP request to get fulltext URLs
sub extracter {
# given a (CrossRef) DOI, parse link header of HTTP request to get fulltext URLs
# see also: https://prospect.crossref.org/splash/
# Eric Lease Morgan <emorgan@nd.edu>
# February 12, 2014 - first cut
# require
use HTTP::Request;
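The parsing step named in the description can be sketched in shell. This is a minimal sketch under stated assumptions, not the author's Perl routine: the function names `link_urls` and `doi_links`, the `https://doi.org/` resolver address, and the reliance on `curl` are all illustrative choices that do not appear in the gist.

```shell
#!/bin/bash

# link_urls: print each <...> target found in a Link header value, one per line
# (hypothetical helper, not part of the original gist)
link_urls() {
  printf '%s\n' "$1" | grep -o '<[^>]*>' | tr -d '<>'
}

# doi_links: request headers for a DOI and hand every Link header to link_urls
# (assumes curl is installed; the resolver address is an assumption)
doi_links() {
  curl -sIL "https://doi.org/$1" | grep -i '^link:' | while IFS= read -r header; do
    link_urls "$header"
  done
}
```

Calling `link_urls` on a header value such as `<http://example.org/a.pdf>; rel="item"` prints only the URL between the angle brackets. Keeping the network request in a separate function means the parsing can be checked without one.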
ericleasemorgan / gist:8438082
Created January 15, 2014 15:17
Perl subroutine to slurp up the contents of a text file
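sub slurp {
	# read the entire contents of the named file and return it as one string
	my $f = shift;
	open( my $fh, '<', $f ) or die "Can't open $f: $!\n";
	# with the record separator undefined, a single read returns the whole file
	my $r = do { local $/; <$fh> };
	close $fh;
	return $r;
}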
ericleasemorgan / tika2text.sh
Last active March 27, 2017 20:47
(brain-dead) shell script using TIKA in server mode to convert a batch of files to plain text
#!/bin/bash
# tika2text.sh - given a directory, recursively extract text from files
# Eric Lease Morgan <emorgan@nd.edu>
# (c) University of Notre Dame, distributed under a GNU Public License
# March 27, 2017 - a second cut; works with a directory
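A minimal sketch of the batch conversion the description outlines, assuming a Tika server is already listening at `http://localhost:9998` and that `curl` is available. The function names `txt_name` and `tika_dir`, and the choice to write each output file next to its source with `.txt` appended, are assumptions for illustration and not the original script's behavior.

```shell
#!/bin/bash

# txt_name: map an input path to the path of its plain-text output
# (appending .txt keeps report.pdf and report.doc from colliding)
txt_name() {
  printf '%s.txt\n' "$1"
}

# tika_dir: recursively send every file under a directory to the Tika server
tika_dir() {
  local dir="$1" server="${2:-http://localhost:9998}"
  # skip *.txt so output written during the run is not fed back in
  find "$dir" -type f ! -name '*.txt' -print0 |
    while IFS= read -r -d '' file; do
      # PUT the file to the /tika endpoint and request plain text in return
      curl -s -T "$file" -H 'Accept: text/plain' "$server/tika" > "$(txt_name "$file")"
    done
}
```

With the server running, `tika_dir ./corpus` would leave a `.txt` file beside each source file under `./corpus`.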
ericleasemorgan / natural language processing with shell
Last active January 27, 2023 01:09
some one-liners to extract URLs, email addresses, and a dictionary from a text file
# extract all urls from a text file
grep -Eo 'https?://[^ ]+' file.txt | sed -E -e 's/^https:/http:/' -e 's/[^[:alnum:]_]+$//' | sort | uniq -c | sort -bnr
# extract domains from URLs found in a text file
grep -Eo 'https?://[^ ]+' file.txt | sed -E -e 's#^https?://##' -e 's#/.*$##' -e 's/[^[:alnum:]_]+$//' | sort | uniq -c | sort -bnr
# extract email addresses
grep -Eio '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}' file.txt | sort | uniq -c | sort -bnr
# list all words in a text file
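One plausible way to build the dictionary mentioned in the description, in the same pipeline style as the one-liners above. This is a sketch and not the original command: the function name `word_counts`, the lowercasing step, and the decision to treat every non-letter as a word separator are assumptions.

```shell
#!/bin/bash

# word_counts: print each distinct word in a file with its frequency, most frequent first
# (hypothetical helper; the splitting rule is an assumption)
word_counts() {
  # turn every run of non-letters into a newline, fold case, drop empty lines, then tally
  tr -cs '[:alpha:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort | uniq -c | sort -bnr
}
```

Running `word_counts file.txt` gives output in the same count-then-value layout as the URL and email one-liners.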
ericleasemorgan / network-5c604132-375.gexf
Created November 15, 2023 12:57
File sent from Gephi
<?xml version='1.0' encoding='UTF-8'?>
<gexf xmlns="http://gexf.net/1.3" version="1.3" xmlns:viz="http://gexf.net/1.3/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gexf.net/1.3 http://gexf.net/1.3/gexf.xsd">
<meta lastmodifieddate="2023-11-15">
<creator>Gephi 0.10.1</creator>
<title></title>
<description></description>
</meta>
<graph defaultedgetype="directed" mode="static">
<attributes class="node" mode="static">
<attribute id="types" title="types" type="string"/>