Paul Michel pmichel31415
pmichel31415 / acl2018stats.sh
Created April 25, 2018 16:16
Stats on ACL 2018 accepted papers
#!/bin/bash
# Get the data (-O overwrites index.html instead of saving a stale-safe copy as index.html.1)
wget -nv -O index.html http://acl2018.org/conference/accepted-papers/index.html
# Keep the list of papers only
sed -i '/paper-title/!d' index.html
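# Extract author lists: one comma-separated, lowercased list per paper
# (-E enables extended regex, so the alternation splits on ", ", " and " and ", and ")
sed -E 's:.*<span class="paper-authors">([^<]*)</span>.*:\1:;s:,? and |, :,:g' index.html | tr '[:upper:]' '[:lower:]' > authors.txt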
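# Author frequencies: each output line is "<number of authors> <number of papers>"
# (awk picks the count column regardless of how uniq -c pads it, so counts >= 10 work too)
tr ',' '\n' < authors.txt | sort | uniq -c | awk '{print $1}' | sort -n | uniq -c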
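
A minimal sketch of the same split-and-count idea, run on three made-up author lists rather than the downloaded page, so the pipeline can be checked without network access. The function names `split_authors` and `paper_count_histogram` are hypothetical, the sample names are invented, and the labelled output format is an assumption added for readability.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original script.

# Read one author list per line; write comma-separated, lowercased names.
# Splits on ", ", " and " and ", and ".
split_authors() {
  sed -E 's:,? and |, :,:g' | tr '[:upper:]' '[:lower:]'
}

# Read comma-separated author lists; print how many authors have N papers.
paper_count_histogram() {
  tr ',' '\n' \
    | sort | uniq -c \
    | awk '{print $1}' \
    | sort -n | uniq -c \
    | awk '{print $2 " paper(s): " $1 " author(s)"}'
}

# Invented sample data: "Ann Lee" appears on three papers, "Bob Ray" on two.
printf '%s\n' \
  'Ann Lee, Bob Ray and Cy Dow' \
  'Ann Lee and Dee Fox' \
  'Ann Lee, Bob Ray, and Eve Gil' \
  | split_authors | paper_count_histogram
```

On this sample the script should print `1 paper(s): 3 author(s)`, `2 paper(s): 1 author(s)` and `3 paper(s): 1 author(s)`, one per line. Taking the count with `awk '{print $1}'` avoids depending on the exact padding that `uniq -c` puts in front of each count.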