@pontikos
Last active May 30, 2021 16:49
Retrieve download URLs from Michigan impute server

On the imputation results page, in Chrome, open the JavaScript console and run this:

copy(document.body.innerHTML);

This copies the JavaScript-rendered page to your clipboard. Now paste it into a file, say download_page.html.

Then run this Python script to extract the URLs:

from __future__ import print_function
import sys
import re

#file1=sys.argv[1]
file1="download_page.html"

# Read the saved page and strip newlines so a URL split across lines still matches.
with open(file1,'r') as f:
    page=f.read()
page=page.replace('\n','')

# Each pattern is non-greedy: from "https" up to the first "log", "zip", etc.
# res[1:] skips the first match; set() removes duplicates.
res=re.findall('https.*?log',page)
logs=list(set(res[1:]))
for l in logs:
    print(l)

res=re.findall('https.*?zip',page)
zips=list(set(res[1:]))
for z in zips:
    print(z)


res=re.findall('https.*?html',page)
html=list(set(res[1:]))
for h in html:
    print(h)


res=re.findall('https.*?txt',page)
stats=list(set(res[1:]))
for t in stats:
    print(t)
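
A non-greedy pattern such as 'https.*?zip' can start at an unrelated earlier "https" and run on until the first "zip", which is presumably why the script drops the first match. Below is a minimal sketch of a stricter alternative. It assumes the download links appear in the page as quoted attribute values with no whitespace or query string in them, which may not hold for the real page. The function name extract_urls and the sample HTML are hypothetical.

```python
import re

def extract_urls(page, extensions=("log", "zip", "html", "txt")):
    """Return the sorted, de-duplicated URLs in `page` ending in one of `extensions`."""
    # [^\s"'<>]* keeps a match inside one attribute value or text token,
    # so it cannot run on from an unrelated earlier "https".
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    pattern = re.compile(r"""https://[^\s"'<>]*\.(?:%s)(?=["'\s<>]|$)""" % alternatives)
    return sorted(set(pattern.findall(page)))

# Hypothetical sample; the real results page will look different.
sample = (
    '<link href="https://example.org/style.css">'
    '<a href="https://example.org/results/chr1.zip">chr1.zip</a>'
    '<a href="https://example.org/results/chr1.log">chr1.log</a>'
    '<a href="https://example.org/results/chr1.zip">again</a>'
)
urls = extract_urls(sample)
for url in urls:
    print(url)
```

With a pattern like this there is no spurious first match to discard, so the whole result list can be kept.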
    

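Save the script's output to urls.txt (for example, by redirecting its standard output to that file). Now you can use wget to retrieve the files.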

for x in `cat urls.txt` ; do wget "$x" ; done
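
If wget is not available, the same loop can be sketched in Python with the standard library. This assumes the URLs can be fetched with a plain HTTPS GET and that urls.txt holds one URL per line. The function names read_urls, local_name and download_all are hypothetical.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit

def read_urls(path):
    """Return the non-empty, stripped lines of the URL list at `path`."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]

def local_name(url):
    """Derive a local file name from the last path segment of `url`."""
    return os.path.basename(urlsplit(url).path)

def download_all(url_file, dest_dir="."):
    """Download every URL listed in `url_file` into `dest_dir`."""
    for url in read_urls(url_file):
        target = os.path.join(dest_dir, local_name(url))
        # Stream the response to disk rather than holding it in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        print("saved", target)
```

Calling download_all("urls.txt") would then fetch each file into the current directory.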

Next, save your password in password.txt so that you can unzip all the files without being prompted:

password=$(cat password.txt)
for x in *.zip
do
    echo unzip -P "$password" -o "$x"
    unzip -P "$password" -o "$x"
done
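
The same unzip loop can be sketched with Python's zipfile module. Note that zipfile can only decrypt archives that use the legacy ZIP encryption, so this will not work if the server produces AES-encrypted archives. Which scheme is used is not stated here, so treat this as an assumption to check. The function name unzip_all is hypothetical.

```python
import glob
import os
import zipfile

def unzip_all(directory=".", password_file="password.txt"):
    """Extract every .zip in `directory`, using the password stored in `password_file`."""
    with open(password_file) as handle:
        password = handle.read().strip()
    extracted = []
    for path in sorted(glob.glob(os.path.join(directory, "*.zip"))):
        with zipfile.ZipFile(path) as archive:
            # pwd must be bytes; existing files are overwritten, like `unzip -o`.
            archive.extractall(directory, pwd=password.encode())
            extracted.extend(archive.namelist())
    return extracted
```

Calling unzip_all() in the download directory returns the names of the extracted members.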