@atomotic

atomotic/Readme.md

Last active Jul 30, 2020
Internet Archive Save Page Now

Save a page to the Internet Archive Wayback Machine from the shell.

Put the function in your .zshrc or .bashrc and then run:

~  ia-save http://twitter.com/atomotic
https://web.archive.org/web/20140702123925/http://twitter.com/atomotic
function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}' }

@hugovk hugovk commented Apr 20, 2015

I added the function:

function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}' }

to the end of my OS X .bashrc and called source ~/.bashrc but got:

-bash: .bashrc: line 12: syntax error: unexpected end of file

It needs a semicolon:

function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}'; }

But it still doesn't work. Running just the curl command returns:

HTTP/1.1 403 Forbidden
Server: Tengine/2.0.3
Date: Mon, 20 Apr 2015 18:10:34 GMT
Content-Type: text/html;charset=utf-8
Connection: keep-alive
set-cookie: wayback_server=46; Domain=archive.org; Path=/; Expires=Wed, 20-May-15 18:10:33 GMT;
X-Archive-Wayback-Liveweb-Error: RobotAccessControlException: Blocked By Robots
X-Archive-Playback: 0
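
The header dump above contains no Content-Location line, so the grep/awk pipeline prints nothing and the failure is silent. A minimal sketch of a helper that surfaces the status code and any X-Archive-Wayback-Liveweb-Error header instead; the name `ia_save_report` is hypothetical, and it reads headers from stdin so it can be tried on saved output:

```shell
# Hypothetical helper: summarize Save Page Now response headers read from stdin.
ia_save_report() {
    # HTTP header lines end in CRLF, so strip the CR before parsing.
    tr -d '\r' | awk '
        # First line is the status line, e.g. "HTTP/1.1 403 Forbidden".
        NR == 1 { print "status: " $2 }
        # Header names are matched case-insensitively.
        tolower($1) == "x-archive-wayback-liveweb-error:" {
            sub(/^[^:]*:[ \t]*/, "")    # drop the header name, keep the value
            print "error: " $0
        }
        tolower($1) == "content-location:" { print "saved: https://web.archive.org" $2 }
    '
}
```

Possible usage: `curl -s -I "https://web.archive.org/save/$url" | ia_save_report`.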

@edsu edsu commented Apr 20, 2015

What were you trying to curl?

@paulkaefer paulkaefer commented Oct 26, 2015

I'm trying to make this a bash alias. If I add it as-is, it says "unexpected end of file" when I reload my .bashrc.

If I add a semicolon, as hugovk suggests, the .bashrc file works, but when I go to archive a page, I get the following:

awk: cmd. line:1: {print
awk: cmd. line:1:       ^ unexpected newline or end of string

Any ideas? I've been playing around with the quotes (switching between " and '), but with no success.

@paulkaefer paulkaefer commented Oct 28, 2015

I asked on StackOverflow and the following works for me:

function ia-save() {
    curl -s -I "https://web.archive.org/save/$1" |
    grep Content-Location |
    awk '{printf( "https://web.archive.org/%s\n",$2)}';
}

@lyda lyda commented Jan 13, 2019

You don't need grep.

function ia-save() {
    curl -s -I "https://web.archive.org/save/$1" |
    awk '/^Content-Location/ {print "https://web.archive.org/" $2}';
}
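
Two details in the multi-line versions may be worth handling. Judging by the example output at the top of the gist, the Content-Location value already starts with a slash, so joining it to "https://web.archive.org/" yields a double slash. HTTP header lines also end in CRLF, so the printed URL carries a trailing carriage return. A sketch that deals with both, assuming the header looks like the one the original function parsed; the name `ia_location` is hypothetical, and it reads from stdin:

```shell
# Hypothetical helper: turn a Content-Location header (read from stdin) into a full URL.
ia_location() {
    # Strip the CR that ends every HTTP header line.
    tr -d '\r' | awk '
        tolower($1) == "content-location:" {
            sub(/^\//, "", $2)                  # drop the leading slash so the join has exactly one
            print "https://web.archive.org/" $2
        }
    '
}
```

Possible usage: `curl -s -I "https://web.archive.org/save/$1" | ia_location`.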

@jerclarke jerclarke commented Jul 28, 2020

Hey! You all are doing something that seems to have broken for me. The web.archive.org/save/ endpoint is no longer returning a Content-Location header for me in an application where it used to work.

Anyone else having this issue since July 10?

Here's a link to a related ticket: berkmancenter/amber_wordpress#59

Owner Author

@atomotic atomotic commented Jul 28, 2020

It seems that GET https://web.archive.org/save/___ is not working anymore; there is a POST now. I will look into it later.

@jerclarke jerclarke commented Jul 30, 2020

Thanks!

I tried just looking at dev tools when using the website version of /save/ and it seems like the POST request is super simple, just url=$url.

When I run that request through PHP (WordPress HTTP API) it seems to work based on the content that's returned, but there's still no Content-Location header.

Let me know if you find something different 🙏🏻
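
For anyone who wants to reproduce this from the shell, here is a rough sketch of the POST described above. It assumes the form field is `url` and that the request goes to https://web.archive.org/save/; the exact target path is a guess, and neither it nor the response format is confirmed in this thread. The function name `ia_save_post` is hypothetical. It only dumps the response headers so they can be inspected:

```shell
# Sketch only: POST url=<page> to the save endpoint, as described in the comment above.
# The endpoint path is an assumption and the response format is unconfirmed,
# so this just prints the response headers.
ia_save_post() {
    # -D - writes response headers to stdout; -o /dev/null discards the body.
    # --data-urlencode sends the field as a URL-encoded form POST.
    curl -s -o /dev/null -D - --data-urlencode "url=$1" "https://web.archive.org/save/"
}
```

Possible usage: `ia_save_post http://twitter.com/atomotic`, then look for whichever header or redirect now carries the snapshot location.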
