@malev
Last active April 21, 2017 21:39
Test amp errors
  • Can you verify 5555 AMP links?
  • Sure I can!

What do we need?

What do we have?

The links are in a CSV file with this format:

ID,BRAND,AMP Error Type,AMP URL,Last detected

Research

I need to send the URLs one by one through amphtml-validator:

$ node_modules/.bin/amphtml-validator http://www.bonappetit.com/uncategorized/article/the-linkery-03-24-10/amp --format json
{"http://www.bonappetit.com/uncategorized/article/the-linkery-03-24-10/amp":{"status":"FAIL","errors":[{"severity":"ERROR","line":33,"col":3,"message":"Invalid URL protocol 'foodhttp:' for attribute 'href' in tag 'a'.","specUrl":"https://www.ampproject.org/docs/reference/spec#links","category":"DISALLOWED_HTML","code":"INVALID_URL_PROTOCOL","params":["href","a","foodhttp"]}]}}

We can use head and tail to select a specific row:

$ head -n 2 errors.csv | tail -n 1
1,Bon Appetit,Prohibited or invalid use of HTML Tag (Critical issue),http://www.bonappetit.com/recipe/spicy-italian-sausage/amp,4/3/17
$ head -n 3 errors.csv | tail -n 1
2,Bon Appetit,Prohibited or invalid use of HTML Tag (Critical issue),http://www.bonappetit.com/entertaining-style/gift-guides/article/the-7-best-culinary-bookstores-in-america/amp,4/3/17

From each row we need to select the URL, which is the fourth column:

$ head -n 3 errors.csv | tail -n 1 | csvcut -c 4
http://www.bonappetit.com/entertaining-style/gift-guides/article/the-7-best-culinary-bookstores-in-america/amp
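The row selection and column extraction can also be wrapped in one small helper. This is only a minimal sketch, not the method used above: `url_at` is a hypothetical name, it swaps `head | tail` for `sed -n` and `csvcut` for `cut`, and it assumes no field contains an embedded comma (true for the sample rows shown, but not verified for the whole file).

```shell
#!/bin/sh
# url_at FILE LINE: print the AMP URL (4th comma-separated field) found on
# physical line LINE of FILE. Line 1 is the header, so data starts at line 2.
# Assumption: no field contains an embedded comma; prefer csvcut if one might.
url_at() {
  sed -n "${2}p" "$1" | cut -d, -f4
}
```

With the file above, `url_at errors.csv 3` should print the same URL as the `head | tail | csvcut` pipeline for row 3.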

From there we need to vary the line number passed to head, starting at 2 (line 1 is the header row) and ending at 5555. We can use seq for this:

$ seq 2 10
2
3
4
5
6
7
8
9
10

And we are going to use parallel to speed up the process:

mkdir -p output
seq 2 5555 | parallel -j10 "head -n {} errors.csv | tail -n 1 | csvcut -c 4 | xargs node_modules/.bin/amphtml-validator --format json > output/{}.json"
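The command above re-reads errors.csv from the top for every row. A possible alternative is to stream the file once and hand each job its line number and URL together. The sketch below is an assumption-laden variant, not the approach actually used here: `list_urls` is a hypothetical helper, it assumes no field contains an embedded comma, and the invocation in the trailing comment relies on GNU parallel's `--colsep` option with `{1}` and `{2}` positional placeholders.

```shell
#!/bin/sh
# list_urls FILE: print "LINE<TAB>URL" for every data row of FILE, where LINE
# is the physical line number (header is line 1) and URL is the 4th field.
# Assumption: no field contains an embedded comma.
list_urls() {
  awk -F, 'NR > 1 { print NR "\t" $4 }' "$1"
}

# Possible usage, keeping the same output/<line>.json naming as above:
#   mkdir -p output
#   list_urls errors.csv | parallel -j10 --colsep '\t' \
#     "node_modules/.bin/amphtml-validator {2} --format json > output/{1}.json"
```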

Now we have a bunch of JSON files inside ./output. We can merge them into a single file, one result per line, with:

for f in output/*.json; do (cat "${f}"; echo) >> output/output.dat; done

Finally, we need to remove the occasional empty lines from our file:

sed '/^[[:space:]]*$/d' output/output.dat > clean.dat
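The merge and cleanup steps can also be folded into a single pass that never writes the intermediate file. This is a rough sketch under the same assumptions as above (one JSON result per file inside a results directory); `merge_results` is a hypothetical name.

```shell
#!/bin/sh
# merge_results DIR: print the contents of every DIR/*.json file, adding a
# newline after each one and dropping blank lines (for example, those left
# by files that ended up empty).
merge_results() {
  for f in "$1"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    cat "$f"
    echo                      # make sure each file ends with a newline
  done | sed '/^[[:space:]]*$/d'
}
```

Running `merge_results output > clean.dat` would then produce the cleaned file directly.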

Our new file, clean.dat, is ready to go!

@rafaelhbarros

this is incredible, I'd suggest merging this doc in github.com/CondeNast/autopilot-services-validation
