@1RedOne
Last active December 30, 2019 16:56
Extracting content from an API from a site

Extracting Content from a site's unpublished API

Imagine that you've found a site that has a perfect list of some info you need, but the site owners don't offer it in a format you can easily use! This happens a lot, but fortunately for us, if the data can be retrieved and displayed in a web browser, we can normally request that same data directly through a web call instead!

The problem is that the API endpoints we need to hit may not always be published publicly.

For instance, this webpage has a lot of good info on beer, but no great way to export it.

https://www.systembolaget.se/sok-dryck/?subcategory=%C3%96l&type=Ale%20brittisk-amerikansk%20stil&style=Imperial%2FDubbel%20IPA&fullassortment=1
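The query string in that URL is percent-encoded, which makes the search filters hard to read. As a small sketch (the variable names here are just for illustration), you can decode it in PowerShell to see which filters the page is applying:

```powershell
# The page URL from above
$url = 'https://www.systembolaget.se/sok-dryck/?subcategory=%C3%96l&type=Ale%20brittisk-amerikansk%20stil&style=Imperial%2FDubbel%20IPA&fullassortment=1'

# Everything after the first '?' is the query string
$query = ($url -split '\?', 2)[1]

# Split into name=value pairs and undo the percent-encoding on each value
$params = [ordered]@{}
foreach ($pair in ($query -split '&')) {
    $name, $value = $pair -split '=', 2
    $params[$name] = [uri]::UnescapeDataString($value)
}

$params
```

Knowing the decoded filter names makes it easier to spot the matching API request later, since the same parameters tend to show up in its URL.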

I'll walk you through a technique to find the API used and grab the data ourselves!

Start by opening Chrome, then open DevTools and switch to the Network pane. With the Network pane open, navigate to the URL above.

Next, click 'I am over 20' and watch for requests of the type 'xhr'. XHR (XMLHttpRequest) is how a page makes AJAX requests. AJAX stands for Asynchronous JavaScript and XML, though it could be called AJAJ, because pretty much everything is in JSON now!

Example of an XHR Request in Chrome Tools

extract01

AJAX requests are commonly used when you want to load a page or form quickly and then retrieve the data from an API to fill out a table or form. This feels a lot faster to the user than holding up the whole page load until the full data payload has been sent.

So, we watch for XHR requests because they're basically always interesting!

In this case, it loads their catalog of beer!

Easily Viewing the body of the response

You can just click the request to see info about it, and on the Response tab, you can see the payload. This is what we were looking for!

extract02

If the JSON is really complex, it might be hard to read in Chrome, so I recommend copying it and pasting it into JSONLint.com to format it. You can even take the URL for the XHR request (in the red box above) and paste it into JSONLint to get a pretty-printed version of the JSON object.

extract03
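If you would rather not paste the response into a website, PowerShell can do a similar pretty-print locally. This is only a sketch: the JSON string below is a tiny made-up sample that reuses the field names from this walkthrough, and the values in it are placeholders, not real catalog data.

```powershell
# Hypothetical, minified sample payload (placeholder values, not real catalog data)
$raw = '{"ProductSearchResults":[{"ProducerName":"Example Brewery","ProductNameBold":"Example","ProductNameThin":"Double IPA"}]}'

# Parse the JSON into objects, then write it back out indented.
# -Depth is raised because ConvertTo-Json truncates nested objects below its default depth of 2.
$pretty = $raw | ConvertFrom-Json | ConvertTo-Json -Depth 10

$pretty
```

With a real response, you would copy the body from the Response tab into `$raw` instead of the sample string.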

Now that we know this URL has the info that we're looking for, we can just paste this directly into Invoke-RestMethod and then look at the output until we find the values we want!

# Call the endpoint found in DevTools; Invoke-RestMethod parses the JSON response into objects
$t = Invoke-RestMethod -Uri "https://www.systembolaget.se/api/productsearch/search/sok-dryck/?style=Imperial%2FDubbel%20IPA&subcategory=%C3%96l&type=Ale%20brittisk-amerikansk%20stil&sortdirection=Ascending&site=all&fullassortment=1"
# Keep only the producer and product name columns
$t.ProductSearchResults | Select-Object ProducerName, ProductNameBold, ProductNameThin
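Since the original complaint was that the site has no great way to export its data, a natural next step is to list the available fields and save the ones you want to a CSV file. The sketch below uses a hypothetical stand-in object, `$sample`, shaped like the response above so it runs without a network call. The values and the `beer.csv` file name are made up for illustration.

```powershell
# Hypothetical stand-in for the parsed response; with the live data you would use $t instead
$sample = [pscustomobject]@{
    ProductSearchResults = @(
        [pscustomobject]@{ ProducerName = 'Example Brewery'; ProductNameBold = 'Example'; ProductNameThin = 'Double IPA' }
        [pscustomobject]@{ ProducerName = 'Sample Brewing';  ProductNameBold = 'Sample';  ProductNameThin = 'Imperial IPA' }
    )
}

# List the property names on the first result to see which fields are available
$fields = $sample.ProductSearchResults[0] |
    Get-Member -MemberType NoteProperty |
    Select-Object -ExpandProperty Name
$fields

# Pick the columns we care about and write them to a CSV in the temp folder
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'beer.csv'
$sample.ProductSearchResults |
    Select-Object ProducerName, ProductNameBold, ProductNameThin |
    Export-Csv -Path $csvPath -NoTypeInformation
```

`Get-Member` is handy here because an unpublished API comes with no documentation, so inspecting the objects is often the quickest way to learn what it returns.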
@1RedOne

1RedOne commented Jan 12, 2019

Example of an XHR Request in Chrome Tools

extract01

@1RedOne

1RedOne commented Jan 12, 2019

Example of viewing the body of the response

You can just click the request to see info about it, and on the Response tab, you can see the payload. This is what we were looking for!
extract02

@1RedOne

1RedOne commented Jan 12, 2019

Making it easier to read

You can make things easier to read by pasting the response (or even the URL itself) into a linting site, like the awesome JSONLint.com

extract03
