@cosmocatalano
Last active August 6, 2023 07:32
Quick-and-dirty Instagram web scrape, just in case you don't think you should have to make your users log in to deliver them public photos.
<?php
// Returns a big old hunk of JSON (decoded into an array) from a non-private IG account page.
function scrape_insta($username) {
    $insta_source = file_get_contents('http://instagram.com/'.$username);
    // The profile data is embedded in the page as "window._sharedData = {...};</script>"
    $shards = explode('window._sharedData = ', $insta_source);
    $insta_json = explode(';</script>', $shards[1]);
    $insta_array = json_decode($insta_json[0], TRUE);
    return $insta_array;
}
// Supply a username
$my_account = 'cosmocatalano';
// Do the deed
$results_array = scrape_insta($my_account);
// An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
@MrHappyAsh

@shoaibalich I suppose it depends on how you are going to use the data and whether you store it. I am using it, for example, just to display the number of followers an Instagram page has. This information is freely available on the internet anyway. It's no different from manually loading the page and making a note of the information.

Storing the information may be different though.

@NeeONCorp

> just add "?__a=1" at the end of url, it's already JSON
> ex:
> "http://instagram.com/username/?__a=1" for username
> "http://instagram.com/tags/hashtag/?__a=1" for hashtag

Thanks. This is the easier way 👍
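
A minimal sketch of the "?__a=1" approach quoted above, assuming the endpoint still returns JSON for public pages (later comments in this thread suggest it often redirects to a login page instead). The function names `insta_json_url` and `fetch_insta_json` are hypothetical helpers, not part of the original gist.

```php
<?php
// Hypothetical helper: builds the "?__a=1" URL for a username or a hashtag.
// The URL shapes are taken from the comment above, not from official documentation.
function insta_json_url(string $name, bool $is_hashtag = false): string {
    $path = $is_hashtag ? 'tags/' . rawurlencode($name) : rawurlencode($name);
    return 'https://www.instagram.com/' . $path . '/?__a=1';
}

// Hypothetical helper: fetches the URL and decodes the JSON body.
// Returns null when the request fails or the body is not JSON
// (for example, when Instagram serves an HTML login page instead).
function fetch_insta_json(string $url): ?array {
    $body = @file_get_contents($url);
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}
```

Usage would look like `fetch_insta_json(insta_json_url('username'))`; treat a `null` result as "blocked or unavailable" rather than as an empty profile.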

@athreides

Hi, the WordPress theme that I'm using (Artmag) uses your solution in its Instagram widget. At the moment the widget works properly with any Instagram account except mine: if I enter my Instagram handle, it gives an error loading resources and shows only post texts (and not the most recent posts, but ones that were posted a few months ago). What could the problem be?

@bigsee

bigsee commented Mar 6, 2019

Thanks for the update @garudacrafts!

@tingli-shen

any updates for getting images?

@ShaimyWinda

Hello! I think I have a problem with the Instagram API. I don't get ProfilePage but LoginAndSignupPage...
Do you have a solution?
Thanks
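
One way to notice this situation before indexing into the array is to check which page type came back. This is a sketch only: `insta_page_type` is a hypothetical helper, and the exact spelling of the login key is an assumption based on this comment, so it is matched case-insensitively.

```php
<?php
// Hypothetical helper: classifies the decoded window._sharedData array.
// 'ProfilePage' comes from the gist; the login key spelling is assumed.
function insta_page_type(?array $shared_data): string {
    $entry = $shared_data['entry_data'] ?? null;
    if (!is_array($entry)) {
        return 'unknown';
    }
    if (!empty($entry['ProfilePage'])) {
        return 'profile'; // safe to read ['ProfilePage'][0] from here
    }
    foreach (array_keys($entry) as $key) {
        if (strcasecmp((string) $key, 'LoginAndSignupPage') === 0) {
            return 'login'; // Instagram served the login wall instead of the profile
        }
    }
    return 'unknown';
}
```

A caller could skip rendering (or fall back to cached data) whenever the result is anything other than `'profile'`.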

@transbetacism

I think I have the same issue. I'm using this script in a WordPress theme and the Instagram feed has broken in several installations:

[Screenshot: redirect]

@bateller

The referenced code still works (specifically the update provided by @garudacrafts on Mar 21, 2018).

The issue is that over the last few weeks or so Instagram has restricted logged-out (guest) access (most likely based on IP address).

After a large number of queries to their servers, they will begin showing a "Please login" page (see below; mine shows a "processing circle" where the login would be, because I have JavaScript turned off).

The only way around this would be to have each of your Users on Instagram who wish to use this process create an API Key (This is unrealistic, boo). Otherwise you'll need to use a proxy when issuing the request to Instagram so it doesn't see you hit their servers multiple times from the same IP address.

Picture below is what I see when going directly to a valid public user account after multiple requests from the same IP address (the actual number of requests that trigger this 'login' screen is currently unknown). Eg: https://instagram.com/username/
[Screenshot: Instagram login page]

Using the script referenced above, or even postaddictme/instagram-php-scraper, on a brand new IP address that hasn't hit Instagram's servers works just fine. However, after multiple queries (once the IP is blacklisted), both the script and postaddictme/instagram-php-scraper begin to fail.

According to Instagram's documentation for their API, they want you to have an API key for every User who wishes to pull their photos (keeping the API key in sandbox mode). Again, this seems unrealistic to me. You "can" submit your App to Instagram for review (which theoretically "may" let you pull photos for other Users with the same API key), but I highly doubt they'd approve an app that pulls images off their servers (like the above-mentioned scripts do). I also do not see this specifically supported in their current API documentation. Nonetheless, I have submitted my app for review, so I will let you know how that turns out.
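
The proxy workaround mentioned above could be sketched with PHP's cURL extension roughly as follows. The proxy address is a placeholder, `insta_proxy_options` and `fetch_via_proxy` are hypothetical names, and nothing here guarantees that a given proxy will avoid the login page.

```php
<?php
// Builds cURL options that route one request through a proxy.
// $proxy is "host:port" and must be supplied by the caller.
function insta_proxy_options(string $url, string $proxy): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,  // e.g. "203.0.113.10:8080" (placeholder address)
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => false,   // keep a redirect to the login page visible
        CURLOPT_TIMEOUT        => 15,
    ];
}

// Returns the response body, or null if the request failed or the status
// was not 200 (a 302 here usually means the login redirect described above).
function fetch_via_proxy(string $url, string $proxy): ?string {
    $ch = curl_init();
    curl_setopt_array($ch, insta_proxy_options($url, $proxy));
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body !== false && $status === 200) ? $body : null;
}
```

Disabling redirect following is a deliberate choice in this sketch: it makes a blocked IP show up as a non-200 status instead of silently returning the login page HTML.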

@kissuidotnet

It looks like you can hit instagram.com/{username}/?__a=1 and get the JSON...
file_get_contents doesn't work, because it redirects you to the login page.

I think you have to use cURL and set the headers, and maybe a cookie? IDK...

Does anyone know how to set up a simple cURL request that sets all of the relevant info?

@kissuidotnet

This is what the request looks like in Google Chrome to get a JSON response:

curl "https://www.instagram.com/username/?__a=1" \
  -H "authority: www.instagram.com" \
  -H "upgrade-insecure-requests: 1" \
  -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36" \
  -H "sec-fetch-mode: navigate" \
  -H "sec-fetch-user: ?1" \
  -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3" \
  -H "sec-fetch-site: none" \
  -H "accept-encoding: gzip, deflate, br" \
  -H "accept-language: en-US,en;q=0.9,pl;q=0.8,it;q=0.7" \
  --compressed

Does anyone know how to easily do this using PHP cURL?

@cosmocatalano
Author

https://incarnate.github.io/curl-to-php/ could probably help with that
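
For reference, a hand-written PHP cURL equivalent of the command quoted above might look like the sketch below. The header values are copied from that command; `insta_browser_headers` and `fetch_with_headers` are hypothetical names, and whether these headers are enough to get JSON back is not something this thread confirms.

```php
<?php
// Request headers copied from the Chrome curl command in this thread.
// accept-encoding is left out because CURLOPT_ENCODING sets it below.
function insta_browser_headers(): array {
    return [
        'authority: www.instagram.com',
        'upgrade-insecure-requests: 1',
        'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
        'sec-fetch-mode: navigate',
        'sec-fetch-user: ?1',
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'sec-fetch-site: none',
        'accept-language: en-US,en;q=0.9,pl;q=0.8,it;q=0.7',
    ];
}

// Sends a GET request with the headers above; returns the body or null on failure.
function fetch_with_headers(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER     => insta_browser_headers(),
        CURLOPT_ENCODING       => '',   // empty string: accept every encoding cURL supports, like --compressed
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

The result of `fetch_with_headers('https://www.instagram.com/username/?__a=1')` can then be passed to `json_decode($body, true)`.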

@kissuidotnet

Does anyone know if we can set a cookie with file_get_contents?

Also, does anyone know how to work with csrftoken? It looks like Instagram needs that to be set.
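
On the first question: file_get_contents accepts a stream context, and custom headers (including Cookie) can be set there. The sketch below assumes you already have a cookie value to send; `insta_stream_context` is a hypothetical helper, and whether Instagram actually requires csrftoken is this commenter's guess, not something verified here.

```php
<?php
// Hypothetical helper: builds an HTTP stream context that sends a
// User-Agent header and, if any cookies are given, a Cookie header.
function insta_stream_context(array $cookies, string $user_agent) {
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    $headers = ['User-Agent: ' . $user_agent];
    if ($pairs) {
        $headers[] = 'Cookie: ' . implode('; ', $pairs);
    }
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => implode("\r\n", $headers) . "\r\n",
        ],
    ]);
}

// Usage (placeholder cookie value):
// $ctx  = insta_stream_context(['csrftoken' => 'PLACEHOLDER'], 'Mozilla/5.0');
// $html = file_get_contents('https://www.instagram.com/username/', false, $ctx);
```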

@knaven0128

Is this updated?

@chrismccoy

> Is this updated?

change

$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];

to

$latest_array = $results_array['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges'];

@SwetankPoddar

I created a small program which extracts data from this endpoint and creates a gallery kinda thing. It is in JavaScript, but all the arrays and stuff are same so maybe it would be useful for some people trying to migrate to the new changes :)

https://github.com/SwetankPoddar/dynamic-instagram-gallery

@han-s-kl

Does this still work in August 2020? It seems all my server-side options have been blocked (cURL and file_get_contents). Only JavaScript options (axios) are working. But I want a cron job scraper, so I need a server-side solution.

@chrismccoy

> Does this still work in August 2020? It seems all my server-side options have been blocked (cURL and file_get_contents). Only JavaScript options (axios) are working. But I want a cron job scraper, so I need a server-side solution.

The fix I posted above worked a couple of weeks ago.

@messinismarios

Doesn't seem to be working

@mrkhyns

mrkhyns commented Sep 28, 2020

As of Sep 2020, the following revisions must be made:

$latest_array = $results_array['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node'];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['shortcode'].'"><img src="'.$latest_array['thumbnail_src'].'"></a></br>';
echo 'Likes: '.$latest_array['edge_media_preview_like']['count'].' - Comments: '.$latest_array['edge_media_to_comment']['count'].'<br/>';
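
Because the array layout has changed more than once, it may be safer to read these keys defensively. The sketch below uses the Sep 2020 key names from this comment; `latest_post_summary` is a hypothetical helper, and the keys may well have changed again since.

```php
<?php
// Hypothetical helper: pulls the latest post out of the decoded _sharedData
// array using the Sep 2020 key names, returning null if any level is missing.
function latest_post_summary(?array $results): ?array {
    $node = $results['entry_data']['ProfilePage'][0]['graphql']['user']
        ['edge_owner_to_timeline_media']['edges'][0]['node'] ?? null;
    if (!is_array($node)) {
        return null; // login page, private account, or a changed layout
    }
    return [
        'url'       => 'http://instagram.com/p/' . ($node['shortcode'] ?? ''),
        'thumbnail' => $node['thumbnail_src'] ?? null,
        'likes'     => $node['edge_media_preview_like']['count'] ?? 0,
        'comments'  => $node['edge_media_to_comment']['count'] ?? 0,
    ];
}
```

The `??` operator suppresses "undefined index" notices at every level of the chain, so a missing `ProfilePage` produces `null` instead of a cascade of warnings.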

@LabN36

LabN36 commented Oct 6, 2020

Were you able to find a good solution to this issue? What is the best way to bypass this login page? @bateller

@restyler

restyler commented Oct 8, 2020

> Were you able to find a good solution to this issue? What is the best way to bypass this login page? @bateller

I've hit the same issue and had to spend a fair amount of time on it for my own project.
Here is the code I came up with: https://github.com/restyler/instagram-php-scraper - it uses Rapid API ( https://rapidapi.com/restyler/api/instagram40 ) to bypass IP restrictions.

@LabN36

LabN36 commented Oct 8, 2020

@restyler are you fetching the user's post details? I.e., if I provide an Instagram post link, does it return the path where it's stored? Instagram usually detects data center IPs.
I can see you have a getMediaByUrl method, but I'm not sure how you're dealing with the IP. Please let me know. Thanks

@restyler

restyler commented Oct 12, 2020

> @restyler are you fetching the user's post details? I.e., if I provide an Instagram post link, does it return the path where it's stored? Instagram usually detects data center IPs.
> I can see you have a getMediaByUrl method, but I'm not sure how you're dealing with the IP. Please let me know. Thanks

Yes. Technically there is a proxy method in the API which allows you to submit any instagram.com* link and get the raw HTML/JSON response, and there are helper endpoints like the getMediaByUrl you mentioned, if you don't need the raw response. I'd recommend using the helpers when feasible, because this approach uses more optimisations on the API side.

To mitigate Instagram IP detection (on the API side) I use proxies which are usually not located in popular data center IP ranges.

@LabN36

LabN36 commented Oct 12, 2020

> To mitigate Instagram IP detection (on the API side) I use proxies which are usually not located in popular data center IP ranges.

@restyler thanks for replying, really appreciated. Can you tell me a little more about how you avoid getting blocked by Instagram? Are you using a third-party API or anything that provides a new IP on each request? From looking at your code, it seems you're just asking the user for proxy credentials and connecting to that proxy server, if I'm not wrong. Please let me know. Thanks.

@ycaty

ycaty commented Oct 30, 2020

hey really enjoyed this post. i made a quick lil mockup on the break down of scraping user tags without login.
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020

By any chance does anyone happen to have a way to collect followers without logging in?

@rramoscabral

> hey really enjoyed this post. i made a quick lil mockup on the break down of scraping user tags without login.
> https://gist.github.com/levlet/23cf1c17e6bb6e353f5823b3392c1e01
>
> By any chance does anyone happen to have a way to collect followers without logging in?

Page not found

@ycaty

ycaty commented Feb 2, 2021

> > hey really enjoyed this post. i made a quick lil mockup on the break down of scraping user tags without login.
> > https://gist.github.com/levlet/23cf1c17e6bb6e353f5823b3392c1e01
> > By any chance does anyone happen to have a way to collect followers without logging in?
>
> Page not found

updated link
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020

@Yashwanthd1998

Looks like Instagram is blocking scraping via file_get_contents/cURL. Has anyone got a solution? I wonder how online web scraping tools work without getting blocked.

ghost commented Aug 6, 2021

Hi 'Cosmocatalano' [nomen est omen?] :),
This is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array names seem to have changed completely. The only one that is still the same seems to be 'entry_data'. Is the changed response still usable with alternative array 'names'? This would be very interesting.

Best regards and thanks
Axel Arnold Bangert

@skmachine

> Looks like Instagram is blocking scraping via file_get_contents/cURL. Has anyone got a solution? I wonder how online web scraping tools work without getting blocked.

I guess it is just a matter of having enough good proxies. I am now using the /proxy method of https://rapidapi.com/neotank/api/instagram130 to avoid dealing with proxies myself, because they fail all the time (for Instagram) and get a 302 redirect to login.
