@cosmocatalano
Last active August 6, 2023 07:32
Quick-and-dirty Instagram web scrape, just in case you don't think you should have to make your users log in to deliver them public photos.
<?php
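//Returns a big old hunk of JSON (decoded into an array) from a non-private IG account page,
//or NULL if the page could not be fetched or no longer contains window._sharedData.
function scrape_insta($username) {
	$insta_source = file_get_contents('http://instagram.com/'.$username);
	if ($insta_source === FALSE) {
		return NULL; //request failed or was blocked
	}
	$shards = explode('window._sharedData = ', $insta_source);
	if (!isset($shards[1])) {
		return NULL; //marker not found, e.g. the page layout changed or a login page was served
	}
	$insta_json = explode(';</script>', $shards[1]);
	return json_decode($insta_json[0], TRUE);
}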
//Supply a username
$my_account = 'cosmocatalano';
//Do the deed
$results_array = scrape_insta($my_account);
//An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
@Yashwanthd1998
Looks like Instagram is blocking scraping via file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.

ghost commented Aug 6, 2021

Hi 'Cosmocatalano' [ nomen est omen?] :) ,
this is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array names seem to have changed completely. The only one that is still the same seems to be 'entry_data'. Is the changed response still usable with alternative array 'names'? That would be very interesting.

Best regards and thanks
Axel Arnold Bangert
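
Since the key names in the decoded array can change without notice, one option is to read them defensively instead of chaining indexes directly. The following is only a minimal sketch: `array_path()` is a hypothetical helper (it is not part of the gist), and the sample data is made up to mirror the key path the gist originally used, not what Instagram returns today.

```php
<?php
// Hypothetical helper (not part of the original gist): walk a nested array
// along a list of keys and return $default as soon as one key is missing.
function array_path(array $data, array $keys, $default = null) {
    $current = $data;
    foreach ($keys as $key) {
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return $default;
        }
        $current = $current[$key];
    }
    return $current;
}

// Made-up sample shaped like the structure the gist expected.
$sample = [
    'entry_data' => [
        'ProfilePage' => [
            ['user' => ['media' => ['nodes' => [['code' => 'abc123']]]]],
        ],
    ],
];

// The key path used in the gist, expressed as a list.
$path = ['entry_data', 'ProfilePage', 0, 'user', 'media', 'nodes', 0];
$latest = array_path($sample, $path);

// A path that does not exist in the sample falls back to the default (null).
$missing = array_path($sample, ['entry_data', 'FeedPage', 0]);
```

With a helper like this, a renamed key yields the default value rather than a chain of "undefined index" notices, which makes it easier to spot which part of the path stopped matching.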

@skmachine
> Looks like Instagram is blocking scraping via file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.

I guess it just takes enough good proxies. I am now using the https://rapidapi.com/neotank/api/instagram130 /proxy method to avoid dealing with proxies myself, because they fail all the time (for Instagram) and get a 302 redirect to login.
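
To tell a blocked request apart from a real profile page, the response headers can be checked for that redirect. This is a rough sketch under stated assumptions: `is_login_redirect()` is a hypothetical helper, the `/accounts/login` path fragment is an assumption about where the redirect points, and the header lines below are hand-written samples in the format PHP exposes through `$http_response_header` after an HTTP `file_get_contents()` call.

```php
<?php
// Hypothetical helper: scan raw response header lines and report whether
// any 3xx response points at a login page.
// The default '/accounts/login' fragment is an assumption, not a documented contract.
function is_login_redirect(array $header_lines, string $login_fragment = '/accounts/login'): bool {
    $in_redirect = false;
    foreach ($header_lines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A new status line starts a new response; remember whether it is a 3xx.
            $in_redirect = $m[1][0] === '3';
        } elseif ($in_redirect && stripos($line, 'Location:') === 0) {
            $target = trim(substr($line, strlen('Location:')));
            if (strpos($target, $login_fragment) !== false) {
                return true;
            }
        }
    }
    return false;
}

// Hand-written sample headers, for illustration only.
$blocked = [
    'HTTP/1.1 302 Found',
    'Location: https://www.instagram.com/accounts/login/',
    'Content-Length: 0',
];
$ok = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=utf-8',
];

$blocked_result = is_login_redirect($blocked);
$ok_result = is_login_redirect($ok);
```

In a scraper, the same check could be run on `$http_response_header` right after the fetch, so the script can back off or switch proxies instead of trying to parse a login page.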
