Last active
August 6, 2023 07:32
Quick-and-dirty Instagram web scrape, just in case you don't think you should have to make your users log in to deliver them public photos.
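<?php
//returns a big old hunk of JSON from a non-private IG account page,
//or NULL if the page could not be fetched or the JSON marker is missing.
function scrape_insta($username) {
    $insta_source = file_get_contents('http://instagram.com/'.$username);
    if ($insta_source === FALSE) {
        return NULL; //request failed or was blocked
    }
    $shards = explode('window._sharedData = ', $insta_source);
    if (count($shards) < 2) {
        return NULL; //marker not found, e.g. a login page came back instead
    }
    $insta_json = explode(';</script>', $shards[1]);
    $insta_array = json_decode($insta_json[0], TRUE);
    return $insta_array;
}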
//Supply a username
$my_account = 'cosmocatalano';
//Do the deed
$results_array = scrape_insta($my_account);
//An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
?>
*/
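The string-splitting step in the gist can be separated from the network request, which makes it easier to check against a saved copy of a page. A minimal sketch, assuming the page still embeds its data after `window._sharedData = ` and ends it with `;</script>` (which may no longer hold); `extract_shared_data` is a hypothetical helper name, not part of the original gist:

```php
<?php
// Hypothetical helper (not in the original gist): pulls the JSON that follows
// "window._sharedData = " out of an HTML string that is already in memory.
function extract_shared_data(string $html): ?array {
    $marker = 'window._sharedData = ';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // marker missing: layout changed or a login page was served
    }
    $start += strlen($marker);
    $end = strpos($html, ';</script>', $start);
    if ($end === false) {
        return null; // no terminator found after the marker
    }
    $decoded = json_decode(substr($html, $start, $end - $start), true);
    return is_array($decoded) ? $decoded : null; // null on invalid JSON
}

// Usage with a made-up page fragment, no network access needed.
$sample = '<script>window._sharedData = {"entry_data":{"ProfilePage":[]}};</script>';
$data = extract_shared_data($sample);
```

Returning `null` for every failure mode keeps the caller's check to a single comparison before indexing into the result.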
Looks like Instagram is blocking scraping via file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.
I guess it just takes the right number of good proxies. I am now using the https://rapidapi.com/neotank/api/instagram130 /proxy method to avoid dealing with proxies myself, because they fail all the time (for Instagram) and get a 302 redirect to the login page.
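The 302-to-login behaviour described above can at least be detected, so a script fails loudly instead of trying to parse a login page. A rough sketch, assuming the redirect target contains `/accounts/login` (an assumption, not confirmed by this thread); `is_login_redirect` and `fetch_profile_html` are hypothetical names:

```php
<?php
// Hypothetical helper: inspects raw response header lines (the format PHP
// exposes in $http_response_header) and reports a redirect to a login URL.
function is_login_redirect(array $headers): bool {
    $status = 0;
    $location = '';
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // status line, e.g. "HTTP/1.1 302 Found"
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    // '/accounts/login' is an assumed login path; adjust if it differs.
    return $status >= 300 && $status < 400
        && stripos($location, '/accounts/login') !== false;
}

// Sketch of use: fetch without following redirects, then check the headers.
function fetch_profile_html(string $username): ?string {
    $context = stream_context_create(['http' => [
        'follow_location' => 0,    // do not follow the redirect automatically
        'ignore_errors'   => true, // still return a body on non-2xx responses
        'timeout'         => 10,
    ]]);
    $url = 'https://www.instagram.com/' . rawurlencode($username) . '/';
    $html = @file_get_contents($url, false, $context);
    if ($html === false || is_login_redirect($http_response_header ?? [])) {
        return null; // blocked, failed, or bounced to login
    }
    return $html;
}
```

This only reports the block; it does not work around it.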
Hi 'Cosmocatalano' [nomen est omen?] :),
this is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array names seem to have changed completely. The only one that is still the same seems to be 'entry_data'. Is the changed response still usable with different array 'names'? That would be very interesting.
Best regards and thanks
Axel Arnold Bangert
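Since the key names apparently shift between redesigns, one way to cope is to look values up through a small path helper that returns a default instead of raising warnings, and then print whatever keys are actually present. A minimal sketch; `array_path` is a hypothetical helper, the path is copied from the gist and may no longer match live responses, and the sample array is made up:

```php
<?php
// Hypothetical helper: walks nested arrays by a list of keys and returns
// $default as soon as any key along the path is missing.
function array_path($data, array $path, $default = null) {
    foreach ($path as $key) {
        if (!is_array($data) || !array_key_exists($key, $data)) {
            return $default;
        }
        $data = $data[$key];
    }
    return $data;
}

// Path copied from the gist; it may not exist in current responses.
$path = ['entry_data', 'ProfilePage', 0, 'user', 'media', 'nodes', 0];

// Made-up sample standing in for a response whose inner keys were renamed.
$results_array = ['entry_data' => ['ProfilePage' => [['renamed_key' => []]]]];

$latest = array_path($results_array, $path);
if ($latest === null) {
    // List the keys that do exist, to help find the new names by hand.
    $page = array_path($results_array, ['entry_data', 'ProfilePage', 0], []);
    echo 'Expected keys not found; keys present: ' . implode(', ', array_keys($page)) . "\n";
}
```

Dumping the keys at the deepest level that still resolves is usually the quickest way to see what the old names were replaced with.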