@yurukov
Last active November 19, 2021 10:12
Scraping a full Facebook group page from a browser
These are a few commands that can be used to scrape a full group page
from Facebook. One can use the Graph API, but some users would be
hidden there. The JS commands should be run in the browser console; they
scroll through the page, opening up hidden content and comments. I used
Chrome. Once enough content is opened, save the page as you would any
other and analyse its contents.
// 1. load the group
// 2. start scrolling. This will erase all images to minimize the size
// of the page in memory and keep scrolling down
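scroll = setInterval(function() {
  // querySelectorAll is used instead of $$, which is not always in scope inside timer callbacks
  var imgs = document.querySelectorAll("img");
  for (var i = 0; i < imgs.length; i++) imgs[i].parentNode.removeChild(imgs[i]);
  // Jump to the bottom so that older posts get loaded
  window.scrollTo(0, document.body.scrollHeight);
}, 3000);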
// 3. Stop scrolling when satisfied
clearInterval(scroll);
// 4. Add a guard against reloading the page
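window.onbeforeunload = function() {
  // window.uncover avoids a ReferenceError if step 5 has not been run yet
  clearInterval(window.uncover);
  // Browsers may show their own generic prompt instead of this text
  return "Loading hidden comments stopped.";
};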
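// 5. Load hidden comments and posts. Loading some posts may reload the
// page. When that happens, the guard above stops the loading process and
// asks before reloading. Press cancel and run this command again
uncover = setInterval(function() {
  // Keep removing images that were loaded since the last pass
  var imgs = document.querySelectorAll("img");
  for (var i = 0; i < imgs.length; i++) imgs[i].parentNode.removeChild(imgs[i]);
  // Expand one truncated post; the exact class match skips links already marked "passed"
  var a = document.querySelectorAll("a[class='see_more_link']");
  if (a.length > 0) {
    a[0].target = "_blank";
    a[0].click();
    a[0].className = "see_more_link passed";
  }
  // Load one more batch of hidden comments
  var b = document.querySelectorAll("a[class='UFIPagerLink']");
  if (b.length > 0) {
    b[0].click();
    b[0].className = "UFIPagerLink passed";
  }
  // Links of each kind found in this pass; "0 0" means nothing is left to expand
  console.log(a.length + " " + b.length);
}, 1000);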
// 6. When all is loaded, stop the comment/post recovery process
clearInterval(uncover);
// 7. Save the page code from the browser
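
Steps 3 and 6 above rely on watching the page and stopping the timers by hand. As a rough sketch of one way to automate that, the snippet below stops scrolling once the page height has stayed the same for several checks in a row. `makeStallDetector`, `autoScroll` and their parameters are hypothetical names that are not part of the gist, and a slow connection can look like a stall, so the limit is a value to tune rather than a recommendation.

```javascript
// Hypothetical helper, not part of the original gist: reports when a numeric
// reading (here, the page height) has stayed the same for `limit` checks in a row.
function makeStallDetector(limit) {
  let last = null; // previous reading
  let stalls = 0;  // consecutive unchanged readings
  return function isStalled(value) {
    if (value === last) {
      stalls += 1;
    } else {
      stalls = 0;
      last = value;
    }
    return stalls >= limit;
  };
}

// Assumed wiring for the browser console. `win` and `doc` are passed in
// explicitly so nothing here depends on console-only helpers.
function autoScroll(win, doc, limit, delayMs) {
  const isStalled = makeStallDetector(limit);
  const timer = setInterval(function () {
    win.scrollTo(0, doc.body.scrollHeight);
    // Stop once the height has not grown for `limit` consecutive ticks
    if (isStalled(doc.body.scrollHeight)) clearInterval(timer);
  }, delayMs);
  return timer; // pass this to clearInterval to stop early
}
```

In the console this could be started with something like `autoScroll(window, document, 5, 3000)`, which gives up after five unchanged readings taken three seconds apart.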
postullat commented Feb 4, 2020

Hello,

I use my own Facebook group scraping tool (written in Java), which runs every Z minutes and scrapes the last N posts in the specified group.

For my needs, the posts are stored in Firebase.

I am going to publish this tool for the world; I just need to prepare some executable files.

The program will be available here - http://bit.ly/3bbtJA0

Meanwhile, you may request an output format you need added to the tool, and/or some extra logic.

Here are my contacts

email - postullat2@gmail.com

telegram - https://t.me/postullat

Skype - postullat2@gmail.com

Feel free to contact me

Best regards,
Vova

oboote commented Nov 19, 2021

$$ wasn't working for me whilst inside setInterval (but worked fine when run manually) due to scoping; binding it to querySelectorAll on the document manually before trying to run setInterval fixed it.

const $$ = document.querySelectorAll.bind(document);
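
A slightly more defensive variant of the same idea, as a sketch only: instead of redefining `$$`, a separately named function wraps `querySelectorAll` and returns a real array. The name `qsa` and its optional `root` parameter are made up for this example.

```javascript
// Hypothetical stand-in for the console-only $$ helper. `root` defaults to the
// global document when one exists, so the function also works in timer callbacks.
function qsa(selector, root) {
  const scope = root || (typeof document !== "undefined" ? document : null);
  if (!scope) throw new Error("qsa: no document or root element available");
  // Array.from turns the NodeList into a real array (forEach, map, filter, ...)
  return Array.from(scope.querySelectorAll(selector));
}
```

Inside the interval callbacks, the image cleanup could then be written as `qsa("img").forEach(function (img) { img.remove(); });`.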
