@Ammar-Azman
Last active August 16, 2023 02:14
Scraping issue: Hidden Data? Meh.

  • 15/8/2023

Details

On 15/8/2023, I was given a task to scrape a website. Naively, I thought I could just use BeautifulSoup and requests to get the information. However, the output of my code showed no text at all from the website I wanted to scrape. I thought the website had been purposely built to hide the information. After asking some colleagues and reading around online, I found out that the text is simply not part of the HTML that requests downloads, and that the web developer built the site with Tailwind and Bootstrap.
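A minimal sketch of that first attempt, to show why the output can come back empty. The `JS_SHELL` string is a made-up stand-in for the kind of HTML such a site might return (it is not the real page), and `fetch_html` and `visible_text` are hypothetical helper names used only for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical page shell (an assumption, not the real site): the body holds
# only an empty mount point and a script that would fill it in later.
JS_SHELL = """
<html>
  <body>
    <div id="app"></div>
    <script src="/static/bundle.js"></script>
  </body>
</html>
"""


def fetch_html(url: str) -> str:
    """Download the raw HTML exactly as the server sends it (no JavaScript runs)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def visible_text(html: str) -> str:
    """Return the readable text found in the HTML, ignoring scripts and styles."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


# The shell has no text of its own, so there is nothing to extract.
print(repr(visible_text(JS_SHELL)))
```

On a real page you would call `visible_text(fetch_html(url))` with your own URL. If that comes back empty while the browser clearly shows text, the content is probably being added after the page loads.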

Tailwind and Bootstrap are CSS frameworks, so the elements on the site end up with a damn long list of class names, which makes CSS selectors unwieldy. The missing text has a different cause, though. After asking some (more) friends about this issue, I learned that it is a common problem when scraping what is called a dynamic website: the data is fetched from somewhere else (e.g. a database) and inserted into the page by JavaScript after it loads, instead of being written directly in the HTML as on its opposite, a static website.

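To scrape a dynamic website successfully, we can use Selenium, which drives a real browser so the page's JavaScript actually runs. After writing a few lines of code, I managed to scrape the website. With Selenium, instead of locating elements by tag (as we do in Bs4), I located them by XPath; Selenium supports other locators too, but XPath is what worked for me here. From each XPath match we can get the text and then do the string processing.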
