Scraping issue: Hidden Data? Meh.
- 15/8/2023
On 15/8/2023, I had a task to scrape a website. Naively, I thought I could just use BeautifulSoup and requests to get the information.
However, the output of my code showed no text at all from the website I wanted to scrape. I thought the website was purposely built to hide the information. After asking some colleagues and reading around on the internet, I learned that the text is missing because the page is put together by JavaScript in the browser, so the HTML that requests downloads is mostly an empty shell.
The web developer also used Tailwind and Bootstrap. These are CSS frameworks (not JS frameworks), which is why the elements on the site carry such damn long strings of class names. After asking some (more) friends about this issue, I found out it is a common problem when scraping this type of website, which is called a dynamic website: the data is fetched from somewhere else (e.g. a database) and is not written directly in the HTML, unlike its opposite, the static website.
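To make the static vs dynamic difference concrete, here is a small sketch. The two HTML snippets are made up for illustration (they are not from the real website), and the parser is the standard library's html.parser rather than BeautifulSoup, just so it runs without installing anything. It shows roughly what a scraper sees in each case.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for what a plain HTTP request might return
# from a JavaScript-rendered page: an empty container plus a script tag.
DYNAMIC_PAGE = """
<html><body>
  <div id="app" class="flex items-center justify-between px-4"></div>
  <script src="/static/bundle.js"></script>
</body></html>
"""

# Hypothetical static page: the text is written directly in the HTML.
STATIC_PAGE = """
<html><body>
  <div id="app"><h1>Product A</h1><p>Price: 10</p></div>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of non-empty text chunks found in the HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(STATIC_PAGE))   # ['Product A', 'Price: 10']
print(visible_text(DYNAMIC_PAGE))  # []
```

The dynamic page gives back an empty list because the text only appears after the browser runs the script, and a plain HTTP request never runs it.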
To scrape a dynamic website successfully, we can use Selenium, which drives a real browser so the JavaScript runs before we read the page. After writing a few lines of code, I managed to scrape the website.
With Selenium, instead of searching by tag (like we do in bs4), I located the elements by XPath (Selenium supports other locators too, such as CSS selectors). From the element at that XPath we can get the text and then do the string processing.
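Here is a rough sketch of that flow, not the exact code I wrote for the task. The function names scrape_texts and clean_text are made up for this example, the URL and XPath in the usage comment are placeholders, and it assumes Selenium 4 with Chrome and a matching driver available on the machine.

```python
def clean_text(raw):
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return " ".join(raw.split())


def scrape_texts(url, xpath, timeout=10):
    """Open url in a browser and return the cleaned text of elements matching xpath."""
    # Imported here so clean_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        driver.get(url)
        # Wait until JavaScript has put at least one matching element in the page.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, xpath))
        )
        return [clean_text(el.text) for el in elements]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Usage (placeholder URL and XPath, replace with the real ones):
# for line in scrape_texts("https://example.com", "//h1"):
#     print(line)
```

The explicit wait matters on a dynamic website: if we read the page right after driver.get, the data may not have been loaded yet and we are back to getting no text.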