- gained familiarity with the Python re and json libraries.
- gained experiencing working with json data structures.
- gained more understanding of how website urls are structured and how websites systematically store resources.
The in-house python web scraper (which I called
web_scraper) typically took ~10 configuration values which were xpaths of various pieces of information we wanted from websites in target languages. Websites that loaded resources through AJAX were more difficult to collect using the traditional method, and so for certain configuration values, we needed to write a custom file for the web scraper to use.