This guide will introduce you to basic web scraping. In the words of Wikipedia, scraping is 'the process of automatically collecting information from the World Wide Web'. It is a useful technique for gathering information from a web page, especially when no API or dataset offers that information.
We will be using two major tools in this guide:
- HTTParty, our old friend for fetching web page data.
- Nokogiri, an XML/HTML parser, which will help us find the page information we need.
I will demonstrate the process by scraping information about houses in Game of Thrones from a wiki, just as I did to prepare the Houses of Westeros dataset.