Scraping Forex Data

I was looking for ways to scrape forex data after fixer.io switched to a rate-limited subscription model, which made their API impractical for anything beyond a handful of queries.

1. Use an alternative API

One alternative API which is still free is https://exchangeratesapi.io/

It's basically a drop-in replacement for fixer.io and returns all forex rates in JSON format from a single endpoint.

To set your base currency, just add a query parameter to the URL.

See the website documentation for more detail.
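For example, a minimal sketch of both calls, assuming the /latest endpoint and base query parameter described in their documentation at the time (the syntax may have changed since):

# all rates against the default EUR base (endpoint/parameter per their docs at the time)
curl -s "https://api.exchangeratesapi.io/latest" | jq '.rates'

# the same rates re-based to USD, picking out one currency
curl -s "https://api.exchangeratesapi.io/latest?base=USD" | jq -r '.rates.ZAR'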

2. Write your own one-liner to create the JSON

I discovered the various forex APIs are all just calling info from the European Central Bank's published data, which is updated once a day.

However, the data is published in XML, which isn't very useful for my scripts, which are set up to use JSON.

So we need to convert the XML into JSON. It turns out to be simple enough with a one-liner in bash.

This assumes you have installed yq, a companion to jq for YAML which also ships xq for XML files. If you like querying JSON with jq, you'll love querying YAML and XML with yq and xq.

If you have jq installed, and python-pip, then installing is as simple as:

pip install yq
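Before writing the filter, it helps to eyeball what xq makes of the feed; a bare '.' just dumps the whole converted document (a minimal sketch; note that xq, which is built on xmltodict, maps XML attributes to @-prefixed keys):

# dump the ECB feed as JSON to find where the rates live;
# the per-currency entries sit under Cube > Cube > Cube as
# objects with @currency and @rate keys (xmltodict convention)
curl -s http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml | xq '.'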

Next, the following bash one-liner will scrape the ECB forex rates (EUR base) and output JSON you can pipe to a file and query with jq in a second process:

curl -s http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml |xq '.[]|.["Cube"]["Cube"]["Cube"] | .[] |map(values)' |sed -e "s/,/:/" |tr '\n' ' ' |tr '[' '{' |tr ']' '}' |sed -e "s/ } { /, /g" |jq -c -r '.'

which produces output as follows

{"USD":"1.1648","JPY":"128.30","BGN":"1.9558","CZK":"25.792","DKK":"7.4526","GBP":"0.87663","HUF":"324.05","PLN":"4.3148","RON":"4.6663","SEK":"10.2950","CHF":"1.1534","ISK":"126.60","NOK":"9.4328","HRK":"7.3788","RUB":"73.5314","TRY":"5.4639","AUD":"1.5681","BRL":"4.3617","CAD":"1.5459","CNY":"7.5689","HKD":"9.1393","IDR":"16429.21","ILS":"4.2134","INR":"79.0300","KRW":"1293.76","MXN":"23.5466","MYR":"4.6621","NZD":"1.6880","PHP":"62.079","SGD":"1.5822","THB":"38.369","ZAR":"15.7165"}

You can pipe it to a file and perform additional operations on that file (sketched below), or narrow it down to a specific currency as follows:

curl -s http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml |xq '.[]|.["Cube"]["Cube"]["Cube"] | .[] |map(values)' |sed -e "s/,/:/" |tr '\n' ' ' |tr '[' '{' |tr ']' '}' |sed -e "s/ } { /, /g" |jq -c -r '.ZAR'
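Or, to avoid hitting the ECB feed repeatedly, save the full JSON once and query the file as often as you like (a minimal sketch; /tmp/ecb-rates.json is just an example path):

# convert once and save
curl -s http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml |xq '.[]|.["Cube"]["Cube"]["Cube"] | .[] |map(values)' |sed -e "s/,/:/" |tr '\n' ' ' |tr '[' '{' |tr ']' '}' |sed -e "s/ } { /, /g" |jq -c -r '.' > /tmp/ecb-rates.json

# query locally in a second process
jq -r '.USD' /tmp/ecb-rates.json
jq -r '.GBP, .ZAR' /tmp/ecb-rates.json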

It shouldn't be hard to append the date either; a sketch follows below.

That's basically what fixer.io is doing and charging you for, and what exchangeratesapi.io gives you for free.
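The ECB feed carries the quote date as a time attribute on the outer Cube element, so a minimal sketch for pulling it out (assuming xq's usual @-prefixed attribute keys) is:

# print the quote date (YYYY-MM-DD) from the @time attribute
curl -s http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml |xq -r '.[]|.["Cube"]["Cube"]["@time"]'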

3. Scrape Standard Bank's feed

It was brought to my attention that a local bank, Standard Bank, publishes forex info at the following URL:

http://ws15.standardbank.co.za/finSnapShot/GetforexServlet

But this isn't in a very useful format for scripts.

I've previously built a Python script to scrape HTML tables into a CSV file, and it works just as well here. See the stackoverflow.com link in the script header below for more info on the prerequisites (BeautifulSoup) to get it working.

#!/usr/bin/python
#
# 2017-12-05  Bretton  initial creation
# this file copied from https://stackoverflow.com/questions/259091/how-can-i-scrape-an-html-table-to-csv
# usage:
#
# curl URL | /full/path/to/htmltabletocsv.py
# 
# and the html table(s) will be written to stdout as csv
#

from bs4 import BeautifulSoup
import sys
import re
import csv

def cell_text(cell):
    # join all the stripped text fragments inside a table cell
    return " ".join(cell.stripped_strings)

# read the HTML document from stdin and write CSV rows to stdout
soup = BeautifulSoup(sys.stdin.read(), "html.parser")
output = csv.writer(sys.stdout)

for table in soup.find_all('table'):
    for row in table.find_all('tr'):
        # take both header (th) and data (td) cells in document order
        col = map(cell_text, row.find_all(re.compile('t[dh]')))
        output.writerow(col)
    # blank row between tables
    output.writerow([])

Now we have a URL to scrape:

http://ws15.standardbank.co.za/finSnapShot/GetforexServlet

and a script to turn the tables into CSV:

  • htmltabletocsv.py

But unfortunately the raw CSV isn't very pretty, so we'll have to do some more bash operations to get the output we want.

There are multiple ways of tackling this, and which column you pick is up to you.

However, the following works for me:

curl -s "http://ws15.standardbank.co.za/finSnapShot/GetforexServlet" | ./htmltabletocsv.py |grep USD| awk -F, '{print $5}'

produces

13.2838

while

curl -s "http://ws15.standardbank.co.za/finSnapShot/GetforexServlet" | ./htmltabletocsv.py |grep USD| awk -F, '{print $5, $7}' |tr ' ' '\n'

produces the buy and sell values, which can be used to calculate the spread or feed the in/out legs of your own calculations.

13.2838
13.6088
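From there the spread is just the difference between the two columns; a minimal sketch, assuming column 5 is the buy rate and column 7 the sell rate as above:

# buy/sell spread for USD (column 7 minus column 5)
curl -s "http://ws15.standardbank.co.za/finSnapShot/GetforexServlet" | ./htmltabletocsv.py |grep USD| awk -F, '{print $7 - $5}'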