Colly & Go vs. BeautifulSoup & Python

> time go run cryptocoinmarketcap.go
2017/12/19 17:26:38 Scraping finished, check file "cryptocoinmarketcap-go.csv" for results
        2.24 real         0.84 user         0.45 sys

> time python3 cryptocoinmarketcap.py
WARNING:root:Scraping finished, check file cryptocoinmarketcap-py.csv for results
        3.51 real         2.94 user         0.07 sys
cryptocoinmarketcap.go

package main

import (
	"encoding/csv"
	"log"
	"os"

	"github.com/gocolly/colly"
)

func main() {
	fName := "cryptocoinmarketcap-go.csv"
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("Cannot create file %q: %s\n", fName, err)
	}
	defer file.Close()
	writer := csv.NewWriter(file)
	defer writer.Flush()

	// Write CSV header
	writer.Write([]string{"Name", "Symbol", "Price (USD)", "Volume (USD)",
		"Market capacity (USD)", "Change (1h)",
		"Change (24h)", "Change (7d)"})

	// Instantiate default collector
	c := colly.NewCollector()

	// Write one CSV record per row of the all-currencies table
	c.OnHTML("#currencies-all tbody tr", func(e *colly.HTMLElement) {
		writer.Write([]string{
			e.ChildText(".currency-name-container"),
			e.ChildText(".col-symbol"),
			e.ChildAttr("a.price", "data-usd"),
			e.ChildAttr("a.volume", "data-usd"),
			e.ChildAttr(".market-cap", "data-usd"),
			e.ChildText(".percent-1h"),
			e.ChildText(".percent-24h"),
			e.ChildText(".percent-7d"),
		})
	})

	c.Visit("https://coinmarketcap.com/all/views/all/")

	log.Printf("Scraping finished, check file %q for results\n", fName)
}
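
One gap worth noting in the Go version: the error returned by c.Visit is discarded, so a failed request would still log the success message. A minimal sketch of how the tail of main could surface failures, using Colly's OnError callback (the log wording here is illustrative, not from the original):

	// Fires on request-level failures: timeouts, DNS errors, non-2xx responses.
	c.OnError(func(r *colly.Response, err error) {
		log.Printf("Request to %s failed: %s", r.Request.URL, err)
	})

	if err := c.Visit("https://coinmarketcap.com/all/views/all/"); err != nil {
		log.Fatalf("Scraping failed: %s", err)
	}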

cryptocoinmarketcap.py

import csv
import logging

import requests
from bs4 import BeautifulSoup


def main():
    file_name = 'cryptocoinmarketcap-py.csv'

    data = [['Name', 'Symbol', 'Price (USD)', 'Volume (USD)',
             'Market capacity (USD)', 'Change (1h)', 'Change (24h)',
             'Change (7d)']]

    resp = requests.get('https://coinmarketcap.com/all/views/all/')
    soup = BeautifulSoup(resp.content, 'html.parser')

    for r in soup.select('#currencies-all tbody tr'):
        row = [
            r.select_one('.currency-name-container').string,
            r.select_one('.col-symbol').string,
            r.select_one('a.price').get('data-usd'),
            r.select_one('a.volume').get('data-usd'),
            r.select_one('.market-cap').get('data-usd'),
        ]

        # Percent-change cells can be absent; pad with '' so every row
        # keeps the same number of columns as the header.
        for selector in ('.percent-1h', '.percent-24h', '.percent-7d'):
            cell = r.select_one(selector)
            row.append(cell.string if cell else '')

        data.append(row)

    with open(file_name, 'w', newline='') as fobj:
        writer = csv.writer(fobj)
        writer.writerows(data)

    logging.warning(('Scraping finished, check file {} for '
                     'results').format(file_name))


if __name__ == '__main__':
    main()
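
Most of the Python version's user time is spent parsing with html.parser, which is pure Python. If the lxml package is available (an extra dependency the script above does not use), switching parsers is a one-line change that usually speeds up the parse step:

    soup = BeautifulSoup(resp.content, 'lxml')  # assumes lxml is installed: pip install lxml
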
myles commented Sep 14, 2020

You are technically right, but building web scrapers is never a one-time, run-and-done task. You usually need to run the script, check that the output is what you expected, discover that the HTML is actually different for this one <td> element, fix the code, re-run the script, find more errors, and repeat. I don't know for certain, but having to compile to a binary and execute it on every iteration would probably give a slower result overall.
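
For what it's worth, go run recompiles the program on every invocation, so the Go timing above includes compile time; building once separates that cost from the scrape itself:

> go build cryptocoinmarketcap.go
> time ./cryptocoinmarketcap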

myles commented Sep 15, 2020

Fork it and update it, then.
