@clementlefevre
Created October 14, 2021 18:36
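import pandas as pd
import requests
from lxml import etree

# Download the Wikipedia page and parse it into an lxml element tree.
URL = "https://en.wikipedia.org/wiki/List_of_photovoltaic_power_stations"
r = requests.get(URL)
r.raise_for_status()  # stop early on an HTTP error
main_page = etree.HTML(r.content)

# Collect link texts (station names) and decimal coordinates from every wikitable.
list_title = main_page.xpath('.//table[contains(@class,"wikitable")]/tbody/tr/td/a/text()')
list_geo = main_page.xpath('.//table[contains(@class,"wikitable")]/tbody/tr//span[contains(@class,"geo-dec")]/text()')

# zip() pairs the two lists by position and truncates to the shorter one,
# so names and coordinates only line up if each row yields exactly one of each.
df = pd.DataFrame(list(zip(list_title, list_geo)), columns=["name", "geo_raw"])

# Split the raw string on the first space into latitude and longitude parts.
df[["latitude", "longitude"]] = df.geo_raw.str.split(" ", n=1, expand=True)

# Keep only the leading decimal number; any trailing text (such as a degree
# sign or hemisphere letter) is dropped, and the values remain strings.
df.latitude = df.latitude.str.extract(r"^(\d+\.?\d*)", expand=False)
df.longitude = df.longitude.str.extract(r"^(\d+\.?\d*)", expand=False)
df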
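
The extraction step keeps only the leading number of each coordinate, so a southern latitude or western longitude would come out with the same value as its northern or eastern counterpart. A minimal sketch of a signed parser follows, assuming the `geo-dec` text has the shape `35.32°N 115.45°W` (this shape is an assumption about the page, not something the script confirms). The helper name `parse_geo_dec` and the pattern `_GEO_DEC` are hypothetical and not part of the original gist.

```python
import re

# Assumed input shape: "<lat>°<N|S> <lon>°<E|W>", e.g. "35.32°N 115.45°W".
_GEO_DEC = re.compile(
    r"^\s*(\d+(?:\.\d+)?)°?\s*([NS])\s+(\d+(?:\.\d+)?)°?\s*([EW])\s*$"
)


def parse_geo_dec(text):
    """Return (latitude, longitude) as signed floats, or None if the text does not match."""
    m = _GEO_DEC.match(text)
    if m is None:
        return None
    lat, ns, lon, ew = m.groups()
    # South and West are negative by the usual decimal-degree convention.
    latitude = float(lat) if ns == "N" else -float(lat)
    longitude = float(lon) if ew == "E" else -float(lon)
    return latitude, longitude


print(parse_geo_dec("35.32°N 115.45°W"))  # (35.32, -115.45)
```

One way to use it with the DataFrame above would be `df.geo_raw.map(parse_geo_dec)`, which yields a tuple per row (or `None` where the text does not match the assumed shape).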