@jseabold
Last active June 14, 2018 03:02
Read an HTML table using pandas
# Fallback for when pd.read_html(url) fails to find the table.
# With bs4 >= 4.2.1 you can skip the lxml steps: read_html scrapes the
# tables automatically. bs4 4.2.0 won't work.
from io import StringIO

import pandas as pd
from lxml import html

url = "http://www.uesp.net/wiki/Skyrim:No_Stone_Unturned"
xpath = '//*[@id="mw-content-text"]/table[3]'

# Parse the page and select the third table in the main content block.
tree = html.parse(url)
table = tree.xpath(xpath)[0]

# Serialize only that table as text and let pandas parse it.
raw_html = html.tostring(table, encoding="unicode")
dta = pd.read_html(StringIO(raw_html), header=0)[0]
dta["completed"] = 0
del dta["Map"]

# read_html keeps only the cell text, so take the map links from the lxml tree.
table.make_links_absolute()
dta["map_link"] = [row[1][0].get("href") for row in table[1:]]
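
The same steps can be tried offline. The sketch below runs the XPath selection, the `read_html` call, and the link extraction against a small made-up HTML string. The markup, the `content` id, and the `example.com` base URL are assumptions for illustration, not the real wiki page.

```python
from io import StringIO

import pandas as pd
from lxml import html

# Hypothetical stand-in for the wiki page; the real markup will differ.
PAGE = """
<html><body><div id="content">
<table>
  <tr><th>Location</th><th>Map</th><th>Notes</th></tr>
  <tr><td>Whiterun</td><td><a href="/maps/1">map</a></td><td>Hall of the Dead</td></tr>
  <tr><td>Solitude</td><td><a href="/maps/2">map</a></td><td>Proudspire Manor</td></tr>
</table>
</div></body></html>
"""
BASE_URL = "http://example.com/wiki/page"  # assumed base for relative links

doc = html.fromstring(PAGE)
table = doc.xpath('//*[@id="content"]/table[1]')[0]

# Serialize just this table and hand it to pandas as a file-like object.
raw_html = html.tostring(table, encoding="unicode")
dta = pd.read_html(StringIO(raw_html), header=0)[0]

# The parsed frame holds only cell text, so read the hrefs from the lxml tree.
table.make_links_absolute(BASE_URL)
dta["map_link"] = [row[1][0].get("href") for row in table[1:]]
dta = dta.drop(columns="Map")
```

As in the gist, `row[1][0]` picks the first link in the second cell, so a cell with two map links contributes only its first one.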
@cpcloud

cpcloud commented Jul 13, 2013

@jseabold can you raise an issue if there's a problem with match? Thanks 😄
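
For context, `match` is the `read_html` argument that keeps only tables whose text matches a regex. A minimal sketch of how it narrows the result, using made-up HTML rather than the wiki page (the two tables and their contents are assumptions):

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Location".
PAGE = """
<html><body>
<table>
  <tr><th>Quest</th><th>Reward</th></tr>
  <tr><td>No Stone Unturned</td><td>Prowler's Profit</td></tr>
</table>
<table>
  <tr><th>Location</th><th>Notes</th></tr>
  <tr><td>Whiterun</td><td>Hall of the Dead</td></tr>
</table>
</body></html>
"""

# Without match, every table on the page is returned.
all_tables = pd.read_html(StringIO(PAGE), header=0)

# With match, only tables containing text that matches the regex are kept.
tables = pd.read_html(StringIO(PAGE), match="Location", header=0)
```

If no table matches the pattern, `read_html` raises a `ValueError`.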

@jseabold
Author

I suspect it's related to the issue linked below. I have bs4 4.2.0 installed, so could you test whether read_html(url) works with another version?

pandas-dev/pandas#4214

@cpcloud

cpcloud commented Jul 14, 2013

here's my dta

In [22]: dta
Out[22]:
header                         Location       Map                                              Notes
1                              Whiterun       map  Hall of the Dead (Whiterun). Inside the Cataco...
2                             Whiterun†       map                  Dragonsreach, the Jarl's bedroom.
3                             Whiterun†       map  Jorrvaskr's living quarters, on the bookshelf ...
4                             Solitude†       map  Blue Palace, Jarl's bedroom, Right side of the...
5                              Solitude  map/ map  Proudspire Manor. Buy the house. The stone is ...
6                               Riften†       map  Mistveil Keep, Jarl's bedroom, top of the stai...
7                             Windhelm†       map  Palace of the Kings Upstairs. Take the first d...
8                             Windhelm†       map  House of Clan Shatter-Shield, on a bookshelf i...
9                             Markarth†       map  The Treasury House, on a bedside table in Thon...
10                            Markarth†       map  Understone Keep's Dwemer Museum, on a table in...
11               College of Winterhold†       map  Arch-Mage's Quarters. Access may be gained at ...
12                      Dead Crone Rock       map    On the stone altar opposite a Dragon Word Wall.
13                    Black-Briar Lodge       map  East of Riften. The gem is in the top floor be...
14                            Ansilvund       map  Northeast of Shor's Stone and due north of Rif...
15                     Stony Creek Cave       map  Just north of Ansilvund, on the table next to ...
16                      Rannveig's Fast       map  After falling down the trapdoor in front of th...
17                        Fellglow Keep       map  Northeast of Whiterun. Head straight up the st...
18                         Dainty Sload       map  Ship along the coast northeast of Solitude, do...
19                    Sunderstone Gorge       map  On the table with two bodies in front of the D...
20                              Yngvild       map  The island northeast of Dawnstar. The gem is i...
21                      Hob's Fall Cave       map  On the northern coast, directly between Dawnst...
22      Reeking Cave or Thalmor Embassy  map/ map  Prior to patch 1.4, the stone is located in th...
23         Pinewatch Bandit's Sanctuary       map  In a house west of Helgen, there is a secret h...
24           Dark Brotherhood Sanctuary       map  On the dresser in Astrid's room. It is still a...

@cpcloud

cpcloud commented Jul 14, 2013

i changed the column index name

@jseabold
Author

Yep works fine with bs4 4.2.1.
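
Since the outcome here depends on the bs4 version (4.2.0 fails, 4.2.1 works), a script could check the version before choosing between plain `read_html(url)` and the lxml fallback. A rough sketch; `version_tuple` and `bs4_new_enough` are hypothetical helper names, and the parsing only handles simple dotted versions:

```python
import re

# Threshold taken from this thread: 4.2.0 fails, 4.2.1 works.
MIN_BS4 = (4, 2, 1)


def version_tuple(text):
    """Turn a dotted version such as '4.2.1' into a tuple of ints: (4, 2, 1)."""
    parts = []
    for piece in text.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # stop at the first piece that has no leading digits
        parts.append(int(m.group()))
    return tuple(parts)


def bs4_new_enough(version):
    """Return True if this bs4 version should work with read_html(url)."""
    return version_tuple(version) >= MIN_BS4
```

With bs4 installed, this could be called as `bs4_new_enough(bs4.__version__)`, falling back to the lxml approach when it returns False.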