from bs4 import BeautifulSoup

def read_excel_xml(path):
    """Parse a SpreadsheetML (Excel 2003 XML) file into nested lists:
    workbook -> sheets -> rows -> cell texts."""
    with open(path) as f:
        soup = BeautifulSoup(f.read(), 'xml')
    workbook = []
    for sheet in soup.findAll('Worksheet'):
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            row_as_list = []
            for cell in row.findAll('Cell'):
                row_as_list.append(cell.Data.text)
            sheet_as_list.append(row_as_list)
        workbook.append(sheet_as_list)
    return workbook
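A quick way to try it out on a small two-sheet workbook (the function is repeated here so the snippet runs on its own; the sample XML and file handling are illustrative, not part of the gist):

```python
from bs4 import BeautifulSoup
import os
import tempfile

def read_excel_xml(path):
    """Parse a SpreadsheetML (Excel 2003 XML) file into nested lists."""
    with open(path) as f:
        soup = BeautifulSoup(f.read(), 'xml')
    workbook = []
    for sheet in soup.findAll('Worksheet'):
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            row_as_list = []
            for cell in row.findAll('Cell'):
                row_as_list.append(cell.Data.text)
            sheet_as_list.append(row_as_list)
        workbook.append(sheet_as_list)
    return workbook

# Hypothetical sample workbook with two sheets and dense (gap-free) rows.
sample = '''<?xml version="1.0"?>
<Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1"><Table>
  <Row><Cell><Data ss:Type="String">a</Data></Cell>
       <Cell><Data ss:Type="Number">1</Data></Cell></Row>
 </Table></Worksheet>
 <Worksheet ss:Name="Sheet2"><Table>
  <Row><Cell><Data ss:Type="String">b</Data></Cell></Row>
 </Table></Worksheet>
</Workbook>'''

with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as f:
    f.write(sample)
    path = f.name

workbook = read_excel_xml(path)
os.unlink(path)
# workbook -> [[['a', '1']], [['b']]]
```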
I'd be very careful with this implementation: it assumes the table has no empty cells. In reality, when a row contains empty cells, the next non-empty cell carries an "ss:Index" attribute that you have to check. This is what allows Excel to store large sparse tables in a compact form, and it will throw the code above off. For example, one row may contain the cells below:
<Cell><Data ss:Type="String">TEST</Data></Cell> <Cell ss:Index="20"><Data ss:Type="Number">1</Data></Cell>
This means that the second cell is at column 20 (Excel uses 1-based indexing). The first cell (the string "TEST") is in the first column, so the 18 empty cells in between are simply not present in the XML. To solve this:
check if ss:Index is present in the cell:

if cell.get('ss:Index') is None: (no gap, do work)

or alternatively

if 'ss:Index' not in cell.attrs: (do work)

(Note that cell['ss:Index'] raises a KeyError when the attribute is absent rather than returning None, so use .get() or an "in" check instead.)
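Putting that together, here is one possible sketch of an ss:Index-aware row reader (the helper name read_row and the choice of padding gaps with empty strings are my own, not part of the gist):

```python
from bs4 import BeautifulSoup

def read_row(row):
    # Expand a sparse <Row> into a dense list of cell texts.
    # ss:Index is 1-based: a cell with ss:Index="20" sits in column 20,
    # so first pad the list with empty strings up to column 19.
    cells = []
    for cell in row.findAll('Cell'):
        index = cell.get('ss:Index')
        if index is not None:
            cells.extend([''] * (int(index) - 1 - len(cells)))
        cells.append(cell.Data.text if cell.Data is not None else '')
    return cells

# The example row from the comment above:
xml = ('<Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
       '<Worksheet><Table><Row>'
       '<Cell><Data ss:Type="String">TEST</Data></Cell>'
       '<Cell ss:Index="20"><Data ss:Type="Number">1</Data></Cell>'
       '</Row></Table></Worksheet></Workbook>')
soup = BeautifulSoup(xml, 'xml')
row = read_row(soup.find('Row'))
# row[0] == 'TEST', row[19] == '1', 18 empty strings in between
```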
Very nice, a pity that it's slow.