Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
try:
import openpyxl
def XLSXDictReader(f):
book = openpyxl.reader.excel.load_workbook(f)
sheet = book.get_active_sheet()
rows = sheet.get_highest_row()
cols = sheet.get_highest_column()
headers = dict( (i, sheet.cell(row=0, column=i).value) \
for i in range(cols) )
def item(i, j):
return (sheet.cell(row=0, column=j).value,
sheet.cell(row=i, column=j).value)
return (dict(item(i,j) for j in range(cols)) for i in range(1, rows))
except ImportError:
XLSXDictReader = None
@eldang

This comment has been minimized.

Copy link

eldang commented Nov 11, 2016

Thank you for sharing this. I ended up having to make some tweaks--I'm guessing openpyxl has changed in the years since--but having this structure to start with saved me a significant amount of time.

I had to switch some things to 1-indexing, manually replace the get_highest_row() and get_highest_column() methods which don't seem to exist any more, and explicitly cast output to strings to make it a perfect drop-in replacement for csv.DictReader(). Here's what I came up with:

def XLSXDictReader(f):
  book  = openpyxl.reader.excel.load_workbook(f)
  sheet = book.get_active_sheet()

  rows = 1
  for row in sheet.iter_rows():
    rows = rows + 1
  cols = 1
  for col in sheet.iter_cols():
    cols = cols + 1

  headers = dict( (i, sheet.cell(row=1, column=i).value) for i in range(1, cols) )

  def item(i, j):
    if sheet.cell(row=i, column=j).value == None:
      return (sheet.cell(row=1, column=j).value, '')
    else:
      return (sheet.cell(row=1, column=j).value, str(sheet.cell(row=i, column=j).value))

  return (dict(item(i,j) for j in range(1, cols)) for i in range(2, rows))

Thanks again - your starting point was a really big help.

@erinkeith

This comment has been minimized.

Copy link

erinkeith commented Mar 22, 2017

@eldang, openpyxl does have attributes for the work sheet to get you the max_row and max_columns:

rows = sheet.max_row
cols = sheet.max_column

@adejones

This comment has been minimized.

Copy link

adejones commented Jan 8, 2018

This is what works for me on openpyxl 2.4.8 using sheet.max_row and sheet.max_column

def XLSXDictReader(f):
    book = openpyxl.reader.excel.load_workbook(f)
    sheet = book.get_active_sheet()
    rows = sheet.max_row
    cols = sheet.max_column
    headers = dict((i, sheet.cell(row=1, column=i).value) for i in range(1, cols))
    def item(i, j):
        return (sheet.cell(row=1, column=j).value, sheet.cell(row=i, column=j).value)
    return (dict(item(i, j) for j in range(1, cols + 1)) for i in range(2, rows + 1))

Thanks for doing the hard work!

@eldang

This comment has been minimized.

Copy link

eldang commented Mar 8, 2019

@erinkeith @adejones thank you both for continuing to improve on this! (and @erinkeith I'm sorry I didn't see the notification when you commented).

I'm glad to take a few lines of code out with the switch to .max_row and .max_column. The current openpyxl documentation says that get_active_sheet() is deprecated and should be replaced with just .active, which is working for me. I appreciate the simplicity & conciseness of @adejones's version, but for the files I work with I often run into problems because of empty cells being returned as None rather than the empty strings that Python's built-in CSV support treats them as. And I sometimes need to be able to specify which worksheet to use, so here's my latest version in case it helps anyone:

def XLSXDictReader(fileName, sheetName=None):
  book = openpyxl.reader.excel.load_workbook(fileName)
  # if there's no sheet name specified, try to get the active sheet.  This will work reliably for workbooks with only one sheet; unpredictably if there are multiple worksheets present.
  if sheetName is None:
    sheet = book.active
  elif sheetName not in book.sheetnames:
    print(sheetName, "not found in", fileName)
    exit()
  else:
    sheet = book[sheetName]

  rows = sheet.max_row + 1
  cols = sheet.max_column + 1

  def cleanValue(s):
    if s == None:
      return ''
    else:
      return str(s).strip()

  def item(i, j):
    return (
      cleanValue(sheet.cell(row=1, column=j).value),
      cleanValue(sheet.cell(row=i, column=j).value)
    )

  return (dict(item(i,j) for j in range(1, cols)) for i in range(2, rows))

Tested with openpyxl==2.6.1 only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.