Skip to content

Instantly share code, notes, and snippets.

@mdellavo
Created March 3, 2011 20:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mdellavo/853413 to your computer and use it in GitHub Desktop.
Save mdellavo/853413 to your computer and use it in GitHub Desktop.
try:
import openpyxl
def XLSXDictReader(f):
book = openpyxl.reader.excel.load_workbook(f)
sheet = book.get_active_sheet()
rows = sheet.get_highest_row()
cols = sheet.get_highest_column()
headers = dict( (i, sheet.cell(row=0, column=i).value) \
for i in range(cols) )
def item(i, j):
return (sheet.cell(row=0, column=j).value,
sheet.cell(row=i, column=j).value)
return (dict(item(i,j) for j in range(cols)) for i in range(1, rows))
except ImportError:
XLSXDictReader = None
@erinkeith
Copy link

erinkeith commented Mar 22, 2017

@eldang, openpyxl does have attributes for the work sheet to get you the max_row and max_columns:

rows = sheet.max_row
cols = sheet.max_column

@adejones
Copy link

adejones commented Jan 8, 2018

This is what works for me on openpyxl 2.4.8 using sheet.max_row and sheet.max_column

def XLSXDictReader(f):
    book = openpyxl.reader.excel.load_workbook(f)
    sheet = book.get_active_sheet()
    rows = sheet.max_row
    cols = sheet.max_column
    headers = dict((i, sheet.cell(row=1, column=i).value) for i in range(1, cols))
    def item(i, j):
        return (sheet.cell(row=1, column=j).value, sheet.cell(row=i, column=j).value)
    return (dict(item(i, j) for j in range(1, cols + 1)) for i in range(2, rows + 1))

Thanks for doing the hard work!

@eldang
Copy link

eldang commented Mar 8, 2019

@erinkeith @adejones thank you both for continuing to improve on this! (and @erinkeith I'm sorry I didn't see the notification when you commented).

I'm glad to take a few lines of code out with the switch to .max_row and .max_column. The current openpyxl documentation says that get_active_sheet() is deprecated and should be replaced with just .active, which is working for me. I appreciate the simplicity & conciseness of @adejones's version, but for the files I work with I often run into problems because of empty cells being returned as None rather than the empty strings that Python's built-in CSV support treats them as. And I sometimes need to be able to specify which worksheet to use, so here's my latest version in case it helps anyone:

def XLSXDictReader(fileName, sheetName=None):
  book = openpyxl.reader.excel.load_workbook(fileName)
  # if there's no sheet name specified, try to get the active sheet.  This will work reliably for workbooks with only one sheet; unpredictably if there are multiple worksheets present.
  if sheetName is None:
    sheet = book.active
  elif sheetName not in book.sheetnames:
    print(sheetName, "not found in", fileName)
    exit()
  else:
    sheet = book[sheetName]

  rows = sheet.max_row + 1
  cols = sheet.max_column + 1

  def cleanValue(s):
    if s == None:
      return ''
    else:
      return str(s).strip()

  def item(i, j):
    return (
      cleanValue(sheet.cell(row=1, column=j).value),
      cleanValue(sheet.cell(row=i, column=j).value)
    )

  return (dict(item(i,j) for j in range(1, cols)) for i in range(2, rows))

Tested with openpyxl==2.6.1 only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment