Skip to content

Instantly share code, notes, and snippets.

@bradland
Created August 16, 2015 19:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bradland/858b3b23830fd233be80 to your computer and use it in GitHub Desktop.
Save bradland/858b3b23830fd233be80 to your computer and use it in GitHub Desktop.

Description and reproduction

Running the script axlsx_print_area.rb (provided at end of issue) will produce an Excel workbook that, when unzipped, contains the workbook.xml contents; also provided below. It can be observed that the name attributes of the <definedName> nodes have been altered from their original form. This has a couple of undesirable side-effects.

  1. It breaks behavior specified in the examples (repeating headers). Repeating headers are not actually produced.

  2. Tangentially, it prevents one from using any of the built-in names specified in the source of Axlsx::DefinedName.

  3. It mangles valid defined names, leading to confusion. For example, if the defined name "Example_Range" is specified, it will be converted to "ExampleRange".

This mutation occurs within DefinedName#to_xml_string where Axlsx::SerializedAttributes#serialized_attributes is called on the name. The call to serialized_attfibutes was added in 25ebea9, but before that, it appears that it used Axlsx::camel, which would still exhibit the side effects outlined above.

Observations on defined names

As noted in the documentation, "A compliant producer or consumer considers a defined name in the range A1-XFD1048576 to be an error." The serialized_attributes method does nothing to prevent us from using 'A1' as a defined name. Additionally, some Google searching produced a reasonably good explanation of valid range names:

Range names cannot include spaces and although they can include numbers, they mustn’t start with a number. A range name cannot be a cell reference. This might sound obvious, but because of the dramatic increase in the number of columns in Excel 2007/10 it’s quite easy to accidentally try and enter a cell reference – for example in Excel 2007 Sal1 or Pur1 are both cell references, though they would be perfectly valid range names in Excel 2003.

Based on the history of DefinedName#to_xml_string, it looks like this method used to only camel case the XML attribute names, not the values. Unfortunately, serialized_attributes camel cases both, with no option to preserve values.

Possible solution

Minimal: Revert to not treating defined names as SerializedAttributes

Would the maintainers be open to a pull request that refactored DefinedName#to_xml_string to use a method more specifically suited to validating named range parameters? Even a reversion to the non-refactored to_xml_string method would eliminate the mangling of values.

Maximal: More refactoring of SerializedAttributes

Refactor goals

If a more complete solution is desired, then I think the goal should be implementing a serialization strategy within SerializedAttributes. The default strategy would be the current SerializedAttributes#serialized_attributes method. Additional strategies could be exposed through a different method call, or an optional argument to SerializedAttributes::ClassMethods#serializable_attributes.

A strategy for DefinedName should allow maximum latitude in naming ranges while raising errors if invalid range names are specified:

  • Raise ArgumentError if name is in the range A1-XFD1048576.
  • Raise ArgumentError if name starts with a number.
  • Raise ArgumentError if name contains spaces.

Refactor considerations

The main concern I see with such a change is that existing applications using Axlsx may be naively passing invalid range names to methods like Workbook#add_defined_name. The current implementation accomodates this by mutating the name to something that will work most of the time; although it doesn't prevent all invalid range names. Dropping the current implementation in favor of strict validation could cause some users headaches.

If it is determined that Workbook#add_defined_name must maintain some ability to "clean" names passed to it, a better implementation should still be possible. Rather than raise ArgumentError when spaces are found, they could be simply replaced with underscores.

Refactor details

Achieving these goals will require changes to Axlsx::SerializedAttributes. Currently, serialized_attributes treats all serializable attributes the same, applying camel case and boolean value formatting in all cases. Defined names (and possibly other attributes) need a different ruleset. I would propose that serialized_attributes accept an argument specifying the treatment of values.

Related issues

Repeated header on every page printed: #353

Resources

Extracting contents of Excel files

The easiest way to see what has been written to an Excel file is to extract the contents and inspect the contents of ./xl/workbook.xml. Because xlsx files are simply Zip archives, you can use something like the following to extract:

    mkdir -p unpack && unzip -d unpack axlsx_print_area.xlsx

It is important to specify a directory, or unzip will barf files/folders all over your cwd.

Scripts

axlsx_print_area.rb - Generates an Excel file with broken defined names.

    # Generates an Excel file with broken defined names.
    require 'axlsx'

    p = Axlsx::Package.new
    wb = p.workbook

    # Build up a matrix of data
    cols = ("A".."L").to_a
    rows = (1..100).map do |i|
      cols.zip(cols.dup.fill(i.to_s)).map(&:join)
    end

    wb.add_worksheet(:name => "Basic Worksheet") do |sheet|
      rows.each do |row|
        sheet.add_row row
      end
      wb.add_defined_name("'Basic Worksheet'!$A$1:$L$51", :local_sheet_id => sheet.index, :name => '_xlnm.Print_Area')
      wb.add_defined_name("'Basic Worksheet'!$1:$1", :local_sheet_id => sheet.index, :name => '_xlnm.Print_Titles')
      wb.add_defined_name("'Basic Worksheet'!$A$1:$B$1", :local_sheet_id => sheet.index, :name => 'Example_Range')
      wb.add_defined_name("'Basic Worksheet'!$D$1:$F$1", :local_sheet_id => sheet.index, :name => 'A1')
    end

    p.serialize "#{File.basename(__FILE__, '.rb')}.xlsx"

workbook.xml - From contents of extracted Excel file.

    <?xml version="1.0" encoding="UTF-8"?>
    <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
      <workbookPr date1904="false"/>
      <sheets>
        <sheet name="Basic Worksheet" r:id="rId4" sheetId="1"/>
      </sheets>
      <definedNames>
        <definedName localSheetId="0" name="Xlnm.PrintArea">'Basic Worksheet'!$A$1:$L$51</definedName>
        <definedName localSheetId="0" name="Xlnm.PrintTitles">'Basic Worksheet'!$1:$1</definedName>
        <definedName localSheetId="0" name="ExampleRange">'Basic Worksheet'!$A$1:$B$1</definedName>
      </definedNames>
    </workbook>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment