Running the script axlsx_print_area.rb
(provided at end of issue) will produce an Excel workbook that, when unzipped, contains the workbook.xml
contents; also provided below. It can be observed that the name attributes of the <definedName>
nodes have been altered from their original form. This has a couple of undesirable side-effects.
-
It breaks behavior specified in the examples (repeating headers). Repeating headers are not actually produced.
-
Tangentially, it prevents one from using any of the built-in names specified in the source of
Axlsx::DefinedName
. -
It mangles valid defined names, leading to confusion. For example, if the defined name "Example_Range" is specified, it will be converted to "ExampleRange".
This mutation occurs within DefinedName#to_xml_string
where Axlsx::SerializedAttributes#serialized_attributes
is called on the name. The call to serialized_attfibutes
was added in 25ebea9, but before that, it appears that it used Axlsx::camel
, which would still exhibit the side effects outlined above.
As noted in the documentation, "A compliant producer or consumer considers a defined name in the range A1-XFD1048576 to be an error." The serialized_attributes
method does nothing to prevent us from using 'A1' as a defined name. Additionally, some Google searching produced a reasonably good explanation of valid range names:
Range names cannot include spaces and although they can include numbers, they mustn’t start with a number. A range name cannot be a cell reference. This might sound obvious, but because of the dramatic increase in the number of columns in Excel 2007/10 it’s quite easy to accidentally try and enter a cell reference – for example in Excel 2007 Sal1 or Pur1 are both cell references, though they would be perfectly valid range names in Excel 2003.
Based on the history of DefinedName#to_xml_string
, it looks like this method used to only camel case the XML attribute names, not the values. Unfortunately, serialized_attributes
camel cases both, with no option to preserve values.
Would the maintainers be open to a pull request that refactored DefinedName#to_xml_string
to use a method more specifically suited to validating named range parameters? Even a reversion to the non-refactored to_xml_string
method would eliminate the mangling of values.
If a more complete solution is desired, then I think the goal should be implementing a serialization strategy within SerializedAttributes. The default strategy would be the current SerializedAttributes#serialized_attributes
method. Additional strategies could be exposed through a different method call, or an optional argument to SerializedAttributes::ClassMethods#serializable_attributes
.
A strategy for DefinedName
should allow maximum latitude in naming ranges while raising errors if invalid range names are specified:
- Raise
ArgumentError
ifname
is in the range A1-XFD1048576. - Raise
ArgumentError
ifname
starts with a number. - Raise
ArgumentError
ifname
contains spaces.
The main concern I see with such a change is that existing applications using Axlsx may be naively passing invalid range names to methods like Workbook#add_defined_name
. The current implementation accomodates this by mutating the name to something that will work most of the time; although it doesn't prevent all invalid range names. Dropping the current implementation in favor of strict validation could cause some users headaches.
If it is determined that Workbook#add_defined_name
must maintain some ability to "clean" names passed to it, a better implementation should still be possible. Rather than raise ArgumentError
when spaces are found, they could be simply replaced with underscores.
Achieving these goals will require changes to Axlsx::SerializedAttributes
. Currently, serialized_attributes
treats all serializable attributes the same, applying camel case and boolean value formatting in all cases. Defined names (and possibly other attributes) need a different ruleset. I would propose that serialized_attributes
accept an argument specifying the treatment of values.
Repeated header on every page printed: #353
The easiest way to see what has been written to an Excel file is to extract the contents and inspect the contents of ./xl/workbook.xml
. Because xlsx files are simply Zip archives, you can use something like the following to extract:
mkdir -p unpack && unzip -d unpack axlsx_print_area.xlsx
It is important to specify a directory, or unzip will barf files/folders all over your cwd.
axlsx_print_area.rb
- Generates an Excel file with broken defined names.
# Generates an Excel file with broken defined names.
require 'axlsx'
p = Axlsx::Package.new
wb = p.workbook
# Build up a matrix of data
cols = ("A".."L").to_a
rows = (1..100).map do |i|
cols.zip(cols.dup.fill(i.to_s)).map(&:join)
end
wb.add_worksheet(:name => "Basic Worksheet") do |sheet|
rows.each do |row|
sheet.add_row row
end
wb.add_defined_name("'Basic Worksheet'!$A$1:$L$51", :local_sheet_id => sheet.index, :name => '_xlnm.Print_Area')
wb.add_defined_name("'Basic Worksheet'!$1:$1", :local_sheet_id => sheet.index, :name => '_xlnm.Print_Titles')
wb.add_defined_name("'Basic Worksheet'!$A$1:$B$1", :local_sheet_id => sheet.index, :name => 'Example_Range')
wb.add_defined_name("'Basic Worksheet'!$D$1:$F$1", :local_sheet_id => sheet.index, :name => 'A1')
end
p.serialize "#{File.basename(__FILE__, '.rb')}.xlsx"
workbook.xml
- From contents of extracted Excel file.
<?xml version="1.0" encoding="UTF-8"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<workbookPr date1904="false"/>
<sheets>
<sheet name="Basic Worksheet" r:id="rId4" sheetId="1"/>
</sheets>
<definedNames>
<definedName localSheetId="0" name="Xlnm.PrintArea">'Basic Worksheet'!$A$1:$L$51</definedName>
<definedName localSheetId="0" name="Xlnm.PrintTitles">'Basic Worksheet'!$1:$1</definedName>
<definedName localSheetId="0" name="ExampleRange">'Basic Worksheet'!$A$1:$B$1</definedName>
</definedNames>
</workbook>