In its default configuration, Spreadsheet::ParseXLSX is vulnerable to XML External Entity (XXE) injection. Whenever we call Spreadsheet::ParseXLSX->new()->parse('user_input_file.xlsx') on an XLSX file that comes from user input, a crafted file can make the parser read local files or trigger a Denial of Service.
Download the following files:
- PoC for reading /etc/passwd: https://poc.drstra.in/parsexlsx-0c7977b1d40b/read.xlsx
- PoC for triggering Denial of Service: https://poc.drstra.in/parsexlsx-0c7977b1d40b/dos.xlsx
Run the code below with perl vulnerable_server.pm (place vulnerable_server.pm in the same directory as read.xlsx).
```perl
# vulnerable_server.pm
use strict;
use warnings;
use Spreadsheet::ParseXLSX;

my $parser   = Spreadsheet::ParseXLSX->new();
my $workbook = $parser->parse('read.xlsx');
# my $workbook = $parser->parse('dos.xlsx');    # Denial of Service PoC

my $worksheet = $workbook->worksheet(0);
my ($row_min, $row_max) = $worksheet->row_range;
my ($col_min, $col_max) = $worksheet->col_range;

# Collect every value in the first column of the first worksheet.
my $data = [];
for my $r ($row_min .. $row_max) {
    my $cell = $worksheet->get_cell($r, $col_min);
    next unless defined $cell;    # get_cell returns undef for empty cells
    push @$data, $cell->value;
}

print "$_\n" for @$data;
```
After running the code, you should see that the contents of /etc/passwd are loaded into the $data variable. This is troublesome for servers that accept XLSX files from users and then store the parsed data in their database: the contents of /etc/passwd would end up in the database and could be read back by the user through any feature that displays stored data.
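For servers like this, one defense-in-depth check is to reject uploads whose XML parts declare a DTD at all, on the assumption (worth verifying against your own files) that legitimate XLSX parts do not need one. The sketch below is a heuristic, not a replacement for enabling no_xxe: it uses only the core module IO::Uncompress::Unzip, searches every archive member for the bytes of "<!DOCTYPE" in UTF-8/ASCII, UTF-16LE and UTF-16BE, and does not cover other encodings the XML parser may accept. The helper name xlsx_doctype_member is hypothetical.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# "<!DOCTYPE" as raw bytes in UTF-8/ASCII, UTF-16LE and UTF-16BE.
my @DOCTYPE_NEEDLES = (
    '<!DOCTYPE',
    join('', map { $_ . "\0" } split //, '<!DOCTYPE'),
    join('', map { "\0" . $_ } split //, '<!DOCTYPE'),
);

# Hypothetical helper: returns the name of the first archive member that
# contains a DOCTYPE declaration, or undef if no member does.
sub xlsx_doctype_member {
    my ($path) = @_;
    my $u = IO::Uncompress::Unzip->new($path)
        or die "Cannot open $path: $UnzipError\n";

    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream) {
        my $name = $u->getHeaderInfo->{Name};

        # Read the whole member into memory.
        my ($content, $buf) = ('', '');
        while (($status = $u->read($buf)) > 0) {
            $content .= $buf;
        }
        last if $status < 0;

        for my $needle (@DOCTYPE_NEEDLES) {
            return $name if index($content, $needle) >= 0;
        }
    }
    die "Error reading $path: $!\n" if $status < 0;
    return;
}
```

A server would run this before handing the upload to Spreadsheet::ParseXLSX, for example: my $hit = xlsx_doctype_member($upload); die "rejected: DTD in $hit\n" if defined $hit;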
1: Calling $parser->parse('read.xlsx') triggers the following code: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L66
2: Eventually, it reaches the function $self->_parse_workbook at: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L107
3: Within $self->_parse_workbook, it eventually calls the function $self->_parse_sheet: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L188
4: $self->_parse_sheet eventually calls $self->_new_twig: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L229
5: $self->_new_twig calls the external library constructor XML::Twig->new: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L1126
The XML::Twig documentation (https://metacpan.org/pod/XML::Twig#no_xxe) describes an option called no_xxe which, when set to a true value, makes the parse fail if the document defines external entities, eliminating the XXE vulnerability. However, Spreadsheet::ParseXLSX does not set this option, so it is vulnerable to XXE in its default configuration.
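Until the library itself passes no_xxe, an application could route parsing through a subclass that adds the option. This is a minimal sketch under a loudly stated assumption: based on the call chain above, it assumes the private method _new_twig forwards its key/value arguments to XML::Twig->new. My::SafeParseXLSX is a hypothetical name, and overriding a private method may break if the library's internals change.

```perl
package My::SafeParseXLSX;    # hypothetical subclass name
use strict;
use warnings;
use parent 'Spreadsheet::ParseXLSX';

# ASSUMPTION: Spreadsheet::ParseXLSX::_new_twig(%opts) passes %opts through
# to XML::Twig->new. Appending no_xxe => 1 asks XML::Twig to make the parse
# fail when the document defines external entities.
sub _new_twig {
    my ($self, @opts) = @_;
    return $self->SUPER::_new_twig(@opts, no_xxe => 1);
}

package main;

# Usage: parse untrusted uploads through the subclass instead.
# my $workbook = My::SafeParseXLSX->new->parse('user_input_file.xlsx');

1;
```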
- We could completely disable XXE in XML::Twig by always setting no_xxe to true.
- Or we could add an option that lets users decide whether to allow external entities. From a security perspective the default must be no_xxe = true; whoever turns the protection off is responsible for any resulting security incident.
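The second option could look roughly like the sketch below: a hypothetical constructor flag, allow_external_entities, that defaults to off and is translated into XML::Twig's no_xxe setting in a single place, so callers get the safe behavior unless they explicitly opt in. My::XLSXReader, new_twig and the flag name are illustrative only and are not part of Spreadsheet::ParseXLSX.

```perl
package My::XLSXReader;    # hypothetical class, for illustration only
use strict;
use warnings;
use XML::Twig;

sub new {
    my ($class, %args) = @_;
    return bless {
        # Secure default: external entities stay disabled unless the
        # caller explicitly passes allow_external_entities => 1.
        allow_external_entities => $args{allow_external_entities} ? 1 : 0,
    }, $class;
}

# Every XML part would be parsed through a twig built here, so the
# no_xxe decision is made in exactly one place.
sub new_twig {
    my ($self, %opts) = @_;
    return XML::Twig->new(
        %opts,
        no_xxe => $self->{allow_external_entities} ? 0 : 1,
    );
}

1;
```

With this design, My::XLSXReader->new refuses external entities, and only code that writes My::XLSXReader->new(allow_external_entities => 1) takes on the risk described above.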
CVE-2024-23525 was assigned for this XXE vulnerability.