Skip to content

Instantly share code, notes, and snippets.

@phvietan
Last active March 2, 2024 20:24
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save phvietan/d1c95a88ab6e17047b0248d6bf9eac4a to your computer and use it in GitHub Desktop.
Save phvietan/d1c95a88ab6e17047b0248d6bf9eac4a to your computer and use it in GitHub Desktop.
XXE in default configuration of Spreadsheet::ParseXLSX in doy/spreadsheet-parsexlsx

Description

In default configuration of Spreadsheet::ParseXLSX, whenever we call Spreadsheet::ParseXLSX->new()->parse('user_input_file.xlsx'), we'd be vulnerable for XXE vulnerability if the XLSX file that we are parsing is from user input.

Proof of Concept

Download the following files:

Run the below code with perl: perl vulnerable_server.pm (remember to place the vulnerable_server.pm in the same directory with read.xlsx).

# vulnerable_server.pm
use Spreadsheet::ParseXLSX;

my $parser   = Spreadsheet::ParseXLSX->new();
my $workbook = $parser->parse('read.xlsx');
# my $workbook = $parser->parse('dos.xlsx');

my $worksheet = $workbook->worksheet(0);
my ($row_min, $row_max) = $worksheet->row_range;
my ($col_min, $col_max) = $worksheet->col_range;

my $data = [];

for my $r ($row_min ... $row_max) {
    my $cell = $worksheet->get_cell($r, $col_min);
    my $length = length $cell;
    if ($length != 0) {
        push @$data, $cell->value;
    }
}

foreach (@$data) {
  print "$_\n";
}

After running the code, you should see that it loads to /etc/passwd into our $data variable. This behavior is troublesome for servers that load xlsx file from users, and then the server tries to store those data into their database. This would cause the /etc/passwd to be stored into the database and potentially be read by the user if there is some kind of feature to read database.

Bug details

1: calling $parser->parse('read.xlsx'); trigger the following code: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L66

2: eventually, it will hit the function $self->_parse_workbook at: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L107

3: within $self->_parse_workbook, it will eventually leads to $self->_parse_sheet function: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L188

4: $self->_parse_sheet eventually leads to $self->_new_twig: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L229

5: $self->_new_twig will call an outer library XML::Twig->new: https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L1126

We can read the documentation of XML::Twig at: https://metacpan.org/pod/XML::Twig#no_xxe , you should see that it supports an option called no_xxe that should eliminate the XXE vulnerability if we toggle this option. However, by default the Spreadsheet::ParseXLSX library forgot to use this option, which causes it to have XXE vulnerability in default configuration.

Potential fix

  • We could completely disable the XXE feature of XML::Twig by always set no_xxe to true
  • Or we could create an option to allow users to enable XXE or not, in security perspective the default must be no_xxe = true, whoever try to toggle the XXE option is responsible for their behavior if there are any security incident.
@MichaelDaum
Copy link

@stigtsp
Copy link

stigtsp commented Jan 18, 2024

CVE-2024-23525 was assigned for the XXE vuln

@MichaelDaum
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment