Skip to content

Instantly share code, notes, and snippets.

@towony
Forked from searbe/parse_xlsx.php
Last active July 2, 2022 09:35
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save towony/bad21f95fced05f75fa16ddebf8d1dd6 to your computer and use it in GitHub Desktop.
Save towony/bad21f95fced05f75fa16ddebf8d1dd6 to your computer and use it in GitHub Desktop.
Parse simple XLSX in PHP with SimpleXML and ZipArchive
<?php
/**
* I had to parse an XLSX spreadsheet (which should damn well have been a CSV!)
* but the usual tools were hitting the memory limit pretty quick. I found that
* manually parsing the XML worked pretty well. Note that this, most likely,
* won't work if cells contain anything more than text or a number (so formulas,
* graphs, etc ..., I don't know what'd happen).
*/
function parse_xlsx ($inputFile, $mode=0){
for($i='A'; $i<'ZZ'; $i++) $AtoZZ[] = $i; $rows = array_flip($AtoZZ); $data = array(); // add1
//$inputFile = '/path/to/spreadsheet.xlsx';
$dir = '../../data/tmp/'.md5(rand());
// Unzip
$zip = new ZipArchive();
$zip->open($inputFile);
$zip->extractTo($dir);
// Open up shared strings & the first worksheet
$strings = simplexml_load_file($dir . '/xl/sharedStrings.xml');
$sheet = simplexml_load_file($dir . '/xl/worksheets/sheet1.xml');
// Parse the rows
$xlrows = $sheet->sheetData->row;
if (!$data) for($i=1; $i<$xlrows['r']; $i++) $data[] = array(''); // add0 : 비어있는 행만큼 배열에 추가한다.
foreach ($xlrows as $xlrow) {
$arr = array();
// In each row, grab it's value
foreach ($xlrow->c as $cell) {
$row = preg_replace("/[^A-Z]*/s", "", $cell['r']); // add1
$v = (string) $cell->v;
// If it has a "t" (type?) of "s" (string?), use the value to look up string value
if (isset($cell['t']) && $cell['t'] == 's') {
$s = array();
$si = $strings->si[(int) $v];
// Register & alias the default namespace or you'll get empty results in the xpath query
$si->registerXPathNamespace('n', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main');
// Cat together all of the 't' (text?) node values
foreach($si->xpath('.//n:t') as $t) {
$s[] = (string) $t;
}
$v = implode($s);
}
$arr[$rows[$row]] = $v; // mod1
}
// Assuming the first row are headers, stick them in the headers array
if ($mode==1) $data[] = $arr; // add2 - mode 1 : usually
else { // mode 0 : title and value => array_combine
if (count($headers) == 0) $headers = $arr;
else {
foreach($headers as $i=>$v) if ($arr[$i]==null) $arr[$i]=''; ksort($arr); // add3
// Combine the row with the headers - make sure we have the same column count
// $values = array_pad($arr, count($headers), '');
$data[] = array_combine($headers, $arr); // mod
/**
* Here, do whatever you like with the [header => value] assoc array in $row.
* It might be useful just to run this script without any code here, to watch
* memory usage simply iterating over your spreadsheet.
*/
}
}
}
@unlink($dir);
@unlink($inputFile);
return $data; // add4
}
?>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment