@ssokolow
Created August 13, 2010 06:09
Wordpress WXR to Jekyll exporter

Having decided to move my blog to Jekyll, I went looking for solutions. AzizLight's class proved the simplest, but fell short of what I wanted.

This version has been modified to be runnable as a command-line script (help available via -h) and, aside from category permalink continuity (which I'm working on), produces a ready-to-use _posts directory for your Jekyll blog.

Full details on the changes are available in the git commit log.

I'm not sure about AzizLight's original code, but my additions are available under the same terms or the MIT license, whichever is more permissive.

However, there is a minor dependency on the WordPress wpautop() function, so delete the wp_formatting.php file if "GPL 2.0" without the "or later" clause bothers you.

Example Usage:

git clone git://gist.github.com/522402.git gist-522402
cd yourname.github.com
php ../gist-522402/WordpressExporter.php ../wordpress.xml

If it exits without saying anything, make sure which php points to php-cli and not php-cgi.

To enable permalink preservation, pass your old WordPress permalink string in via the -P argument. All keywords except %author% and %tag% are supported, but I'm not 100% certain I implemented %category% correctly. (Jekyll only parses keywords like :category in _config.yml, so I can't just let Jekyll figure it out.)

Support for %author% and %tag% isn't high on my personal TODO list, since my need for them isn't proportionate to the hassle. If you need them, tell me and I'll add them to my TODO list.
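As a rough illustration of what permalink preservation involves, the sketch below expands a WordPress permalink structure for a single post. It is not the code this script uses: the function name expand_permalink() and the layout of the $post array are assumptions made for the example, and only a handful of WordPress's structure tags are handled.

```php
<?php
// Hypothetical helper, not part of the gist: expand a WordPress permalink
// structure such as "/%year%/%monthnum%/%postname%/" for one post.
function expand_permalink(string $structure, array $post): string
{
    $ts = $post['timestamp'];
    $replacements = array(
        '%year%'     => date('Y', $ts),
        '%monthnum%' => date('m', $ts),
        '%day%'      => date('d', $ts),
        '%postname%' => $post['slug'],
        '%post_id%'  => (string) $post['id'],
        '%category%' => $post['category'],
    );
    // strtr() replaces every known tag in one pass and leaves
    // unknown tags (e.g. %author%) untouched.
    return strtr($structure, $replacements);
}

// The timezone affects which day a post lands on, so set it explicitly.
date_default_timezone_set('UTC');

// Assumed post layout, for illustration only.
$post = array(
    'timestamp' => gmmktime(6, 9, 0, 8, 13, 2010),
    'slug'      => 'hello-world',
    'id'        => 42,
    'category'  => 'news',
);

echo expand_permalink('/%year%/%monthnum%/%postname%/', $post), "\n";
// prints "/2010/08/hello-world/"
```

The resulting string is the kind of value that would have to be written into each post individually, since Jekyll will not expand per-post keywords for you.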

AzizLight's Original Readme:

This afternoon, I wanted to export my WordPress posts to files (one file per post) to test Jekyll and GitHub Pages. I searched the web a bit and was very disappointed: I didn't find any tool or script that did what I wanted. Jekyll provides a migration script that connects to the WordPress database, but it seemed overly complicated.

So I took the time to create a little php script myself. Unlike the Jekyll script, it uses the Wordpress eXtended RSS XML file (or WXR for WP-Nerds) that gets generated when you export your wordpress blog from within the admin interface.

It's not perfect at all, but it works, and it's easily customizable. I tested it on the "official" WordPress WXR sample posts file and on an export of my own blog.
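For readers who have never looked inside a WXR file, here is a minimal sketch of pulling posts out of one with PHP's SimpleXML extension. It is not the class's own parsing code: the inline sample document is heavily trimmed, the wp namespace URI shown is the one assumed for a 1.0 export, and the keys of the resulting array are made up for the example.

```php
<?php
// A tiny WXR-like document. Real exports carry many more fields, and the
// wp namespace URI depends on the export version (1.0 is assumed here).
$wxr = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <item>
      <title>Hello world</title>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
      <wp:post_name>hello-world</wp:post_name>
      <wp:post_date>2010-08-13 06:09:00</wp:post_date>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$feed = simplexml_load_string($wxr, 'SimpleXMLElement', LIBXML_NOCDATA);

$posts = array();
foreach ($feed->channel->item as $item) {
    // Namespaced children are reached via children($prefix, true).
    $wp      = $item->children('wp', true);
    $content = $item->children('content', true);

    $posts[] = array(
        'title' => (string) $item->title,
        'slug'  => (string) $wp->post_name,
        'date'  => (string) $wp->post_date,
        'type'  => (string) $wp->post_type,
        'body'  => (string) $content->encoded,
    );
}

print_r($posts);
```

Working from the export file rather than the database means the script needs no credentials and can run on any machine with php-cli.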

Here is a sample usage:

// NOTE: This is VERY important because it will affect the date/time of each post.
date_default_timezone_set( 'Europe/Paris' );

// Example usage:
// This will export the posts from the wordpress.xml file
// that is in the current folder and generate multiple post files in
// the posts folder that is also in the current folder.
$xml = new WordpressExporter( 'wordpress.xml' );
$xml->export( 'posts' );

This script consists of a single class. As it stands, the class just exports posts to files, but with a little customization it's possible to change the filename template of the generated post files, their extension, or even their contents. In other words, with minimal effort, this class could be used to transform WordPress posts into Jekyll-compatible post files.
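To make "Jekyll-compatible" concrete: Jekyll expects post files named YEAR-MONTH-DAY-title.MARKUP, each starting with a YAML front matter block between two --- lines. The sketch below builds both by hand. The function names and the $post layout are assumptions for the example, and the front matter is hand-rolled for brevity rather than produced by a YAML serializer such as the bundled Spyc.

```php
<?php
// Hypothetical helpers, not part of the gist.

// Build a Jekyll-style filename: YYYY-MM-DD-slug.ext
function jekyll_filename(array $post, string $extension = 'html'): string
{
    $date = date('Y-m-d', strtotime($post['date']));
    // Accept the extension with or without a leading dot.
    return $date . '-' . $post['slug'] . '.' . ltrim($extension, '.');
}

// Build the file contents: YAML front matter followed by the post body.
function jekyll_document(array $post): string
{
    // Escape backslashes and double quotes for a double-quoted YAML scalar.
    $title = addcslashes($post['title'], "\"\\");
    return "---\n"
        . "layout: post\n"
        . 'title: "' . $title . "\"\n"
        . "---\n"
        . $post['body'] . "\n";
}

date_default_timezone_set('UTC');

// Assumed post layout, for illustration only.
$post = array(
    'title' => 'Hello "world"',
    'slug'  => 'hello-world',
    'date'  => '2010-08-13 06:09:00',
    'body'  => '<p>First post.</p>',
);

echo jekyll_filename($post, '.html'), "\n"; // prints "2010-08-13-hello-world.html"
echo jekyll_document($post);
```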

I took the time to document the class as well as I could, so the code is self-explanatory as long as you know how to code PHP. However, I will briefly describe how to create filename templates and content templates:

To create a new filename template, you need to create a new private method with the name ftpl_{some_name} (e.g. ftpl_jekyll). Then, in the calling code, you need to set the filename template to use:

$xml = new WordpressExporter( 'wordpress.xml' );

// 1st method:
$xml->filenameTemplate = 'jekyll';

// 2nd method
$xml->setFilenameTemplate( 'jekyll' );

// 3rd method
$xml->export( 'posts', 'jekyll' );

Creating content templates is pretty much the same except that the name of the template method needs to be prefixed by tpl_ instead of ftpl_, the property's name is template not filenameTemplate and the setter is called setTemplate not setFilenameTemplate; also you could pass the template name as the third parameter of the export method.
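The first method above works because the class defines a magic __set() that forwards a property assignment to the matching setter. The stand-alone sketch below shows that dispatch together with the ftpl_ naming convention. TemplateDemo, filenameFor() and the signature of the ftpl_ methods are assumptions for the example, not the real class's API.

```php
<?php
// Hypothetical stand-in for the exporter, reduced to template selection.
class TemplateDemo
{
    private $filenameTemplate = 'default';

    // Same idea as the exporter's __set(): "$obj->prop = v" on an
    // inaccessible property is routed to setProp(v).
    public function __set($property, $value)
    {
        $setter = 'set' . ucfirst($property);
        if (!method_exists($this, $setter)) {
            throw new Exception("No setter for '$property'");
        }
        $this->$setter($value);
    }

    public function setFilenameTemplate($name)
    {
        // A template exists if a method called ftpl_{name} exists.
        if (!method_exists($this, 'ftpl_' . $name)) {
            throw new Exception("Unknown filename template '$name'");
        }
        $this->filenameTemplate = $name;
    }

    public function filenameFor(array $post)
    {
        $method = 'ftpl_' . $this->filenameTemplate;
        return $this->$method($post);
    }

    private function ftpl_default(array $post)
    {
        return $post['slug'];
    }

    private function ftpl_jekyll(array $post)
    {
        // Keep only the YYYY-MM-DD part of "YYYY-MM-DD HH:MM:SS".
        return substr($post['date'], 0, 10) . '-' . $post['slug'];
    }
}

$post = array('date' => '2010-08-13 06:09:00', 'slug' => 'hello-world');

$demo = new TemplateDemo();
$demo->filenameTemplate = 'jekyll'; // private property, so __set() runs
echo $demo->filenameFor($post), "\n"; // prints "2010-08-13-hello-world"
```

Content templates would follow the same pattern with a tpl_ prefix and a setTemplate() setter.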

Finally, to set the extension, use $xml->extension or $xml->setExtension. The dot before the extension is optional.

I am sure that this class can be improved a lot. If you think of anything, don't hesitate to leave a comment or fork the gist.

<?php
/**
* Spyc -- A Simple PHP YAML Class
* @version 0.4.5
* @author Vlad Andersen <vlad.andersen@gmail.com>
* @author Chris Wanstrath <chris@ozmm.org>
* @link http://code.google.com/p/spyc/
* @copyright Copyright 2005-2006 Chris Wanstrath, 2006-2009 Vlad Andersen
* @license http://www.opensource.org/licenses/mit-license.php MIT License
* @package Spyc
*/
if (!function_exists('spyc_load')) {
/**
* Parses YAML to array.
* @param string $string YAML string.
* @return array
*/
function spyc_load ($string) {
return Spyc::YAMLLoadString($string);
}
}
if (!function_exists('spyc_load_file')) {
/**
* Parses YAML to array.
* @param string $file Path to YAML file.
* @return array
*/
function spyc_load_file ($file) {
return Spyc::YAMLLoad($file);
}
}
/**
* The Simple PHP YAML Class.
*
* This class can be used to read a YAML file and convert its contents
* into a PHP array. It currently supports a very limited subsection of
* the YAML spec.
*
* Usage:
* <code>
* $Spyc = new Spyc;
* $array = $Spyc->load($file);
* </code>
* or:
* <code>
* $array = Spyc::YAMLLoad($file);
* </code>
* or:
* <code>
* $array = spyc_load_file($file);
* </code>
* @package Spyc
*/
class Spyc {
// SETTINGS
/**
* Setting this to true will force YAMLDump to enclose any string value in
* quotes. False by default.
*
* @var bool
*/
public $setting_dump_force_quotes = false;
/**
* Setting this to true will force YAMLLoad to use syck_load function when
* possible. False by default.
* @var bool
*/
public $setting_use_syck_is_possible = false;
/**#@+
* @access private
* @var mixed
*/
private $_dumpIndent;
private $_dumpWordWrap;
private $_containsGroupAnchor = false;
private $_containsGroupAlias = false;
private $path;
private $result;
private $LiteralPlaceHolder = '___YAML_Literal_Block___';
private $SavedGroups = array();
private $indent;
/**
* Path modifier that should be applied after adding current element.
* @var array
*/
private $delayedPath = array();
/**#@+
* @access public
* @var mixed
*/
public $_nodeId;
/**
* Load a valid YAML string to Spyc.
* @param string $input
* @return array
*/
public function load ($input) {
return $this->__loadString($input);
}
/**
* Load a valid YAML file to Spyc.
* @param string $file
* @return array
*/
public function loadFile ($file) {
return $this->__load($file);
}
/**
* Load YAML into a PHP array statically
*
* The load method, when supplied with a YAML stream (string or file),
* will do its best to convert YAML in a file into a PHP array. Pretty
* simple.
* Usage:
* <code>
* $array = Spyc::YAMLLoad('lucky.yaml');
* print_r($array);
* </code>
* @access public
* @return array
* @param string $input Path of YAML file or string containing YAML
*/
public static function YAMLLoad($input) {
$Spyc = new Spyc;
return $Spyc->__load($input);
}
/**
* Load a string of YAML into a PHP array statically
*
* The load method, when supplied with a YAML string, will do its best
* to convert YAML in a string into a PHP array. Pretty simple.
*
* Note: use this function if you don't want files from the file system
* loaded and processed as YAML. This is of interest to people concerned
* about security whose input is from a string.
*
* Usage:
* <code>
* $array = Spyc::YAMLLoadString("---\n0: hello world\n");
* print_r($array);
* </code>
* @access public
* @return array
* @param string $input String containing YAML
*/
public static function YAMLLoadString($input) {
$Spyc = new Spyc;
return $Spyc->__loadString($input);
}
/**
* Dump YAML from PHP array statically
*
* The dump method, when supplied with an array, will do its best
* to convert the array into friendly YAML. Pretty simple. Feel free to
* save the returned string as nothing.yaml and pass it around.
*
* Oh, and you can decide how big the indent is and what the wordwrap
* for folding is. Pretty cool -- just pass in 'false' for either if
* you want to use the default.
*
* Indent's default is 2 spaces, wordwrap's default is 40 characters. And
* you can turn off wordwrap by passing in 0.
*
* @access public
* @return string
* @param array $array PHP array
* @param int $indent Pass in false to use the default, which is 2
* @param int $wordwrap Pass in 0 for no wordwrap, false for default (40)
*/
public static function YAMLDump($array,$indent = false,$wordwrap = false) {
$spyc = new Spyc;
return $spyc->dump($array,$indent,$wordwrap);
}
/**
* Dump PHP array to YAML
*
* The dump method, when supplied with an array, will do its best
* to convert the array into friendly YAML. Pretty simple. Feel free to
* save the returned string as tasteful.yaml and pass it around.
*
* Oh, and you can decide how big the indent is and what the wordwrap
* for folding is. Pretty cool -- just pass in 'false' for either if
* you want to use the default.
*
* Indent's default is 2 spaces, wordwrap's default is 40 characters. And
* you can turn off wordwrap by passing in 0.
*
* @access public
* @return string
* @param array $array PHP array
* @param int $indent Pass in false to use the default, which is 2
* @param int $wordwrap Pass in 0 for no wordwrap, false for default (40)
*/
public function dump($array,$indent = false,$wordwrap = false) {
// Dumps to some very clean YAML. We'll have to add some more features
// and options soon. And better support for folding.
// New features and options.
if ($indent === false or !is_numeric($indent)) {
$this->_dumpIndent = 2;
} else {
$this->_dumpIndent = $indent;
}
if ($wordwrap === false or !is_numeric($wordwrap)) {
$this->_dumpWordWrap = 40;
} else {
$this->_dumpWordWrap = $wordwrap;
}
// New YAML document
$string = "---\n";
// Start at the base of the array and move through it.
if ($array) {
$array = (array)$array;
$first_key = key($array);
$previous_key = -1;
foreach ($array as $key => $value) {
$string .= $this->_yamlize($key,$value,0,$previous_key, $first_key);
$previous_key = $key;
}
}
return $string;
}
/**
* Attempts to convert a key / value array item to YAML
* @access private
* @return string
* @param $key The name of the key
* @param $value The value of the item
* @param $indent The indent of the current node
*/
private function _yamlize($key,$value,$indent, $previous_key = -1, $first_key = 0) {
if (is_array($value)) {
if (empty ($value))
return $this->_dumpNode($key, array(), $indent, $previous_key, $first_key);
// It has children. What to do?
// Make it the right kind of item
$string = $this->_dumpNode($key, NULL, $indent, $previous_key, $first_key);
// Add the indent
$indent += $this->_dumpIndent;
// Yamlize the array
$string .= $this->_yamlizeArray($value,$indent);
} elseif (!is_array($value)) {
// It doesn't have children. Yip.
$string = $this->_dumpNode($key, $value, $indent, $previous_key, $first_key);
}
return $string;
}
/**
* Attempts to convert an array to YAML
* @access private
* @return string
* @param $array The array you want to convert
* @param $indent The indent of the current level
*/
private function _yamlizeArray($array,$indent) {
if (is_array($array)) {
$string = '';
$previous_key = -1;
$first_key = key($array);
foreach ($array as $key => $value) {
$string .= $this->_yamlize($key, $value, $indent, $previous_key, $first_key);
$previous_key = $key;
}
return $string;
} else {
return false;
}
}
/**
* Returns YAML from a key and a value
* @access private
* @return string
* @param $key The name of the key
* @param $value The value of the item
* @param $indent The indent of the current node
*/
private function _dumpNode($key, $value, $indent, $previous_key = -1, $first_key = 0) {
// do some folding here, for blocks
if (is_string ($value) && ((strpos($value,"\n") !== false || strpos($value,": ") !== false || strpos($value,"- ") !== false ||
strpos($value,"*") !== false || strpos($value,"#") !== false || strpos($value,"<") !== false || strpos($value,">") !== false ||
strpos($value,"[") !== false || strpos($value,"]") !== false || strpos($value,"{") !== false || strpos($value,"}") !== false) || substr ($value, -1, 1) == ':')) {
$value = $this->_doLiteralBlock($value,$indent);
} else {
$value = $this->_doFolding($value,$indent);
if (is_bool($value)) {
$value = ($value) ? "true" : "false";
}
}
if ($value === array()) $value = '[ ]';
$spaces = str_repeat(' ',$indent);
if (is_int($key) && $key - 1 == $previous_key && $first_key===0) {
// It's a sequence
$string = $spaces.'- '.$value."\n";
} else {
if ($first_key===0) throw new Exception('Keys are all screwy. The first one was zero, now it\'s "'. $key .'"');
// It's mapped
if (strpos($key, ":") !== false) { $key = '"' . $key . '"'; }
$string = $spaces.$key.': '.$value."\n";
}
return $string;
}
/**
* Creates a literal block for dumping
* @access private
* @return string
* @param $value
* @param $indent int The value of the indent
*/
private function _doLiteralBlock($value,$indent) {
if (strpos($value, "\n") === false && strpos($value, "'") === false) {
return sprintf ("'%s'", $value);
}
if (strpos($value, "\n") === false && strpos($value, '"') === false) {
return sprintf ('"%s"', $value);
}
$exploded = explode("\n",$value);
$newValue = '|';
$indent += $this->_dumpIndent;
$spaces = str_repeat(' ',$indent);
foreach ($exploded as $line) {
$newValue .= "\n" . $spaces . trim($line);
}
return $newValue;
}
/**
* Folds a string of text, if necessary
* @access private
* @return string
* @param $value The string you wish to fold
*/
private function _doFolding($value,$indent) {
// Don't do anything if wordwrap is set to 0
if ($this->_dumpWordWrap !== 0 && is_string ($value) && strlen($value) > $this->_dumpWordWrap) {
$indent += $this->_dumpIndent;
$indent = str_repeat(' ',$indent);
$wrapped = wordwrap($value,$this->_dumpWordWrap,"\n$indent");
$value = ">\n".$indent.$wrapped;
} else {
if ($this->setting_dump_force_quotes && is_string ($value))
$value = '"' . $value . '"';
}
return $value;
}
// LOADING FUNCTIONS
private function __load($input) {
$Source = $this->loadFromSource($input);
return $this->loadWithSource($Source);
}
private function __loadString($input) {
$Source = $this->loadFromString($input);
return $this->loadWithSource($Source);
}
private function loadWithSource($Source) {
if (empty ($Source)) return array();
if ($this->setting_use_syck_is_possible && function_exists ('syck_load')) {
$array = syck_load (implode ('', $Source));
return is_array($array) ? $array : array();
}
$this->path = array();
$this->result = array();
$cnt = count($Source);
for ($i = 0; $i < $cnt; $i++) {
$line = $Source[$i];
$this->indent = strlen($line) - strlen(ltrim($line));
$tempPath = $this->getParentPathByIndent($this->indent);
$line = self::stripIndent($line, $this->indent);
if (self::isComment($line)) continue;
if (self::isEmpty($line)) continue;
$this->path = $tempPath;
$literalBlockStyle = self::startsLiteralBlock($line);
if ($literalBlockStyle) {
$line = rtrim ($line, $literalBlockStyle . " \n");
$literalBlock = '';
$line .= $this->LiteralPlaceHolder;
while (++$i < $cnt && $this->literalBlockContinues($Source[$i], $this->indent)) {
$literalBlock = $this->addLiteralLine($literalBlock, $Source[$i], $literalBlockStyle);
}
$i--;
}
while (++$i < $cnt && self::greedilyNeedNextLine($line)) {
$line = rtrim ($line, " \n\t\r") . ' ' . ltrim ($Source[$i], " \t");
}
$i--;
if (strpos ($line, '#')) {
if (strpos ($line, '"') === false && strpos ($line, "'") === false)
$line = preg_replace('/\s+#(.+)$/','',$line);
}
$lineArray = $this->_parseLine($line);
if ($literalBlockStyle)
$lineArray = $this->revertLiteralPlaceHolder ($lineArray, $literalBlock);
$this->addArray($lineArray, $this->indent);
foreach ($this->delayedPath as $indent => $delayedPath)
$this->path[$indent] = $delayedPath;
$this->delayedPath = array();
}
return $this->result;
}
private function loadFromSource ($input) {
if (!empty($input) && strpos($input, "\n") === false && file_exists($input))
return file($input);
return $this->loadFromString($input);
}
private function loadFromString ($input) {
$lines = explode("\n",$input);
foreach ($lines as $k => $_) {
$lines[$k] = rtrim ($_, "\r");
}
return $lines;
}
/**
* Parses YAML code and returns an array for a node
* @access private
* @return array
* @param string $line A line from the YAML file
*/
private function _parseLine($line) {
if (!$line) return array();
$line = trim($line);
if (!$line) return array();
$array = array();
$group = $this->nodeContainsGroup($line);
if ($group) {
$this->addGroup($line, $group);
$line = $this->stripGroup ($line, $group);
}
if ($this->startsMappedSequence($line))
return $this->returnMappedSequence($line);
if ($this->startsMappedValue($line))
return $this->returnMappedValue($line);
if ($this->isArrayElement($line))
return $this->returnArrayElement($line);
if ($this->isPlainArray($line))
return $this->returnPlainArray($line);
return $this->returnKeyValuePair($line);
}
/**
* Finds the type of the passed value, returns the value as the new type.
* @access private
* @param string $value
* @return mixed
*/
private function _toType($value) {
if ($value === '') return null;
$first_character = $value[0];
$last_character = substr($value, -1, 1);
$is_quoted = false;
do {
if (!$value) break;
if ($first_character != '"' && $first_character != "'") break;
if ($last_character != '"' && $last_character != "'") break;
$is_quoted = true;
} while (0);
if ($is_quoted)
return strtr(substr ($value, 1, -1), array ('\\"' => '"', '\'\'' => '\'', '\\\'' => '\''));
if (strpos($value, ' #') !== false)
$value = preg_replace('/\s+#(.+)$/','',$value);
if ($first_character == '[' && $last_character == ']') {
// Take out strings sequences and mappings
$innerValue = trim(substr ($value, 1, -1));
if ($innerValue === '') return array();
$explode = $this->_inlineEscape($innerValue);
// Propagate value array
$value = array();
foreach ($explode as $v) {
$value[] = $this->_toType($v);
}
return $value;
}
if (strpos($value,': ')!==false && $first_character != '{') {
$array = explode(': ',$value);
$key = trim($array[0]);
array_shift($array);
$value = trim(implode(': ',$array));
$value = $this->_toType($value);
return array($key => $value);
}
if ($first_character == '{' && $last_character == '}') {
$innerValue = trim(substr ($value, 1, -1));
if ($innerValue === '') return array();
// Inline Mapping
// Take out strings sequences and mappings
$explode = $this->_inlineEscape($innerValue);
// Propagate value array
$array = array();
foreach ($explode as $v) {
$SubArr = $this->_toType($v);
if (empty($SubArr)) continue;
if (is_array ($SubArr)) {
$array[key($SubArr)] = $SubArr[key($SubArr)]; continue;
}
$array[] = $SubArr;
}
return $array;
}
if ($value == 'null' || $value == 'NULL' || $value == 'Null' || $value == '' || $value == '~') {
return null;
}
if (intval($first_character) > 0 && preg_match ('/^[1-9]+[0-9]*$/', $value)) {
$intvalue = (int)$value;
if ($intvalue != PHP_INT_MAX)
$value = $intvalue;
return $value;
}
if (in_array($value,
array('true', 'on', '+', 'yes', 'y', 'True', 'TRUE', 'On', 'ON', 'YES', 'Yes', 'Y'))) {
return true;
}
if (in_array(strtolower($value),
array('false', 'off', '-', 'no', 'n'))) {
return false;
}
if (is_numeric($value)) {
if ($value === '0') return 0;
if (trim ($value, 0) === $value)
$value = (float)$value;
return $value;
}
return $value;
}
/**
* Used in inlines to check for more inlines or quoted strings
* @access private
* @return array
*/
private function _inlineEscape($inline) {
// There's gotta be a cleaner way to do this...
// While pure sequences seem to be nesting just fine,
// pure mappings and mappings with sequences inside can't go very
// deep. This needs to be fixed.
$seqs = array();
$maps = array();
$saved_strings = array();
// Check for strings
$regex = '/(?:(")|(?:\'))((?(1)[^"]+|[^\']+))(?(1)"|\')/';
if (preg_match_all($regex,$inline,$strings)) {
$saved_strings = $strings[0];
$inline = preg_replace($regex,'YAMLString',$inline);
}
unset($regex);
$i = 0;
do {
// Check for sequences
while (preg_match('/\[([^{}\[\]]+)\]/U',$inline,$matchseqs)) {
$seqs[] = $matchseqs[0];
$inline = preg_replace('/\[([^{}\[\]]+)\]/U', ('YAMLSeq' . (count($seqs) - 1) . 's'), $inline, 1);
}
// Check for mappings
while (preg_match('/{([^\[\]{}]+)}/U',$inline,$matchmaps)) {
$maps[] = $matchmaps[0];
$inline = preg_replace('/{([^\[\]{}]+)}/U', ('YAMLMap' . (count($maps) - 1) . 's'), $inline, 1);
}
if ($i++ >= 10) break;
} while (strpos ($inline, '[') !== false || strpos ($inline, '{') !== false);
$explode = explode(', ',$inline);
$stringi = 0; $i = 0;
while (1) {
// Re-add the sequences
if (!empty($seqs)) {
foreach ($explode as $key => $value) {
if (strpos($value,'YAMLSeq') !== false) {
foreach ($seqs as $seqk => $seq) {
$explode[$key] = str_replace(('YAMLSeq'.$seqk.'s'),$seq,$value);
$value = $explode[$key];
}
}
}
}
// Re-add the mappings
if (!empty($maps)) {
foreach ($explode as $key => $value) {
if (strpos($value,'YAMLMap') !== false) {
foreach ($maps as $mapk => $map) {
$explode[$key] = str_replace(('YAMLMap'.$mapk.'s'), $map, $value);
$value = $explode[$key];
}
}
}
}
// Re-add the strings
if (!empty($saved_strings)) {
foreach ($explode as $key => $value) {
while (strpos($value,'YAMLString') !== false) {
$explode[$key] = preg_replace('/YAMLString/',$saved_strings[$stringi],$value, 1);
unset($saved_strings[$stringi]);
++$stringi;
$value = $explode[$key];
}
}
}
$finished = true;
foreach ($explode as $key => $value) {
if (strpos($value,'YAMLSeq') !== false) {
$finished = false; break;
}
if (strpos($value,'YAMLMap') !== false) {
$finished = false; break;
}
if (strpos($value,'YAMLString') !== false) {
$finished = false; break;
}
}
if ($finished) break;
$i++;
if ($i > 10)
break; // Prevent infinite loops.
}
return $explode;
}
private function literalBlockContinues ($line, $lineIndent) {
if (!trim($line)) return true;
if (strlen($line) - strlen(ltrim($line)) > $lineIndent) return true;
return false;
}
private function referenceContentsByAlias ($alias) {
$value = null; // Returned as-is when the alias is unknown.
do {
if (!isset($this->SavedGroups[$alias])) { echo "Bad group name: $alias."; break; }
$groupPath = $this->SavedGroups[$alias];
$value = $this->result;
foreach ($groupPath as $k) {
$value = $value[$k];
}
} while (false);
return $value;
}
private function addArrayInline ($array, $indent) {
$CommonGroupPath = $this->path;
if (empty ($array)) return false;
foreach ($array as $k => $_) {
$this->addArray(array($k => $_), $indent);
$this->path = $CommonGroupPath;
}
return true;
}
private function addArray ($incoming_data, $incoming_indent) {
// print_r ($incoming_data);
if (count ($incoming_data) > 1)
return $this->addArrayInline ($incoming_data, $incoming_indent);
$key = key ($incoming_data);
$value = isset($incoming_data[$key]) ? $incoming_data[$key] : null;
if ($key === '__!YAMLZero') $key = '0';
if ($incoming_indent == 0 && !$this->_containsGroupAlias && !$this->_containsGroupAnchor) { // Shortcut for root-level values.
if ($key || $key === '' || $key === '0') {
$this->result[$key] = $value;
} else {
$this->result[] = $value; end ($this->result); $key = key ($this->result);
}
$this->path[$incoming_indent] = $key;
return;
}
$history = array();
// Unfolding inner array tree.
$history[] = $_arr = $this->result;
foreach ($this->path as $k) {
$history[] = $_arr = $_arr[$k];
}
if ($this->_containsGroupAlias) {
$value = $this->referenceContentsByAlias($this->_containsGroupAlias);
$this->_containsGroupAlias = false;
}
// Adding string or numeric key to the innermost level or $this->arr.
if (is_string($key) && $key == '<<') {
if (!is_array ($_arr)) { $_arr = array (); }
$_arr = array_merge ($_arr, $value);
} else if ($key || $key === '' || $key === '0') {
$_arr[$key] = $value;
} else {
if (!is_array ($_arr)) { $_arr = array ($value); $key = 0; }
else { $_arr[] = $value; end ($_arr); $key = key ($_arr); }
}
$reverse_path = array_reverse($this->path);
$reverse_history = array_reverse ($history);
$reverse_history[0] = $_arr;
$cnt = count($reverse_history) - 1;
for ($i = 0; $i < $cnt; $i++) {
$reverse_history[$i+1][$reverse_path[$i]] = $reverse_history[$i];
}
$this->result = $reverse_history[$cnt];
$this->path[$incoming_indent] = $key;
if ($this->_containsGroupAnchor) {
$this->SavedGroups[$this->_containsGroupAnchor] = $this->path;
if (is_array ($value)) {
$k = key ($value);
if (!is_int ($k)) {
$this->SavedGroups[$this->_containsGroupAnchor][$incoming_indent + 2] = $k;
}
}
$this->_containsGroupAnchor = false;
}
}
private static function startsLiteralBlock ($line) {
$lastChar = substr (trim($line), -1);
if ($lastChar != '>' && $lastChar != '|') return false;
if ($lastChar == '|') return $lastChar;
// HTML tags should not be counted as literal blocks.
if (preg_match ('#<.*?>$#', $line)) return false;
return $lastChar;
}
private static function greedilyNeedNextLine($line) {
$line = trim ($line);
if (!strlen($line)) return false;
if (substr ($line, -1, 1) == ']') return false;
if ($line[0] == '[') return true;
if (preg_match ('#^[^:]+?:\s*\[#', $line)) return true;
return false;
}
private function addLiteralLine ($literalBlock, $line, $literalBlockStyle) {
$line = self::stripIndent($line);
$line = rtrim ($line, "\r\n\t ") . "\n";
if ($literalBlockStyle == '|') {
return $literalBlock . $line;
}
if (strlen($line) == 0)
return rtrim($literalBlock, ' ') . "\n";
if ($line == "\n" && $literalBlockStyle == '>') {
return rtrim ($literalBlock, " \t") . "\n";
}
if ($line != "\n")
$line = trim ($line, "\r\n ") . " ";
return $literalBlock . $line;
}
function revertLiteralPlaceHolder ($lineArray, $literalBlock) {
foreach ($lineArray as $k => $_) {
if (is_array($_))
$lineArray[$k] = $this->revertLiteralPlaceHolder ($_, $literalBlock);
else if (substr($_, -1 * strlen ($this->LiteralPlaceHolder)) == $this->LiteralPlaceHolder)
$lineArray[$k] = rtrim ($literalBlock, " \r\n");
}
return $lineArray;
}
private static function stripIndent ($line, $indent = -1) {
if ($indent == -1) $indent = strlen($line) - strlen(ltrim($line));
return substr ($line, $indent);
}
private function getParentPathByIndent ($indent) {
if ($indent == 0) return array();
$linePath = $this->path;
do {
end($linePath); $lastIndentInParentPath = key($linePath);
if ($indent <= $lastIndentInParentPath) array_pop ($linePath);
} while ($indent <= $lastIndentInParentPath);
return $linePath;
}
private function clearBiggerPathValues ($indent) {
if ($indent == 0) $this->path = array();
if (empty ($this->path)) return true;
foreach ($this->path as $k => $_) {
if ($k > $indent) unset ($this->path[$k]);
}
return true;
}
private static function isComment ($line) {
if (!$line) return false;
if ($line[0] == '#') return true;
if (trim($line, " \r\n\t") == '---') return true;
return false;
}
private static function isEmpty ($line) {
return (trim ($line) === '');
}
private function isArrayElement ($line) {
if (!$line) return false;
if ($line[0] != '-') return false;
if (strlen ($line) > 3)
if (substr($line,0,3) == '---') return false;
return true;
}
private function isHashElement ($line) {
return strpos($line, ':');
}
private function isLiteral ($line) {
if ($this->isArrayElement($line)) return false;
if ($this->isHashElement($line)) return false;
return true;
}
private static function unquote ($value) {
if (!$value) return $value;
if (!is_string($value)) return $value;
if ($value[0] == '\'') return trim ($value, '\'');
if ($value[0] == '"') return trim ($value, '"');
return $value;
}
private function startsMappedSequence ($line) {
return ($line[0] == '-' && substr ($line, -1, 1) == ':');
}
private function returnMappedSequence ($line) {
$array = array();
$key = self::unquote(trim(substr($line,1,-1)));
$array[$key] = array();
$this->delayedPath = array(strpos ($line, $key) + $this->indent => $key);
return array($array);
}
private function returnMappedValue ($line) {
$array = array();
$key = self::unquote (trim(substr($line,0,-1)));
$array[$key] = '';
return $array;
}
private function startsMappedValue ($line) {
return (substr ($line, -1, 1) == ':');
}
private function isPlainArray ($line) {
return ($line[0] == '[' && substr ($line, -1, 1) == ']');
}
private function returnPlainArray ($line) {
return $this->_toType($line);
}
private function returnKeyValuePair ($line) {
$array = array();
$key = '';
if (strpos ($line, ':')) {
// It's a key/value pair most likely
// If the key is in double quotes pull it out
if (($line[0] == '"' || $line[0] == "'") && preg_match('/^(["\'](.*)["\'](\s)*:)/',$line,$matches)) {
$value = trim(str_replace($matches[1],'',$line));
$key = $matches[2];
} else {
// Do some guesswork as to the key and the value
$explode = explode(':',$line);
$key = trim($explode[0]);
array_shift($explode);
$value = trim(implode(':',$explode));
}
// Set the type of the value. Int, string, etc
$value = $this->_toType($value);
if ($key === '0') $key = '__!YAMLZero';
$array[$key] = $value;
} else {
$array = array ($line);
}
return $array;
}
private function returnArrayElement ($line) {
if (strlen($line) <= 1) return array(array()); // Weird %)
$array = array();
$value = trim(substr($line,1));
$value = $this->_toType($value);
$array[] = $value;
return $array;
}
private function nodeContainsGroup ($line) {
$symbolsForReference = 'A-z0-9_\-';
if (strpos($line, '&') === false && strpos($line, '*') === false) return false; // Please die fast ;-)
if ($line[0] == '&' && preg_match('/^(&['.$symbolsForReference.']+)/', $line, $matches)) return $matches[1];
if ($line[0] == '*' && preg_match('/^(\*['.$symbolsForReference.']+)/', $line, $matches)) return $matches[1];
if (preg_match('/(&['.$symbolsForReference.']+)$/', $line, $matches)) return $matches[1];
if (preg_match('/(\*['.$symbolsForReference.']+$)/', $line, $matches)) return $matches[1];
if (preg_match ('#^\s*<<\s*:\s*(\*[^\s]+).*$#', $line, $matches)) return $matches[1];
return false;
}
private function addGroup ($line, $group) {
if ($group[0] == '&') $this->_containsGroupAnchor = substr ($group, 1);
if ($group[0] == '*') $this->_containsGroupAlias = substr ($group, 1);
//print_r ($this->path);
}
private function stripGroup ($line, $group) {
$line = trim(str_replace($group, '', $line));
return $line;
}
}
// Enable use of Spyc from command line
// The syntax is the following: php spyc.php spyc.yaml
define ('SPYC_FROM_COMMAND_LINE', false);
do {
if (!SPYC_FROM_COMMAND_LINE) break;
if (empty ($_SERVER['argc']) || $_SERVER['argc'] < 2) break;
if (empty ($_SERVER['PHP_SELF']) || $_SERVER['PHP_SELF'] != 'spyc.php') break;
$file = $argv[1];
printf ("Spyc loading file: %s\n", $file);
print_r (spyc_load_file ($file));
} while (0);
<?php
/**
* Wordpress WXR exporter with default Jekyll output
*
* @todo: Check what else WXR stores that Jekyll can use. (Tags? Comments?)
*/
$here = dirname(__FILE__);
// Proper YAML serializer. MIT-licensed.
require_once $here . "/spyc.php";
// WordPress's formatting.php. Used for wpautop().
// Licensed under the GPL 2.0 without the "or later" clause.
// DELETE WITHOUT READING IF YOU NEED ANOTHER LICENSE.
require_once $here . "/wp_formatting.php";
/**
* Wordpress to Files
* Export Wordpress posts and pages to files.
*/
class WordpressExporter
{
/**
* Destination folder; where the post files will be created.
*
* @access private
* @var string
*/
private $dest;
/**
* Contents of the xml file.
*
* @access private
* @var string
*/
private $xml;
/**
* Array containing all the posts.
*
* @access private
* @var array
*/
private $posts;
/**
* The name of the template that will be used to format
* the contents of each post file.
*
* @access private
* @var string
*/
private $template;
/**
* The name of the template that will format the name of the post file.
*
* @access private
* @var string
*/
private $filenameTemplate;
/**
* The extension of the post files.
*
* @access private
* @var string
*/
private $extension;
/**
* Whether to export only posts.
*
* @access private
* @var bool
*/
private $postsOnly;
/**
* Optional permalink format to hard-code in exported posts.
*
* @access private
* @var string
*/
private $permalinkFormat;
// -> DEBUG <- //
private $exportSuccess;
// ----------------------------------------------------------------------------
/**
* El Constructor!
*
* @access public
* @author Aziz Light
*/
public function __construct( $wpxml = '', $dest = '' )
{
if ( ! $this->checkXml( $wpxml ) )
exit( 'Invalid Wordpress eXtended RSS file!!' );
// I want the second param to be optional here.
if ( $dest != '' )
$this->dest = $this->setDest( $dest );
$this->posts = array();
$this->template = $this->setTemplate( 'default' );
$this->filenameTemplate = $this->setFilenameTemplate( 'default' );
$this->extension = '.html';
$this->postsOnly = false;
$this->exportSuccess = false;
} // End of __construct
// ----------------------------------------------------------------------------
/**
* El Destructor!
*
* @access public
* @author Aziz Light
*/
public function __destruct()
{
if ( $this->exportSuccess === true )
echo "Export Done!\n";
} // End of __destruct
// ----------------------------------------------------------------------------
/**
* Setter.
* Mainly used to set the destination directory path and the template.
*/
public function __set( $property, $value )
{
$setter = 'set' . ucfirst( $property );
if ( ! method_exists( $this, $setter ) )
{
throw new Exception( "The property that you are trying to set doesn't exist!! ... or it's not accessible, move along." );
exit;
}
else
{
$this->$setter( $value );
}
} // End of public function __set
// ----------------------------------------------------------------------------
/**
* Getter.
* Mainly used to get the posts array in this class.
*/
public function __get( $property )
{
$getter = 'get' . ucfirst( $property );
if ( ! method_exists( $this, $getter ) )
{
throw new Exception( "The property that you are trying to access doesn't exist!! ... or it's not accessible, move along." );
exit;
}
else
{
return $this->$getter( $property );
}
} // End of public function __get
// ----------------------------------------------------------------------------
/**
* Export the Wordpress posts in multiple files.
*
* @access public
* @param string $dest : The destination folder where all the post files will be generated.
* @param string $filenameTemplate : The name of the template that will format the name of the post files.
* @param string $template : The name of the template that will format the content of the post files.
* @return bool
* @author Aziz Light
*/
public function export( $dest = '', $filenameTemplate = '', $template = '' )
{
if ( $dest != '' )
$this->dest = $this->setDest( $dest );
elseif ( $this->dest == '' )
{
throw new Exception( "Unable to find the destination directory!" );
exit;
}
if ( $filenameTemplate != '' )
$this->setFilenameTemplate( $filenameTemplate );
if ( $template != '' )
$this->setTemplate( $template );
if ( empty( $this->posts ) )
$this->extractPosts();
foreach ( $this->posts as $post )
{
if ( $post['type'] != 'post' && $post['type'] != 'page' )
continue;
if ( $this->postsOnly && $post['type'] != 'post' )
continue;
// Setup the name of the file.
$filename = $this->setupFilename( $post );
$filePath = $this->dest . $filename;
// Setup the content of the file.
$content = $this->setupContent( $post );
if ( $handle = @fopen( $filePath, 'wb' ) )
{
flock($handle, LOCK_EX);
fwrite($handle, $content);
flock($handle, LOCK_UN);
fclose($handle);
}
else
{
throw new Exception( "Unable to create a new file. Verify that the destination folder is writable!" );
exit;
}
unset( $content, $filePath, $filename );
}
$this->exportSuccess = true; // DEBUG <-
return true;
} // End of public function export
// ----------------------------------------------------------------------------
/**
* Extract the posts from the xml file and return the array of posts.
* If the posts have been extracted previously, this method will just
* return the array of posts.
*
* @access public
* @return array : The array of posts.
* @author Aziz Light
*/
public function extractPosts()
{
if ( empty( $this->posts ) )
{
$xml = @new SimpleXmlElement( $this->xml );
$namespaces = $xml->getDocNamespaces();
foreach ( $xml->channel->item as $x )
{
$post = array();
$post['id' ] = (string) $x->children( $namespaces['wp'] )->post_id;
$post['title' ] = (string) $x->title;
$post['slug' ] = (string) $x->children( $namespaces['wp'] )->post_name;
$post['date' ] = (string) $x->pubDate;
$post['timestamp'] = strtotime( (string) $x->pubDate );
$post['author' ] = (string) $x->children( $namespaces['dc'] )->creator;
$post['content' ] = (string) $x->children( $namespaces['content'] )->encoded;
$post['type' ] = (string) $x->children( $namespaces['wp'] )->post_type;
$post['status' ] = (string) $x->children( $namespaces['wp'] )->status;
$categories = array();
$category_slugs = array();
foreach( $x->category as $cat ) {
// Cast to plain strings so array_unique() and Spyc never see SimpleXMLElement objects
$cat_slug = (string) $cat->attributes()->nicename;
if (trim($cat_slug) && (string) $cat->attributes()->domain == 'category') {
$categories[] = (string) $cat;
$category_slugs[] = $cat_slug;
}
}
$categories = array_unique($categories);
$category_slugs = array_unique($category_slugs);
// Spyc doesn't like "screwy" keys
$post['categories'] = array_merge($categories);
$post['category_slugs'] = array_merge($category_slugs);
$this->posts[] = $post;
unset( $post );
}
}
return $this->posts;
} // End of public function extractPosts
// ----------------------------------------------------------------------------
/**
* Get the extracted posts.
* NOTE:
* This method doesn't extract the posts,
* so it will return an empty array if the
* posts were not extracted from the xml file
* previously using the extractPosts method.
*
* @access public
* @return array : The array of posts.
* @author Aziz Light
*/
public function getPosts()
{
return $this->posts;
} // End of public function getPosts
// ----------------------------------------------------------------------------
/**
* Set the destination folder.
*
* @access public
* @return string : The destination path, with a trailing slash.
* @author Aziz Light
**/
public function setDest( $dest = '' )
{
//foc(substr( $dest, -1 ));// DEBUG <-
// Add a trailing slash if there isn't one.
// NOTE: To Windows users: You will probably get an error here...
if ( substr( $dest, -1 ) != '/' )
$dest .= '/';
//foc($dest); // DEBUG <-
if ( $dest != '' && is_dir( $dest ) )
{
$this->dest = $dest;
return $this->dest;
}
else
{
throw new Exception( "Unable to find the destination directory!" );
exit;
}
} // End of function setDest
// ----------------------------------------------------------------------------
/**
* Check that the provided xml exists and is readable.
*
* @access private
* @param string $xml : The path to the xml file.
* @return bool
* @author Aziz Light
*/
private function checkXml( $xml )
{
// NOTE:
// If I put explode in the end function directly
// I will get a "Strict Standards" error saying
// that only variables should be passed by reference.
$s = explode( '.', $xml );
$extension = end( $s );
unset( $s );
if ( $extension != 'xml' )
{
throw new Exception( "You passed a .{$extension} file. You need to pass a WXR file (.xml) !!" );
exit;
}
elseif ( !is_readable( $xml ) )
{
throw new Exception( "Unable to find or open the xml file" );
exit;
}
else
{
$this->xml = file_get_contents( $xml );
return true;
}
} // End of private function checkXml
// ----------------------------------------------------------------------------
/**
* Make a string "URI-safe".
*
* @access private
* @param string $string : The String.
* @return string : The new String!
* @author Aziz Light
*/
private function slugify( $string )
{
return preg_replace('/[\s_\.-]+/', '-', strtolower(trim($string, " \t\n\r\0\x0B-_.")));
} // End of private function slugify
// ----------------------------------------------------------------------------
/**
* Formats the content of the post file using the specified template.
*
* @access private
* @param array $post : The post array.
* @return string : The formatted content of the post file.
* @author Aziz Light
*/
private function setupContent( $post )
{
$tpl = $this->template;
return $this->$tpl( $post );
} // End of private function setupContent
// ----------------------------------------------------------------------------
/**
* Formats the name of the post files using the specified template.
*
* @access private
* @param array $post : The post array.
* @return string : The formatted filename.
* @author Aziz Light
*/
private function setupFilename( $post )
{
$ftpl = $this->filenameTemplate;
return $this->$ftpl( $post );
} // End of private function setupFilename
// ----------------------------------------------------------------------------
/**
* Set the template that will format the name of the post files.
*
* @access public
* @param string $template : The name of the template.
* @return string : The name of the template method
* @author Aziz Light
*/
public function setFilenameTemplate( $template = '' )
{
$templateMethodName = 'ftpl_' . $template;
if ( $template != '' && method_exists( $this, $templateMethodName ) )
{
$this->filenameTemplate = $templateMethodName;
return $templateMethodName;
}
else
{
throw new Exception( "The specified filename template doesn't exist!!" );
exit;
}
} // End of public function setFilenameTemplate
// ----------------------------------------------------------------------------
/**
* Set the template that will format the content of the post files.
*
* @access public
* @param string $template : The name of the template.
* @return string : The name of the template method.
* @author Aziz Light
*/
public function setTemplate( $template = '' )
{
$templateMethodName = 'tpl_' . $template;
if ( $template != '' && method_exists( $this, $templateMethodName ) )
{
$this->template = $templateMethodName;
return $templateMethodName;
}
else
{
throw new Exception( "The specified template doesn't exist!!" );
exit;
}
} // End of public function setTemplate
// ----------------------------------------------------------------------------
/**
* Set the extension of the post files that will be generated.
* Also prepends a dot (.) to the extension if there isn't one.
*
* @access public
* @param string $extension : The extension.
* @return string: The extension.
* @author Aziz Light
*/
public function setExtension( $extension )
{
if ( substr( $extension, 0, 1 ) != '.' )
$extension = '.' . $extension;
$this->extension = strtolower( $extension );
return $this->extension;
} // End of public function setExtension
// ----------------------------------------------------------------------------
/**
* Set the postsOnly flag.
* Coerces type to bool.
*
* @access public
* @param mixed $flag : The flag (any truthy or falsy value).
* @return bool : The flag as a boolean.
* @author Stephan Sokolow
*/
public function setPostsOnly( $flag )
{
$flag = (bool)$flag;
$this->postsOnly = $flag;
return $flag;
} // End of public function setPostsOnly
// ----------------------------------------------------------------------------
/**
* Set the permalinkFormat string.
*
* @access public
* @param string $pattern : A WordPress permalink string.
* @return string: The string.
* @author Stephan Sokolow
*/
public function setPermalinkFormat( $pattern )
{
$this->permalinkFormat = $pattern;
return $pattern;
} // End of public function setPermalinkFormat
// ----------------------------------------------------------------------------
/**
* Callback for WordPress permalink string conversion.
*
* @access protected
* @param array $match : The match array from preg_replace_callback ($match[1] is the keyword).
* @return string : The replacement value, or the original %keyword% if unsupported.
* @author Stephan Sokolow
*/
protected function permalinkConvertCallback( $match )
{
global $temp_post;
$supported = array(
'year' => date( 'Y', $temp_post['timestamp'] ),
'monthnum' => date( 'm', $temp_post['timestamp'] ),
'day' => date( 'd', $temp_post['timestamp'] ),
'hour' => date( 'H', $temp_post['timestamp'] ),
'minute' => date( 'i', $temp_post['timestamp'] ),
'second' => date( 's', $temp_post['timestamp'] ),
'postname' => $temp_post['slug'],
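// Posts without a category fall back to an empty string instead of raising a notice
'category' => isset( $temp_post['category_slugs'][0] ) ? $temp_post['category_slugs'][0] : '',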
'post_id' => $temp_post['id'],
// TODO: Still to sub: tag, author
);
if (array_key_exists($match[1], $supported)) {
return $supported[$match[1]];
} else {
return $match[0];
}
} // End of protected function permalinkConvertCallback
// ----------------------------------------------------------------------------
/**
* Default post file template.
*
* @access private
* @param array $post : The post array.
* @return string : The formatted post file content.
* @author Aziz Light
*/
private function tpl_default( $post )
{
//$type = $post['type'];
$content = $post['content'];
// DELETE IF YOU NEED NON-GPL2 (eg. MIT or GPL3) LICENSING
$content = convert_chars($content);
$content = wpautop($content);
// END GPL2-dependent code
$front_matter = array(
'layout' => 'post',
'title' => $post['title'],
'author' => $post['author'],
'published' => $post['status'] == 'publish',
'date' => date( DATE_ISO8601, $post['timestamp'] ),
'categories' => $post['category_slugs'],
);
if ($this->permalinkFormat && $front_matter['published']) {
global $temp_post;
$temp_post = $post;
$permalink = preg_replace_callback(
'/%(year|monthnum|day|postname|category|post_id|tag|author|hour|minute|second)%/',
array($this, 'permalinkConvertCallback'),
$this->permalinkFormat);
if (substr($permalink, -1, 1) == '/') {
$permalink = $permalink . 'index.html';
}
$front_matter['permalink'] = $permalink;
}
// Note: Spyc wordwrap must be disabled (the 0) for permalinks to work
$front_matter = Spyc::YAMLDump($front_matter, false, 0);
return <<<TPL
$front_matter
---
$content
TPL;
} // End of private function tpl_default
// ----------------------------------------------------------------------------
/**
* Default filename template.
*
* @access private
* @param array $post : The post array.
* @return string : The formatted filename.
* @author Aziz Light
*/
private function ftpl_default( $post )
{
$filename = date( 'Y-m-d-', $post['timestamp'] );
$filename .= ( ! empty( $post['slug'] ) ) ? $post['slug'] : $this->slugify( $post['title'] );
$filename .= $this->extension;
return $filename;
} // End of private function ftpl_default
} // End of class WordpressExporter
// Support for simply calling this from the command-line as a converter script
// (Note: Will fail if someone symlinked php-cgi rather than building php-cli)
if(php_sapi_name() == "cli" && !debug_backtrace()) {
// -o, -P and -t each take a value (trailing colon); -h and -p are plain flags
$options = "ho:pP:t:";
// Extract (don't just copy) options from $argv
$prog = $argv[0];
$opts = getopt( $options );
$args = $argv;
foreach( $opts as $o => $a )
{
while( $k = array_search( "-" . $o, $args ) )
{
if( $k )
unset( $args[$k] );
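// Case-sensitive match: with the /i flag, "-p" also matched "P:" and swallowed the next argument
if( preg_match( "/" . preg_quote( $o, "/" ) . ":/", $options ) )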
unset( $args[$k+1] );
}
}
unset( $args[0] );
$args = array_merge( $args );
// Display help if input is invalid
if (isset($opts['h']) || !$args) {
echo "Usage: $prog [options] <WordPress XML File> ...\n";
echo "\n";
echo " -h\tDisplay this help\n";
echo " -o\tSet the output path (Defaults to ./_posts)\n";
echo " -p\tOnly export posts (not pages)\n";
echo " -P\tSpecify a permalink string (any mixture of WordPress and Jekyll keywords)\n";
echo " \tto be converted to Jekyll-format and hard-coded into each published post\n";
echo " \tto prevent broken links.\n";
echo " -t\tSet a different timezone for the exported entries\n";
exit();
}
// Set the output path
if (isset($opts['o'])) {
$outpath = $opts['o'];
} else {
$outpath = '_posts';
}
// Set the timezone
if (isset($opts['t'])) {
date_default_timezone_set($opts['t']);
}
// Create the target directory if it doesn't exist
if (!is_dir($outpath) && !mkdir($outpath, 0777, true)) {
die("Unable to create target directory");
}
// Support multiple input files (posts from later files overwrite earlier ones with the same filename)
foreach ( $args as $infile ) {
$xml = new WordpressExporter( $infile, $outpath );
if (isset($opts['p'])) {
$xml->postsOnly = true;
}
if (isset($opts['P'])) {
$xml->permalinkFormat = $opts['P'];
}
$xml->export();
}
}
<?php
/**
* WARNING: THIS CODE IS LICENSED "GPL 2.0" WITHOUT THE "or later" CLAUSE.
*
* IF YOU NEED TO LICENSE YOUR CODE UNDER MIT/BSD, LGPL, or GPL 3.0 LICENSES,
* STOP READING NOW!
*
* Source: http://trac.wordpress.org/browser/trunk/wp-includes/formatting.php
**/
/**
* Accepts matches array from preg_replace_callback in wpautop() or a string.
*
* Ensures that the contents of a <<pre>>...<</pre>> HTML block are not
* converted into paragraphs or line-breaks.
*
* @since 1.2.0
*
* @param array|string $matches The array or string
* @return string The pre block without paragraph/line-break conversion.
*/
function clean_pre($matches) {
if ( is_array($matches) )
$text = $matches[1] . $matches[2] . "</pre>";
else
$text = $matches;
$text = str_replace('<br />', '', $text);
$text = str_replace('<p>', "\n", $text);
$text = str_replace('</p>', '', $text);
return $text;
}
/**
* Replaces double line-breaks with paragraph elements.
*
* A group of regex replaces used to identify text formatted with newlines and
* replace double line-breaks with HTML paragraph tags. The remaining
* line-breaks after conversion become <<br />> tags, unless $br is set to '0'
* or 'false'.
*
* @since 0.71
*
* @param string $pee The text which has to be formatted.
* @param int|bool $br Optional. If set, this will convert all remaining line-breaks after paragraphing. Default true.
* @return string Text which has been converted into correct paragraph tags.
*/
function wpautop($pee, $br = 1) {
if ( trim($pee) === '' )
return '';
$pee = $pee . "\n"; // just to make things a little easier, pad the end
$pee = preg_replace('|<br />\s*<br />|', "\n\n", $pee);
// Space things out a little
$allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|option|form|map|area|blockquote|address|math|style|input|p|h[1-6]|hr|fieldset|legend|section|article|aside|hgroup|header|footer|nav|figure|figcaption|details|menu|summary)';
$pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
$pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);
$pee = str_replace(array("\r\n", "\r"), "\n", $pee); // cross-platform newlines
if ( strpos($pee, '<object') !== false ) {
$pee = preg_replace('|\s*<param([^>]*)>\s*|', "<param$1>", $pee); // no pee inside object/embed
$pee = preg_replace('|\s*</embed>\s*|', '</embed>', $pee);
}
$pee = preg_replace("/\n\n+/", "\n\n", $pee); // take care of duplicates
// make paragraphs, including one at the end
$pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY);
$pee = '';
foreach ( $pees as $tinkle )
$pee .= '<p>' . trim($tinkle, "\n") . "</p>\n";
$pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace
$pee = preg_replace('!<p>([^<]+)</(div|address|form)>!', "<p>$1</p></$2>", $pee);
$pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee); // don't pee all over a tag
$pee = preg_replace("|<p>(<li.+?)</p>|", "$1", $pee); // problem with nested lists
$pee = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $pee);
$pee = str_replace('</blockquote></p>', '</p></blockquote>', $pee);
$pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)!', "$1", $pee);
$pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee);
if ($br) {
$pee = preg_replace_callback('/<(script|style).*?<\/\\1>/s', function ($matches) { return str_replace("\n", "<WPPreserveNewline />", $matches[0]); }, $pee);
$pee = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee); // optionally make line breaks
$pee = str_replace('<WPPreserveNewline />', "\n", $pee);
}
$pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*<br />!', "$1", $pee);
$pee = preg_replace('!<br />(\s*</?(?:p|li|div|dl|dd|dt|th|pre|td|ul|ol)[^>]*>)!', '$1', $pee);
if (strpos($pee, '<pre') !== false)
$pee = preg_replace_callback('!(<pre[^>]*>)(.*?)</pre>!is', 'clean_pre', $pee );
$pee = preg_replace( "|\n</p>$|", '</p>', $pee );
return $pee;
}
/**
* Converts a number of characters from a string.
*
* Metadata tags <<title>> and <<category>> are removed, <<br>> and <<hr>> are
* converted into correct XHTML and Unicode characters are converted to the
* valid range.
*
* @since 0.71
*
* @param string $content String of characters to be converted.
* @param string $deprecated Not used.
* @return string Converted string.
*/
function convert_chars($content, $deprecated = '') {
if ( !empty( $deprecated ) )
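// _deprecated_argument() is a WordPress core function that is not bundled with this file
trigger_error( __FUNCTION__ . '(): the second argument is deprecated since 0.71', E_USER_NOTICE );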
// Translation of invalid Unicode references range to valid range
$wp_htmltranswinuni = array(
'&#128;' => '&#8364;', // the Euro sign
'&#129;' => '',
'&#130;' => '&#8218;', // these are Windows CP1252 specific characters
'&#131;' => '&#402;', // they would look weird on non-Windows browsers
'&#132;' => '&#8222;',
'&#133;' => '&#8230;',
'&#134;' => '&#8224;',
'&#135;' => '&#8225;',
'&#136;' => '&#710;',
'&#137;' => '&#8240;',
'&#138;' => '&#352;',
'&#139;' => '&#8249;',
'&#140;' => '&#338;',
'&#141;' => '',
'&#142;' => '&#382;',
'&#143;' => '',
'&#144;' => '',
'&#145;' => '&#8216;',
'&#146;' => '&#8217;',
'&#147;' => '&#8220;',
'&#148;' => '&#8221;',
'&#149;' => '&#8226;',
'&#150;' => '&#8211;',
'&#151;' => '&#8212;',
'&#152;' => '&#732;',
'&#153;' => '&#8482;',
'&#154;' => '&#353;',
'&#155;' => '&#8250;',
'&#156;' => '&#339;',
'&#157;' => '',
'&#158;' => '',
'&#159;' => '&#376;'
);
// Remove metadata tags
$content = preg_replace('/<title>(.+?)<\/title>/','',$content);
$content = preg_replace('/<category>(.+?)<\/category>/','',$content);
// Converts lone & characters into &#38; (a.k.a. &amp;)
$content = preg_replace('/&([^#])(?![a-z1-4]{1,8};)/i', '&#038;$1', $content);
// Fix Word pasting
$content = strtr($content, $wp_htmltranswinuni);
// Just a little XHTML help
$content = str_replace('<br>', '<br />', $content);
$content = str_replace('<hr>', '<hr />', $content);
return $content;
}