@Jamesking56
Last active April 19, 2021 12:05
Read Wordpress Export XML to PHP
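<?php
/**
 * WordPress class - reads a WordPress export XML file and extracts its posts.
 */
class Wordpress
{
    /** @var string Path (or URL) of the WordPress export XML file. */
    private $wpXML;

    public function __construct($wpXML)
    {
        // An instance property: the original declared it static but used it via $this.
        $this->wpXML = $wpXML;
    }

    /**
     * Returns one array per <item> with title, content, pubDate, categories and slug.
     */
    public function getPosts()
    {
        // simplexml_load_file() expects a file path or URL, not the XML text itself.
        $xml = simplexml_load_file($this->wpXML);
        if ($xml === false) {
            throw new RuntimeException("Could not load XML from " . $this->wpXML);
        }
        $posts = array();
        foreach ($xml->channel->item as $item) {
            $categories = array();
            foreach ($item->category as $category) {
                // Keep real categories only: skip tags and the default "uncategorized".
                if ($category['nicename'] != "uncategorized" && $category['domain'] == "category") {
                    $categories[] = $category['nicename'];
                }
            }
            // <content:encoded> lives in the RSS content module namespace.
            $content = $item->children('http://purl.org/rss/1.0/modules/content/');
            $posts[] = array(
                "title"      => $item->title,
                "content"    => $content->encoded,
                "pubDate"    => $item->pubDate,
                "categories" => implode(",", $categories),
                // Strip the blog's base URL and slashes from the GUID; change the URL for your own site.
                "slug"       => str_replace("/", "", str_replace("http://blog.jamesking56.co.uk/", "", $item->guid))
            );
        }
        return $posts;
    }
}
?>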
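
The `children()` call with a namespace URI is the step that tends to confuse people, so here is a minimal sketch of why it is needed. The one-item feed below is made up for illustration and is not taken from a real export; only the `content` namespace URI comes from the gist.

```php
<?php
// Hypothetical one-item feed, invented for this example.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello world</title>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($sample);
$item = $xml->channel->item[0];

// Elements without a namespace prefix are reachable as plain properties.
$title = (string) $item->title;

// Plain property access does not see namespaced children, so this is empty.
$missing = (string) $item->encoded;

// Namespaced children have to be requested by their namespace URI.
$content = $item->children('http://purl.org/rss/1.0/modules/content/');
$body    = (string) $content->encoded;

echo $title, "\n", $body, "\n";
```

Casting to string, as done here, also gives you plain PHP strings instead of SimpleXMLElement objects.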
@dcblogdev

Thanks for this, it's been extremely useful. I had headers being executed from my XML file, which must have come from the post content; wrapping the content in htmlentities solved that.

htmlentities($content->encoded) 

I also added an excerpt:

$excerpt = $item->children('http://wordpress.org/export/1.2/excerpt/');
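
A minimal sketch of reading the excerpt text from that result, in the same way the gist reads `content:encoded`. The sample item is made up, and the namespace URI assumes a 1.2 export; it has to match whatever URI your own export file declares for the `excerpt` prefix. Looking the children up by prefix, as in the second call, avoids hard-coding the version.

```php
<?php
// Hypothetical one-item feed, invented for this example.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/">
  <channel>
    <item>
      <title>Hello world</title>
      <excerpt:encoded><![CDATA[A short summary.]]></excerpt:encoded>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($sample);
$item = $xml->channel->item[0];

// Lookup by namespace URI (must match the URI declared in the file).
$excerpt     = $item->children('http://wordpress.org/export/1.2/excerpt/');
$excerptText = (string) $excerpt->encoded;

// Lookup by prefix: the second argument tells SimpleXML that 'excerpt' is a prefix.
$byPrefix     = $item->children('excerpt', true);
$excerptText2 = (string) $byPrefix->encoded;

echo $excerptText, "\n";
```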

@luizventurote

Thank you!

@tricoos

tricoos commented Aug 15, 2018

Thank you! I've optimized it a bit and added tags; both tags and categories now return the display value as well as the nicename. I also removed all SimpleXMLElement objects by type-casting to string:

    foreach ($xml->channel->item as $item) {
        $categories = array();
        $tags       = [];
        foreach ($item->category as $category) {
            if ($category['nicename'] != "uncategorized" && $category['domain'] == "category") {
                $categories[(string)$category['nicename']] = (string)$category;
            } elseif ($category['domain'] == 'post_tag') {
                $tags[(string)$category['nicename']] = (string)$category;
            }
        }

        $content = $item->children('http://purl.org/rss/1.0/modules/content/');

        $posts[] = array(
            "title"      => (string)$item->title,
            "content"    => (string)$content->encoded,
            "pubDate"    => new DateTime((string)$item->pubDate),
            "categories" => $categories,
            "tags"       => $tags,
            "slug"       => (string)$item->guid
        );
    }

@Jamesking56
Author

@tricoos Wow, I wrote this in 2013. I am surprised it still works!

@cristianfx

Thank you for this. Yes, it's still working in 2019. Do you know how to parse the metadata from the XML as well? Thanks!

@Jamesking56
Author

@cristianfx I am amazed this still works in 2019. Metadata could probably be parsed in a similar way.
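
As a rough sketch of what "in a similar way" could look like: WordPress exports store post metadata as `wp:postmeta` elements, each holding a `wp:meta_key` and a `wp:meta_value`. The sample item and its meta keys below are made up for illustration, and the `$meta` array shape is just one possible choice. The lookup uses the `wp` prefix rather than the namespace URI so it does not depend on the export version.

```php
<?php
// Hypothetical one-item feed with two metadata entries, invented for this example.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello world</title>
      <wp:postmeta>
        <wp:meta_key>_edit_last</wp:meta_key>
        <wp:meta_value><![CDATA[1]]></wp:meta_value>
      </wp:postmeta>
      <wp:postmeta>
        <wp:meta_key>subtitle</wp:meta_key>
        <wp:meta_value><![CDATA[An example]]></wp:meta_value>
      </wp:postmeta>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($sample);
$item = $xml->channel->item[0];

// All children of the item that carry the "wp" prefix.
$wp = $item->children('wp', true);

$meta = array();
foreach ($wp->postmeta as $postmeta) {
    // Ask for the wp-prefixed children of each <wp:postmeta> explicitly.
    $fields = $postmeta->children('wp', true);
    $meta[(string) $fields->meta_key] = (string) $fields->meta_value;
}

print_r($meta);
```

Inside `getPosts()` the resulting array could be added to each post as, say, a `"meta"` key. Note that a key which appears more than once on a post would overwrite the earlier value in this sketch.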

@polosson

Really cool function, thanks a lot!

@stuartcusackie

Anybody else getting these errors?

simplexml_load_file(): I/O warning : failed to load external entity "<?xml version="1.0" encoding="UTF-8" ?>

@Jamesking56
Author

> Anybody else getting these errors?
>
> simplexml_load_file(): I/O warning : failed to load external entity "<?xml version="1.0" encoding="UTF-8" ?>

See this StackOverflow answer: https://stackoverflow.com/a/21661663/941446
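
One common cause of this warning, judging by the message quoting the XML declaration as the "external entity", is passing the XML text itself to `simplexml_load_file()`, which expects a path or URL. Assuming that is what happened here, a small wrapper can pick the right loader. `loadWordpressXml` is a hypothetical helper, not part of the gist.

```php
<?php
// Hypothetical helper: accepts either a file path/URL or the raw XML text.
function loadWordpressXml($source)
{
    $trimmed = ltrim($source);

    // Raw XML starts with "<" once leading whitespace is removed; paths and URLs do not.
    if (strpos($trimmed, '<') === 0) {
        $xml = simplexml_load_string($trimmed);
    } else {
        $xml = simplexml_load_file($source);
    }

    if ($xml === false) {
        throw new RuntimeException('Could not parse the WordPress export XML.');
    }

    return $xml;
}

// Usage with raw XML text (made-up sample) instead of a file name.
$raw = '<?xml version="1.0" encoding="UTF-8" ?>'
     . '<rss><channel><item><title>Hello</title></item></channel></rss>';
$xml = loadWordpressXml($raw);
$firstTitle = (string) $xml->channel->item[0]->title;

echo $firstTitle, "\n";
```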