Skip to content

Instantly share code, notes, and snippets.

View fonini's full-sized avatar
🏠

Jonnas Fonini fonini

🏠
View GitHub Profile
@dave1010
dave1010 / strip_word_html.php
Created November 12, 2010 13:14
Strip MS Word HTML. From php.net
<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
mb_regex_encoding('UTF-8');
//replace MS special characters first
$search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
$replace = array('\'', '\'', '"', '"', '-');
$text = preg_replace($search, $replace, $text);
//make sure _all_ html entities are converted to the plain ascii equivalents - it appears
//in some MS headers, some html entities are encoded and some aren't