@mcaskill
Last active January 4, 2022 10:38
PHP : Strip HTML and PHP tags from a string.

strip_html

(PHP 5 >= 5.4)
strip_html — Strip HTML and PHP tags from a string.

Description

string strip_html( string $str )

This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from a given str. Unlike strip_tags(), this function is mindful of line-breaks and metadata elements (e.g., <style>).

This function is useful for converting an HTML-rich string into a plain-text string for emails.

Parameters

  • str — The input string.

Return Values

Returns the stripped string.
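
To illustrate the difference from strip_tags() described above, here is a minimal sketch. The sample markup is made up for illustration, and only the `<br>` and `<style>` handling is reproduced, not the whole function.

```php
<?php
// Hypothetical sample input: a style block followed by a paragraph with a line break.
$html = "<style>p { color: red; }</style><p>Hello<br>World</p>";

// strip_tags() alone removes the tags but keeps the CSS text and drops the line break.
$plain = strip_tags($html);

// Converting <br> to "\n" and removing the <style> element first
// leaves only the readable text.
$prepared = preg_replace('#<br[^>]*?>#siu', "\n", $html);
$prepared = preg_replace('#<style[^>]*?>.*?</style>#siu', '', $prepared);
$better = strip_tags($prepared);

var_dump($plain);   // string(27) "p { color: red; }HelloWorld"
var_dump($better);  // "Hello" and "World" on separate lines
```

The same idea applies to the other elements the function removes before calling strip_tags(), such as `<head>` and `<script>`.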

Installation

With Composer

$ composer require mcaskill/php-strip-html

Without Composer

Why are you not using Composer? Download Function.Strip-HTML.php from the gist and save it somewhere in your project.
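
Once the file is saved, it has to be included before the function can be called. The following is a sketch only: the location next to the calling script is an assumption, so adjust the path to wherever the file was saved.

```php
<?php
// Assumed location: the downloaded file sits next to this script.
$file = __DIR__ . '/Function.Strip-HTML.php';

// Load the function only if the file is really there.
if (is_file($file)) {
    require_once $file;
}

// The gist wraps its definition in function_exists(), so checking here is cheap.
if (function_exists('strip_html')) {
    echo strip_html('<p>Hello<br>World</p>'), PHP_EOL;
}
```

With the Composer install no include is needed, because the package's "files" autoload entry loads the function along with vendor/autoload.php.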

{
    "name": "mcaskill/php-strip-html",
    "description": "Strip HTML and PHP tags from a string.",
    "license": "MIT",
    "authors": [
        {
            "name": "Chauncey McAskill",
            "email": "chauncey@mcaskill.ca",
            "homepage": "https://github.com/mcaskill"
        }
    ],
    "keywords": [
        "function"
    ],
    "extra": {
        "branch-alias": {
            "dev-master": "1.x-dev"
        }
    },
    "require": {
        "php": ">=5.4.0"
    },
    "autoload": {
        "files": ["Function.Strip-HTML.php"]
    }
}
<?php

if (!function_exists('strip_html')) {
    /**
     * Strip HTML and PHP tags from a string.
     *
     * @param  string $str The input string.
     * @return string Returns the stripped string.
     */
    function strip_html($str)
    {
        $str = html_entity_decode($str);

        // Strip HTML: turn <br> into line breaks, then drop elements
        // whose content is not readable text before stripping the tags.
        $str = preg_replace('#<br[^>]*?>#siu', "\n", $str);
        $str = preg_replace(
            [
                '#<head[^>]*?>.*?</head>#siu',
                '#<style[^>]*?>.*?</style>#siu',
                '#<script[^>]*?>.*?</script>#siu',
                '#<object[^>]*?>.*?</object>#siu',
                '#<embed[^>]*?>.*?</embed>#siu',
                '#<applet[^>]*?>.*?</applet>#siu',
                '#<noframes[^>]*?>.*?</noframes>#siu',
                '#<noscript[^>]*?>.*?</noscript>#siu',
                '#<noembed[^>]*?>.*?</noembed>#siu'
            ],
            '',
            $str
        );
        $str = strip_tags($str);

        // Trim whitespace: drop tabs, normalize CRLF to LF,
        // and collapse three or more line breaks into two.
        $str = str_replace("\t", '', $str);
        $str = preg_replace('#\n\r|\r\n#', "\n", $str);
        $str = preg_replace('#\n{3,}#', "\n\n", $str);
        $str = trim($str);

        return $str;
    }
}
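
One consequence of calling html_entity_decode() before strip_tags() is that markup which was escaped in the input is decoded and then stripped as well. The sketch below shows this on a made-up input. The "strip first" order is shown only for comparison and is not what the gist does.

```php
<?php
// Hypothetical input: markup that was escaped so it would be displayed literally.
$input = 'Use &lt;b&gt;bold&lt;/b&gt; &amp; italics';

// Order used by strip_html(): decode entities first, then strip tags.
$decodeFirst = strip_tags(html_entity_decode($input));

// Alternative order (an assumption for comparison): strip tags first, then decode.
$stripFirst = html_entity_decode(strip_tags($input));

echo $decodeFirst, PHP_EOL; // Use bold & italics
echo $stripFirst, PHP_EOL;  // Use <b>bold</b> & italics
```

For plain-text emails the gist's order is usually the convenient one, since no angle-bracket markup survives. If escaped markup should stay visible, the second order would be the one to consider.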

phpSoftware commented Jan 4, 2022

Very nice piece of code. For me, this was an improvement! What do you think?

  #$string = str_replace("\t", '', $string);
  #$string = preg_replace('#\n\r|\r\n#', PHP_EOL, $string);
  #$string = preg_replace('#\n{3,}#', PHP_EOL.PHP_EOL, $string);
  $string = preg_replace("#(\s*[\r\n]\s*)+#", PHP_EOL, $string); // collapse each whitespace run that contains a line break into a single line break
  $string = trim($string, ' '.chr(194).chr(160));                // trim spaces and the two bytes of a UTF-8 non-breaking space
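
To compare the two approaches, the sketch below runs both on the same made-up input. It uses "\n" instead of PHP_EOL so the result is the same on every platform. It also shows a caveat: trim() treats its character list as single bytes, so chr(194) and chr(160) can clip part of another UTF-8 character. The helper trim_unicode_space() is hypothetical and is one possible alternative, not part of the gist.

```php
<?php
// Sample input: trailing space and tab, three CRLF line breaks, indentation,
// and a trailing UTF-8 non-breaking space (bytes C2 A0).
$input = "First line \t\r\n\r\n\r\n  Second line\xC2\xA0";

// Gist's normalization: drop tabs, convert CRLF to LF, cap line breaks at two.
$a = str_replace("\t", '', $input);
$a = preg_replace('#\n\r|\r\n#', "\n", $a);
$a = preg_replace('#\n{3,}#', "\n\n", $a);
$a = trim($a);
// $a keeps the blank line, the indentation and the non-breaking space:
// "First line \n\n  Second line\xC2\xA0"

// Variant from the comment: every whitespace run containing a line break
// becomes one line break; then spaces and the bytes C2 / A0 are trimmed.
$b = preg_replace("#(\s*[\r\n]\s*)+#", "\n", $input);
$b = trim($b, ' ' . chr(194) . chr(160));
// $b is "First line\nSecond line"

// Caveat: "à" is encoded as C3 A0, so the byte-wise trim removes its second
// byte and leaves an invalid UTF-8 sequence behind.
$clipped = trim("voil\xC3\xA0", ' ' . chr(194) . chr(160));

// Hypothetical helper: trim whitespace and U+00A0 as whole code points.
function trim_unicode_space($s)
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s);
}

$safe = trim_unicode_space("  voil\xC3\xA0\xC2\xA0");
// $safe is "voilà" with its last character intact
```

Whether the single-regex variant is an improvement depends on the goal: it produces more compact text, but it also removes the paragraph breaks that the original code deliberately keeps as one blank line.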
