@goldzulu
Created May 4, 2020 11:14
Convert HTML to Text (JavaScript)
// CONVERT HTML TO TEXT
var returnText = htmlSource;
//-- replace BR tags (<br>, <br/>, <br />) with a line break
returnText=returnText.replace(/<br\s*\/?>/gi, "\n");
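//-- remove P tags and rewrite A tags as "text (url)", preserving what's inside of them
//-- ([^>]* keeps each match inside a single tag; the greedy .* could swallow the rest of the line)
returnText=returnText.replace(/<p\b[^>]*>/gi, "\n");
returnText=returnText.replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, " $2 ($1)");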
//-- remove SCRIPT and STYLE tags together with everything inside them
//-- (lazy match, so text between two separate blocks is kept)
returnText=returnText.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
returnText=returnText.replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "");
//-- remove all remaining tags
returnText=returnText.replace(/<[^>]*>/g, "");
//-- collapse runs of two or more line breaks into a single blank line:
returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/g, "\n\n");
//-- collapse runs of spaces into a single space:
returnText=returnText.replace(/ {2,}/g, " ");
//-- decode common HTML entities (&amp; goes last so "&amp;lt;" is not decoded twice):
returnText=returnText.replace(/&nbsp;/gi," ");
returnText=returnText.replace(/&quot;/gi,'"');
returnText=returnText.replace(/&lt;/gi,'<');
returnText=returnText.replace(/&gt;/gi,'>');
returnText=returnText.replace(/&amp;/gi,"&");
myVar = returnText;
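
The steps above can also be wrapped in a reusable function. The following is only a minimal sketch: the name `htmlToText` is hypothetical and not part of the original snippet, and it assumes the input is a string with double-quoted `href` attributes. It removes SCRIPT and STYLE blocks first and decodes `&amp;` last. Regex-based stripping is a best-effort approach and is not a substitute for a real HTML parser on untrusted input.

```javascript
// Hypothetical helper (illustrative name, not from the original gist).
function htmlToText(htmlSource) {
  var text = String(htmlSource);

  // Drop SCRIPT and STYLE blocks first so their contents never reach the output.
  text = text.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");

  // BR and P tags become line breaks.
  text = text.replace(/<br\s*\/?>/gi, "\n");
  text = text.replace(/<p\b[^>]*>/gi, "\n");

  // Links become "label (url)". Assumes double-quoted href values.
  text = text.replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, " $2 ($1)");

  // Strip every remaining tag.
  text = text.replace(/<[^>]*>/g, "");

  // Collapse repeated line breaks and repeated spaces.
  text = text.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/g, "\n\n");
  text = text.replace(/ {2,}/g, " ");

  // Decode a few common entities; &amp; goes last to avoid double decoding.
  text = text
    .replace(/&nbsp;/g, " ")
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");

  return text.trim();
}

// Usage example
console.log(htmlToText('<p>Hello<br>world</p>'));
console.log(htmlToText('<a href="https://example.com">Example</a>'));
```

The first call prints `Hello` and `world` on separate lines; the second prints `Example (https://example.com)`.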