@Tricertops
Created April 10, 2013 16:12
Convert an HTML string to plain text by deleting HTML tags and replacing escape sequences.
- (NSString *)stringByDeletingHTML {
    // Delete HTML tags.
    // http://stackoverflow.com/questions/277055/remove-html-tags-from-an-nsstring-on-the-iphone
    NSRange range;
    NSMutableString *string = [self mutableCopy];
    while ((range = [string rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
        [string deleteCharactersInRange:range];
    }

    // Replace escaped sequences. Parallel arrays are used instead of a dictionary
    // because NSDictionary enumeration order is undefined and &amp; must be replaced last.
    NSArray *toFind = @[@"&quot;", @"&apos;", @"&lt;", @"&gt;", @"&amp;"];
    NSArray *toReplace = @[@"\"", @"'", @"<", @">", @"&"];
    for (NSUInteger i = 0; i < toFind.count; i++) {
        [string replaceOccurrencesOfString:toFind[i] withString:toReplace[i] options:0 range:NSMakeRange(0, string.length)];
    }

    // Replace &#0000; by the corresponding Unicode character.
    // %C takes a unichar, so code points above U+FFFF are truncated to 16 bits.
    while ((range = [string rangeOfString:@"&#[0-9]+;" options:NSRegularExpressionSearch]).location != NSNotFound) {
        // Skip the leading "&#" and the trailing ";" to get the decimal digits.
        NSString *unicodeNumber = [string substringWithRange:NSMakeRange(range.location + 2, range.length - 3)];
        NSString *replacement = [NSString stringWithFormat:@"%C", (unichar)unicodeNumber.intValue];
        [string replaceCharactersInRange:range withString:replacement];
    }
    return string;
}
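
Because the method calls `[self mutableCopy]` and returns an `NSString`, it presumably belongs in a category on `NSString`. The gist does not name one, so the category name `HTMLStripping` below is purely an assumption for illustration. This is a minimal usage sketch, and it only links when it is compiled together with a category implementation that contains the method above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category name; the gist does not specify one.
// The matching @implementation is assumed to contain stringByDeletingHTML from above.
@interface NSString (HTMLStripping)
- (NSString *)stringByDeletingHTML;
@end

int main(void) {
    @autoreleasepool {
        NSString *html = @"<p>Tom &amp; Jerry &#169; 2013</p>";
        // Tags are removed first, then &amp; becomes "&" and &#169; becomes "©".
        NSLog(@"%@", [html stringByDeletingHTML]); // logs: Tom & Jerry © 2013
    }
    return 0;
}
```

Note that the tag pattern `<[^>]+>` also removes any literal text that happens to sit between `<` and `>`, so this approach fits simple markup better than arbitrary HTML.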