Last active
March 6, 2024 18:28
A useful function for splitting iCalendar content into lines of at most 75 octets, taking multibyte characters into account. See: http://www.ietf.org/rfc/rfc2445.txt, section 4.1 (obsoleted by RFC 5545, section 3.1).
<?php
mb_internal_encoding("UTF-8");
$desc = <<<TEXT
<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör på den) excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique.</p>
That is, a long line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
TEXT;
/**
 * Apply folding compliant with RFC 5545
 * See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
 *
 * @param string $preamble The property name without the trailing colon, e.g. DESCRIPTION
 * @param string $value The value for the property, e.g. a very long string
 * @param bool $strip_tags Strip HTML tags from the value
 *
 * @return string Returns the folded string without the property name
 */
function ical_split($preamble, $value, $strip_tags = true)
{
    // Collapse line breaks and runs of whitespace into single spaces.
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);
    if ($strip_tags) {
        $value = strip_tags($value);
    }
    // Fold the whole content line, so the property name counts towards the first line.
    $value = $preamble . ':' . $value;
    $offset = 0;
    $length = strlen($value);
    // 74 octets of content per line leaves room for the leading tab on continuation lines.
    $chunkSize = 74;
    $lines = [];
    while ($offset < $length) {
        // mb_strcut() measures in bytes but never cuts inside a multibyte character.
        $line = mb_strcut($value, $offset, $chunkSize);
        if ($line === '') {
            break; // guard against an endless loop on malformed input
        }
        $lines[] = $line;
        // Advance by the bytes actually consumed so that no octet is skipped.
        $offset += strlen($line);
    }
    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}
$split = ical_split('DESCRIPTION', $desc);
print 'DESCRIPTION:' . $split;
// Test results
$lines = preg_split('/\r\n/', 'DESCRIPTION:' . $split);
print "\n\nTests\n";
foreach ($lines as $i => $line) {
    print "Line {$i}: " . strlen($line) . " octets\n";
}
print "\nAlt desc output:\n";
$split = ical_split('X-ALT-DESC', $desc, false);
print 'X-ALT-DESC:' . $split;
print "\n\n";

DESCRIPTION:Lines of text SHOULD NOT be longer than 75 octets, (och hör p
	å den) excluding the line break. Long content lines SHOULD be split into 
	a multiple line representations using a line "folding" technique. That is,
	 a long line can be split between any two characters by inserting a CRLF i
	mmediately followed by a single linear white space character (i.e., SPACE,
	 US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence of CRLF fo
	llowed immediately by a single linear white space character is ignored (i.
	e., removed) when processing the content type.

Tests
Line 0: 74 octets
Line 1: 75 octets
Line 2: 75 octets
Line 3: 75 octets
Line 4: 75 octets
Line 5: 75 octets
Line 6: 75 octets
Line 7: 47 octets

Alt desc output:
X-ALT-DESC:<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör
	 på den) excluding the line break. Long content lines SHOULD be split int
	o a multiple line representations using a line "folding" technique.</p> Th
	at is, a long line can be split between any two characters by inserting a 
	CRLF immediately followed by a single linear white space character (i.e., 
	SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequen
	ce of CRLF followed immediately by a single linear white space character i
	s ignored (i.e., removed) when processing the content type.
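
The sample text also describes the reverse operation: a CRLF immediately followed by a single space or tab is removed when the content is read back. A minimal sketch of that unfolding step follows; `ical_unfold` is a hypothetical helper name and is not part of the gist.

```php
<?php
// Hypothetical helper (not part of the original gist): undo RFC 5545 folding
// by removing every CRLF that is immediately followed by one space or tab.
function ical_unfold(string $content): string
{
    // Only the CRLF and the single whitespace octet after it are dropped;
    // any further whitespace belongs to the value and is kept.
    return preg_replace('/\r\n[ \t]/', '', $content);
}

$folded = "DESCRIPTION:och hör p\r\n\tå den\r\n  two";
print ical_unfold($folded) . "\n";
```

Because the pattern only contains ASCII bytes, it cannot match inside a UTF-8 multibyte sequence, so the check works on raw octets without the `/u` modifier.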
Huh, 14 years... time flies :)
Your implementation looks nice and elegant @viavario. Stripping out tags should probably have been separate from the folding, but I added an optional param to your implementation that can be used to disable tag stripping, preserving the old behaviour.
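
One way to keep tag stripping separate from folding is a fold-only helper that the caller combines with `strip_tags()` where needed. This is only a rough sketch, assuming UTF-8 input and the mbstring extension; `ical_fold` and its `$limit` parameter are hypothetical and not part of the gist.

```php
<?php
// Hypothetical fold-only helper: takes a complete content line
// ("NAME:value") and folds it, leaving any clean-up to the caller.
function ical_fold(string $line, int $limit = 75): string
{
    $lines = [];
    $offset = 0;
    $length = strlen($line);
    $max = $limit; // the first physical line may use the full limit
    while ($offset < $length) {
        // Byte-based cut that never splits a multibyte character.
        $chunk = mb_strcut($line, $offset, $max, 'UTF-8');
        if ($chunk === '') {
            break; // malformed input: avoid an endless loop
        }
        $lines[] = $chunk;
        $offset += strlen($chunk);
        $max = $limit - 1; // continuation lines begin with a one-octet space
    }
    return implode("\r\n ", $lines);
}

$html = '<p>' . str_repeat('hör på den ', 20) . '</p>';
// Plain-text property: the caller strips tags before folding.
$plain = ical_fold('DESCRIPTION:' . trim(strip_tags($html)));
// HTML property: the same helper, without any stripping.
$rich = ical_fold('X-ALT-DESC:' . $html);

foreach (explode("\r\n", $plain) as $i => $physical) {
    print "Line {$i}: " . strlen($physical) . " octets\n";
}
```

With this split the folding code no longer needs a flag for tag handling, and the same helper can serve any property.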