@hugowetterberg
Last active March 6, 2024 18:28
A useful function for splitting ical content into 75-octet lines, taking multibyte characters into account. See: http://www.ietf.org/rfc/rfc2445.txt, section 4.1
<?php
mb_internal_encoding("UTF-8");
$desc = <<<TEXT
<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör på den) excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique.</p>
That is, a long line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
TEXT;
/**
* Apply folding compliant with RFC 5545
* See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
*
* @param string $preamble The property name, e.g. DESCRIPTION
* @param string $value The value for the property, e.g. a very long string
* @param bool $strip_tags Strip HTML tags from the value
*
* @return string Returns the folded string without the property name
*/
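function ical_split($preamble, $value, $strip_tags=true)
{
    // Collapse line breaks and runs of whitespace into single spaces.
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);
    if ($strip_tags) {
        $value = strip_tags($value);
    }

    // Fold the whole content line, property name included.
    $value = $preamble . ':' . $value;
    $offset = 0;
    $chunkSize = 75;
    $lines = [];
    // mb_strcut() cuts by octets without splitting a multibyte character.
    // Note: each pass cuts $chunkSize - 1 octets but advances by $chunkSize,
    // so one octet between consecutive chunks is skipped.
    while ($line = mb_strcut($value, $offset, $chunkSize - 1)) {
        $lines[] = $line;
        $offset += $chunkSize;
    }

    // Join with CRLF + HTAB, then drop the leading "$preamble:" again.
    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}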
$split = ical_split('DESCRIPTION:', $desc);
print 'DESCRIPTION:' . $split;
// Test results
$lines = preg_split('/\r\n/', 'DESCRIPTION:' . $split);
print "\n\nTests\n";
foreach ($lines as $i => $line) {
    print "Line {$i}: " . strlen($line) . " octets\n";
}
print "\nAlt desc output:\n";
$split = ical_split('X-ALT-DESC:', $desc, false);
print 'X-ALT-DESC:' . $split;
print "\n\n";
DESCRIPTION:Lines of text SHOULD NOT be longer than 75 octets, (och hör
å den) excluding the line break. Long content lines SHOULD be split into
multiple line representations using a line "folding" technique. That is,
long line can be split between any two characters by inserting a CRLF imm
diately followed by a single linear white space character (i.e., SPACE, US
ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence of CRLF follow
d immediately by a single linear white space character is ignored (i.e., r
moved) when processing the content type.
Tests
Line 0: 73 octets
Line 1: 75 octets
Line 2: 75 octets
Line 3: 75 octets
Line 4: 75 octets
Line 5: 75 octets
Line 6: 75 octets
Line 7: 41 octets
Alt desc output:
X-ALT-DESC:<p>Lines of text SHOULD NOT be longer than 75 octets, (och hö
på den) excluding the line break. Long content lines SHOULD be split int
a multiple line representations using a line "folding" technique.</p> Tha
is, a long line can be split between any two characters by inserting a CR
F immediately followed by a single linear white space character (i.e., SPA
E, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence o
CRLF followed immediately by a single linear white space character is ign
red (i.e., removed) when processing the content type.
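
The sample output above loses one character at every fold: for example, "hör" is followed directly by "å den", so the "p" of "på" is gone. This matches the loop cutting `$chunkSize - 1` octets while advancing `$offset` by `$chunkSize`. Below is a minimal sketch of a variant that advances by the number of octets actually emitted. The name `ical_fold` and its `$limit` parameter are hypothetical, and unlike `ical_split` it returns the whole content line, property name included.

```php
<?php
mb_internal_encoding('UTF-8');

/**
 * Hypothetical variant of ical_split(): fold a content line at $limit octets
 * without dropping octets or splitting a UTF-8 character.
 */
function ical_fold(string $name, string $value, int $limit = 75): string
{
    $line = $name . ':' . $value;
    $total = strlen($line);
    $offset = 0;
    $lines = [];
    while ($offset < $total) {
        // Continuation lines start with one whitespace octet, so they hold one less.
        $room = $offset === 0 ? $limit : $limit - 1;
        $chunk = mb_strcut($line, $offset, $room, 'UTF-8');
        if ($chunk === '') {
            // Defensive fallback so the loop always makes progress.
            $chunk = substr($line, $offset, $room);
        }
        $lines[] = $chunk;
        // Advance by what was really emitted, not by a fixed step.
        $offset += strlen($chunk);
    }
    return implode("\r\n\t", $lines);
}
```

Removing every `"\r\n\t"` from the result should give back the unfolded `NAME:value` line, which is a quick way to check that nothing was lost.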
@keizie

keizie commented Jun 3, 2011

mb_substr() counts each multibyte character as one character, so it malfunctions on a string full of multibyte characters. mb_strcut() works well.

@hugowetterberg
Author

@keizie That's what is taken into account at line #28: if the octet count (strlen) is bigger than the available space, $mbcc (multibyte character count) is decreased by the overflow and the mb_substr is attempted again. No line that has an octet count larger than 75 should ever get appended.

@sorenlouv

Cool gist. However I think you need to escape commas:

$value = str_replace(',', '\,', $value);
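
RFC 5545 (section 3.3.11) asks for more than commas to be escaped in TEXT values: backslashes, semicolons and line breaks are covered too. A minimal sketch of such a helper follows. The name `ical_escape_text` is hypothetical, and it assumes the value belongs to a TEXT-typed property such as DESCRIPTION or SUMMARY.

```php
<?php
/**
 * Hypothetical helper: escape a TEXT value as described in RFC 5545, section 3.3.11.
 */
function ical_escape_text(string $value): string
{
    // Backslash first, so the backslashes added below are not escaped again.
    $value = str_replace('\\', '\\\\', $value);
    // Semicolons and commas each get a leading backslash.
    $value = str_replace([';', ','], ['\\;', '\\,'], $value);
    // A line break inside the value is written as the two characters "\n".
    return str_replace(["\r\n", "\r", "\n"], '\\n', $value);
}
```

Escaping would need to happen before folding, since it changes the octet count of the value. Note that the gist replaces line breaks with spaces first, so the last step only matters if that normalisation is skipped.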

@ADoebeling

Thank you very much for sharing that function with us. I've embedded it in a new CMS @contao extension.
(Even though I currently had to disable it because of validation problems.)

@shoulders

To be RFC compliant, the line break must be a CRLF:

Lines of text SHOULD NOT be longer than 75 octets, excluding the line
break. Long content lines SHOULD be split into a multiple line
representations using a line "folding" technique. That is, a long
line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.

taken from Internet Calendaring and Scheduling Core Object Specification

so change

return join($lines, "\n\t"); to return join($lines, "\r\n\t");

I have also used this as part of my .ics creation routine in my software QWcrm. I have tried to make my output all RFC compliant. Outputting a calendar event from Microsoft Outlook as an .ics helps.

Thanks for this script.

@djkgamc

djkgamc commented Dec 29, 2018

Awesome! Thanks for this script. @keizie is right: you have to use mb_strcut, or else the code will loop and crash if you have a single very long multibyte string.

@Giulo77

Giulo77 commented Oct 22, 2019

Great job!
Honestly, I never had any problems creating ics files with more than 75 columns, but I also wanted to eliminate the warnings from the validation.
My problem is that I use the "X-ALT-DESC" property for an HTML-formatted version of the description, and this function strips out all the HTML tags. Can you help me?

@viavario

viavario commented Oct 6, 2022

@keizie That's what is taken into account at line #28: if the octet count (strlen) is bigger than the available space, $mbcc (multibyte character count) is decreased by the overflow and the mb_substr is attempted again. No line that has an octet count larger than 75 should ever get appended.

As far as I understand the RFC, lines should be folded at a length of 75 octets, including the property name.
Depending on how the ICS data is generated, your function might end up in an endless loop, particularly in the while loop on line 28 when the preamble or property name is long.

As @keizie and @djkgamc commented, mb_strcut does exactly what we need:

If the cut position happens to be between two bytes of a multi-byte character, the cut is performed starting from the first byte of that character.
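
To make the difference concrete, here is a small comparison using a fragment of the gist's own test text. It is only an illustrative sketch. The variable names are made up for the example, and the encoding is passed explicitly rather than relying on `mb_internal_encoding()`.

```php
<?php
// "ö" and "å" each take two octets in UTF-8.
$s = 'hör på den';

// mb_substr() counts characters: 3 characters, which is 4 octets here.
$byChars = mb_substr($s, 0, 3, 'UTF-8');

// mb_strcut() counts octets: at most 3 octets, ending on a character boundary.
$byOctets = mb_strcut($s, 0, 3, 'UTF-8');

// A limit of 2 octets would split "ö", so the cut falls back to before it.
$midChar = mb_strcut($s, 0, 2, 'UTF-8');

printf("mb_substr: %s (%d octets)\n", $byChars, strlen($byChars));
printf("mb_strcut: %s (%d octets)\n", $byOctets, strlen($byOctets));
printf("mb_strcut: %s (%d octets)\n", $midChar, strlen($midChar));
```

Because the 75-octet limit is a byte limit, the octet-based cut is the one that can guarantee a line never exceeds it.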

Although I'm in favor of properly applying the folding technique on multibyte strings, it should be noted that in section 3.1 of RFC 5545 the responsibility of supporting multibyte strings is put on the implementation of the unfolding technique instead of the folding technique:

Note: It is possible for very simple implementations to generate improperly folded lines in the middle of a UTF-8 multi-octet sequence. For this reason, implementations need to unfold lines in such a way to properly restore the original sequence.
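
On the reading side, that note implies unfolding at the octet level before the text is decoded. A minimal sketch, assuming the data uses CRLF line endings as the RFC requires (the name `ical_unfold` is hypothetical):

```php
<?php
/**
 * Hypothetical helper: unfold iCalendar data by removing every CRLF that is
 * immediately followed by a single SPACE or HTAB.
 */
function ical_unfold(string $data): string
{
    // No "u" modifier: the pattern works on raw octets, so a UTF-8 sequence
    // that was split by a careless folder is joined back together.
    return preg_replace('/\r\n[ \t]/', '', $data) ?? $data;
}

// "ö" is the octets C3 B6; here a fold sits between them.
$broken = "DESCRIPTION:h\xC3\r\n \xB6r";
echo ical_unfold($broken), "\n";
```

Only one whitespace octet is removed per fold, so a space that belongs to the text and directly follows the fold marker is preserved.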

Anyway, I believe the function can be improved to handle longer properties, as well as be more compliant with the RFC as @sqren suggested, and handle HTML in the X-ALT-DESC property as @Giulo77 requested:

/**
 * Apply folding compliant with RFC 5545
 * See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
 * 
 * @param   string  $preamble   The property name, e.g. DESCRIPTION
 * @param   string  $value      The value for the property, e.g. a very long string
 *
 * @return  string              Returns the folded string without the property name
 */
function ical_split($preamble, $value)
{
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);

    $value = $preamble . ':' . $value;
    $offset = 0;
    $chunkSize = 75;
    $lines = [];
    while ($line = mb_strcut($value, $offset, $chunkSize - 1)) {
        $lines[] = $line;
        $offset += $chunkSize;
    }

    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}

@hugowetterberg
Author

Huh, 14 years... time flies :)
Your implementation looks nice and elegant @viavario. Stripping out tags should probably have been separate from the folding, but I added an optional param to your implementation that can be used to disable tag stripping, preserving the old behaviour.
