A useful function for splitting ical content into 75-octet lines, taking multibyte characters into account. See: http://www.ietf.org/rfc/rfc2445.txt, section 4.1
<?php
mb_internal_encoding("UTF-8");

$desc = <<<TEXT
<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör på den) excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique.</p>
That is, a long line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
TEXT;

/**
 * Split a property value into lines of at most 75 octets.
 * $preamble is the property name plus colon (e.g. 'DESCRIPTION:'). It is only
 * measured here; the caller prints it in front of the returned string.
 */
function ical_split($preamble, $value) {
    // Normalize: strip markup and collapse newlines and runs of whitespace.
    $value = trim($value);
    $value = strip_tags($value);
    $value = preg_replace('/\n+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);

    $preamble_len = strlen($preamble);

    $lines = array();
    while (strlen($value) > (75 - $preamble_len)) {
        $space = (75 - $preamble_len); // Octets available on this line
        $mbcc = $space;                // Multibyte character count to try

        while ($mbcc) {
            $line = mb_substr($value, 0, $mbcc);
            $oct = strlen($line);
            if ($oct > $space) {
                // Too many octets: shrink by the overflow and try again
                $mbcc -= $oct - $space;
            }
            else {
                $lines[] = $line;
                $preamble_len = 1; // Still take the tab into account
                $value = mb_substr($value, $mbcc);
                break;
            }
        }
    }
    if (!empty($value)) {
        $lines[] = $value;
    }

    // Separator first: the legacy join($array, $separator) argument order
    // is no longer accepted as of PHP 8.0.
    return join("\n\t", $lines);
}

$split = ical_split('DESCRIPTION:', $desc);
print 'DESCRIPTION:' . $split;

// Test results
$lines = preg_split('/\n/', 'DESCRIPTION:' . $split);
print "\n\nTests\n";
foreach ($lines as $i => $line) {
    print "Line {$i}: " . strlen($line) . " octets\n";
}
DESCRIPTION:Lines of text SHOULD NOT be longer than 75 octets, (och hör p
å den) excluding the line break. Long content lines SHOULD be split into
a multiple line representations using a line "folding" technique. That is,
a long line can be split between any two characters by inserting a CRLF i
mmediately followed by a single linear white space character (i.e., SPACE,
US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence of CRLF fo
llowed immediately by a single linear white space character is ignored (i.
e., removed) when processing the content type.
Tests
Line 0: 74 octets
Line 1: 75 octets
Line 2: 75 octets
Line 3: 75 octets
Line 4: 75 octets
Line 5: 75 octets
Line 6: 75 octets
Line 7: 47 octets

keizie commented Jun 3, 2011

mb_substr() counts each multibyte character as a single character, so it malfunctions on a string that is full of multibyte characters. mb_strcut() works well.
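
A minimal sketch of the mb_strcut() approach, assuming UTF-8 input. `ical_fold()` is a hypothetical helper, not the gist's function: it takes the whole content line (property name included), cuts by octet count, and relies on mb_strcut() to back off to the nearest character boundary so no multibyte character is split.

```php
<?php
// Hypothetical helper (not part of the gist): fold one content line by octets.
function ical_fold(string $line, int $limit = 75): string
{
    $parts = [];
    $room = $limit; // the first line may use the full limit

    while (strlen($line) > $room) {
        // mb_strcut() takes a byte length and never cuts inside a character.
        $chunk = mb_strcut($line, 0, $room, 'UTF-8');
        if ($chunk === '') {
            break; // limit too small for the next character; avoid looping forever
        }
        $parts[] = $chunk;
        $line = substr($line, strlen($chunk));
        $room = $limit - 1; // continuation lines begin with a one-octet tab
    }

    if ($line !== '') {
        $parts[] = $line;
    }

    return implode("\r\n\t", $parts);
}
```

Because every iteration removes at least one character from `$line`, the loop terminates even when the input consists only of multibyte characters.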

Owner Author

hugowetterberg commented Jun 13, 2011

@keizie That's what is taken into account at line #28: if the octet count (strlen) is bigger than the available space, then $mbcc (multibyte character count) is decreased by the overflow and the mb_substr is attempted again. No line that has an octet count larger than 75 should ever get appended.
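
For reference, the difference between the two counts discussed here can be seen on a fragment of the sample text. This is only an illustration, assuming the source file is saved as UTF-8:

```php
<?php
mb_internal_encoding('UTF-8');

$text = 'hör på den';

$chars  = mb_strlen($text); // counts characters: "ö" and "å" count once each
$octets = strlen($text);    // counts bytes: "ö" and "å" take two bytes each

// Taking two characters with mb_substr() can yield more than two octets.
$head = mb_substr($text, 0, 2);
$head_octets = strlen($head);
```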


sqren commented Mar 13, 2013

Cool gist. However I think you need to escape commas:

$value = str_replace(',', '\,', $value);
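
Commas are one of several characters that TEXT values escape with a backslash. A sketch, assuming the escaping rules as I read them in RFC 5545 section 3.3.11 (backslash, semicolon, comma, newline); `ical_escape_text()` is a hypothetical name, not part of the gist:

```php
<?php
// Hypothetical helper: escape a TEXT property value before folding it.
function ical_escape_text(string $value): string
{
    // Backslash goes first so the backslashes added below are not doubled.
    $value = str_replace('\\', '\\\\', $value);
    $value = str_replace([';', ','], ['\\;', '\\,'], $value);

    // A line break inside the value is written as the two characters "\n".
    return str_replace(["\r\n", "\n"], '\\n', $value);
}
```

Escaping adds octets, so it would need to run before the value is split into 75-octet lines.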


ADoebeling commented Apr 12, 2016

Thank you very much for sharing this function. I've embedded it in a new extension for the CMS @contao
(even though I currently have to keep it disabled because of validation problems).


shoulders commented Jan 3, 2017

To be RFC compliant, the folding must follow this rule:

Lines of text SHOULD NOT be longer than 75 octets, excluding the line
break. Long content lines SHOULD be split into a multiple line
representations using a line "folding" technique. That is, a long
line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.

(Taken from the Internet Calendaring and Scheduling Core Object Specification.)

So change the separator in the function's final return statement from "\n\t" to "\r\n\t".
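
The unfolding half of the rule quoted above can be sketched as follows, assuming the data already uses CRLF line breaks. `ical_unfold()` is a hypothetical helper for checking output, not part of the gist:

```php
<?php
// Hypothetical helper: undo folding by removing each CRLF that is
// immediately followed by a single space or tab.
function ical_unfold(string $data): string
{
    return preg_replace('/\r\n[ \t]/', '', $data);
}
```

Running folded output through a function like this and comparing it with the unfolded input is a quick way to check that the folding loses nothing.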

I have also used this as part of the .ics creation routine in my software, QWcrm. I have tried to make all of my output RFC compliant. Exporting a calendar event from Microsoft Outlook as an .ics file helps for comparison.

Thanks for this script.


djkgamc commented Dec 29, 2018

Awesome! Thanks for this script. @keizie is right: you have to use mb_strcut, or else a single very long multibyte string will make the code loop and crash.
