@bantu

bantu/report.txt Secret

Last active August 29, 2015 13:56
A feasibility analysis of full large file support for ownCloud 32-bit servers
=============================================================================
File sizes as integers
----------------------
In theory, the size of a file is an integer representing the number of bytes
the file is long. In practice, this integer has to be put into a data type
which can only take a certain amount of (different) values and has a certain
precision.
In PHP, the return data type of the filesize() function (which is basically a
wrapper around the stat system call) is int. The size of the data type int is
platform-dependent, it is for example 64 bits on amd64 and 32 bits on x86 and
arm (Raspberry Pi).
As such, on 32-bit platforms, it is not possible to reliably determine the size
of a file if it is 2^32 bytes = 4 GiB or larger.
Remark: Because the data type int is a signed integer type, the filesize of
files of size between 2 GiB and 4 GiB will be reported as negative when the
return value is converted to a string. The correct string result can be
produced by interpreting the return value as an unsigned integer using
sprintf('%u', ...).
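
The remark above can be illustrated with a minimal sketch. Because the result
of sprintf('%u', ...) depends on the width of int on the running platform, the
hypothetical helper below performs the same unsigned reinterpretation with
float arithmetic, so it behaves the same on 32-bit and 64-bit builds. The
function name and the sample value are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reinterpret a signed 32-bit filesize() result as an
// unsigned value. Only meaningful for files smaller than 4 GiB.
function unsignedFromSigned32($raw)
{
    // 4294967296.0 is 2^32, written as a float so it also exists on 32-bit PHP.
    $size = $raw < 0 ? $raw + 4294967296.0 : (float) $raw;
    // Format as plain digits, independent of the 'precision' setting.
    return number_format($size, 0, '', '');
}

// A 3 GiB file (3221225472 bytes) shows up as -1073741824 in a signed 32-bit int.
echo unsignedFromSigned32(-1073741824), "\n"; // 3221225472
```
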
In general, it is not only impossible to determine the size of a large file, it
is impossible to handle these large files at all. This is because internally
the int data type is used as well. Thus, in general, it is even impossible to
open a file using fopen().
Enabling large file support (LFS)
---------------------------------
To overcome the limitations of not being able to handle large files at all, PHP
can be compiled with CFLAGS='-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' which
enables Large File Support. This will make PHP use new operating system
interfaces internally which in turn will allow fopen() and other generic
filesystem related function calls to succeed.
Remark: Some distributions seem to enable these flags, some do not.
Remark: Debian-based distributions seem to suffix the extension_dir with "+lfs"
when LFS support is enabled. This can be checked with "php -i | grep lfs".
Remark: This explains why different people report different behaviour on 32-bit
systems.
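
The "+lfs" observation can be turned into a rough check from within PHP. This
is a heuristic sketch only: it assumes the Debian-style suffix convention
described above and says nothing about builds that do not follow it. The
function name is hypothetical.

```php
<?php
// Heuristic only: Debian-style builds seem to append "+lfs" to extension_dir
// when PHP was compiled with Large File Support.
function extensionDirSuggestsLfs($extensionDir)
{
    return strpos($extensionDir, '+lfs') !== false;
}

$is32Bit = (PHP_INT_SIZE === 4); // size of int in bytes on this build
$hint = extensionDirSuggestsLfs((string) ini_get('extension_dir'));

echo $is32Bit ? '32-bit int' : '64-bit int', "\n";
echo '+lfs suffix present: ', $hint ? 'yes' : 'no', "\n";
```

A missing suffix is not proof that LFS is absent, so the result should be
treated as a hint rather than as reliable detection.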
Remaining problems, despite LFS
-------------------------------
Despite fixing some PHP issues with large files, compiling with LFS does
not fix all problems PHP has with large files. Because the return type of
filesize() is still int and an int is still 32 bits, the return value cannot
possibly be correct for large files. In fact, it seems to hold that
return value = actual filesize (mod 2^32),
i.e. the filesize just "wraps around" at 2^32.
Remark: The E_WARNING error is gone, but the reported filesize is incorrect.
Remark: This implies that if a method of determining the correct filesize is
discovered, it must be used unconditionally (in contrast to only when
filesize() returns a negative integer, as some have claimed) on 32-bit systems.
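
To make the wrap-around concrete, the following sketch simulates the observed
behaviour. It does not call filesize(); it only models what a signed 32-bit int
would contain if the relation "return value = actual filesize (mod 2^32)"
holds. The function name is hypothetical.

```php
<?php
// Simulation (assumption): the value a signed 32-bit int would hold if it
// stores the actual size modulo 2^32.
function simulateFilesize32($actualSize)
{
    $wrapped = fmod((float) $actualSize, 4294967296.0); // size mod 2^32
    if ($wrapped >= 2147483648.0) {
        // Values of 2^31 and above appear as negative numbers.
        $wrapped -= 4294967296.0;
    }
    return (int) $wrapped;
}

$gib = 1073741824; // 2^30 bytes

var_dump(simulateFilesize32(3.0 * $gib)); // int(-1073741824)
var_dump(simulateFilesize32(5.0 * $gib)); // int(1073741824)
```

The 5 GiB case yields a positive value that looks like a valid 1 GiB file,
which is why checking for a negative return value is not sufficient.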
More importantly than the filesize() issue, fseek() may not work on large
files, because the internal offset parameter has basically the same type as the
external offset parameter, which is an int. This is documented in PHP bug
#40726.
There is a collection of tests at https://github.com/bantu/php-large-files
that try to find out which functions are working with large files and which do
not.
Representing file sizes with floats
-----------------------------------
The float data type can be used to represent integers too. Although strictly
speaking the length of the float data type is also platform-dependent, it seems
to be IEEE 754 binary64 "double precision" on most (if not all) platforms. For
integer representation, the 52 significand bits of the 64 bits can be used.
Adding the implicit leading bit, which is not stored explicitly, every integer
up to 2^53 can be represented without gaps using the float data type.
Interpreted as a file size, this yields a maximum file size of
2^53 bytes = 2^3 * 2^50 bytes = 8 PiB = 8192 TiB
which is a massive improvement compared to 2^32.
Care must be taken when performing float to string conversion, both when
performing explicit string casting and when performing implicit string casting
via concatenation (e.g. when constructing the Content-Length HTTP header).
Because floats are by default only printed with a certain decimal precision
(which can be configured via the php.ini setting 'precision', default 14), they
will be converted to scientific notation at some point.
For example
var_dump((string) pow(2, 52));
yields
string(19) "4.5035996273705E+15"
on 32-bit platforms where 2^52 does not fit into the int data type and pow()
thus returns a float.
The expected result
string(16) "4503599627370496"
can be generated as follows:
ini_set('precision', 16); // where 16 = ceil(log(2^53) / log(10))
var_dump((string) pow(2, 52));
or
var_dump(number_format(pow(2, 52), 0, '', ''));
Remark: With the default precision of 14 and without number_format(),
scientific notation kicks in at 10^14 bytes (about 90.9 TiB), which is only
slightly more than 2^46 bytes = 64 TiB.
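
A formatting helper along the following lines would avoid the dependency on
the 'precision' setting. This is a minimal sketch; the function name is
hypothetical and not taken from ownCloud.

```php
<?php
// Hypothetical helper: render a byte count (int or float) as plain digits.
function sizeToString($size)
{
    if (is_int($size)) {
        return (string) $size;
    }
    // number_format() does not depend on the 'precision' setting.
    return number_format($size, 0, '', '');
}

$size = 4503599627370496.0; // 2^52 as a float
echo 'Content-Length: ' . sizeToString($size), "\n"; // Content-Length: 4503599627370496
```
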
Workarounds for getting the correct file size
---------------------------------------------
The correct file size can be fetched as a base-10 string using:
- PHP curl extension
- COM (Windows)
- exec()
- ...
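
As an illustration of the curl approach, the sketch below requests only the
headers of a file:// URL and extracts the Content-Length value as a string of
digits. It assumes that libcurl reports a Content-Length header for local
files on a header-only request and that the file protocol is not disabled
(e.g. by open_basedir). Both function names are hypothetical, and this is not
necessarily how PR #5365 implements it.

```php
<?php
// Extract the Content-Length value from a raw header block as a digit string.
function parseContentLength($headers)
{
    if (preg_match('/^Content-Length:\s*(\d+)/mi', $headers, $matches)) {
        return $matches[1];
    }
    return null;
}

// Sketch: ask curl for the headers of a local file and parse the size.
function fileSizeViaCurl($path)
{
    if (!function_exists('curl_init')) {
        return null; // curl extension not available
    }
    // Percent-encode the path but keep the directory separators.
    $url = 'file://' . str_replace('%2F', '/', rawurlencode($path));
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_NOBODY, true);         // do not read the file body
    curl_setopt($handle, CURLOPT_HEADER, true);         // include headers in the output
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return output as a string
    $headers = curl_exec($handle);
    return is_string($headers) ? parseContentLength($headers) : null;
}
```

The result is kept as a string on purpose, so that the caller decides whether
and how to convert it to a number.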
Casting to number
-----------------
The PHP pseudo-type number is defined as int|float.
A variable $var can be cast to number as follows:
0 + $var
The result will be an integer if the number fits into an integer, and a float
otherwise.
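
For example, applied to base-10 strings such as those returned by the
workarounds above (the sample values are chosen for illustration):

```php
<?php
// Cast numeric strings to "number" (int|float) using the 0 + $var idiom.
$small = 0 + '1024';
$large = 0 + '5368709120'; // 5 GiB, does not fit into a 32-bit int

var_dump(is_int($small)); // bool(true)

// $large is an int on 64-bit builds and a float on 32-bit builds.
var_dump(PHP_INT_SIZE === 4 ? is_float($large) : is_int($large)); // bool(true)
```
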
User aspects and their technical background
-------------------------------------------
- Having directories with files < 2 GiB, but sum of all files > 2 GiB display
correctly.
This seems to be working in master. The sum of all files in the current
folder seems to be determined via JavaScript only, but the total size of
directories in the current folder seems to be correct too, although int
casts are used in a few places related to filesize detection.
- Having the usage indicator in 'Personal' show the used storage correctly.
This currently does not seem to work in master, probably due to int casting
of the used storage. The value should be cast to number instead.
- Ability to download files larger than 2 GiB via the web interface.
When LFS support is present, fopen() and fread() seem to work correctly and
fread() is able to completely read the file. The client does not want to read
the whole file because an incorrect Content-Length is advertised. When the
Content-Length header is removed, the whole file is correctly delivered.
Fixing this thus involves:
- Determining the actual filesize correctly (e.g. via curl, see PR #5365).
- Advertising the correct Content-Length (e.g. by using floats).
- Ability to upload files larger than 2 GiB via the ownCloud client.
The ownCloud client uses PUT requests and sends chunks of size 10 MiB.
The server has to put the files together.
This is probably done by using fopen() in append mode followed by fwrite()
which seems to work correctly when LFS support is present.
- Ability to upload files larger than 2 GiB via the web interface.
This either involves chunking as per the above and should thus work.
Or it involves sending one large request which would probably be subject to
post_max_size and upload_max_filesize which are limited to 2047M due to an
int being used internally.
Although this problem will probably be addressed in PHP 5.6 (see PHP Ticket
#65944), the number of bytes that can practically be uploaded at once is
still constrained by various timeouts.
- Ability to download ZIP archives having a total size > 2 GiB.
There is a 4 GiB limit anyway, if ZIP64 is not implemented.
See: http://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64
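
The server-side assembly of uploaded chunks mentioned in the list above
(fopen() in append mode followed by fwrite()) can be sketched as follows. This
is an illustration only, not ownCloud's actual implementation; the function
name and the tiny sample chunks are assumptions.

```php
<?php
// Hypothetical helper: append one uploaded chunk to the target file.
function appendChunk($targetPath, $chunkData)
{
    $handle = fopen($targetPath, 'ab'); // append mode, binary
    if ($handle === false) {
        return false;
    }
    $written = fwrite($handle, $chunkData);
    fclose($handle);
    // Report success only if the whole chunk was written.
    return $written === strlen($chunkData);
}

$target = tempnam(sys_get_temp_dir(), 'chunk');
foreach (array('first ', 'second ', 'third') as $chunk) {
    appendChunk($target, $chunk);
}
echo file_get_contents($target), "\n"; // first second third
unlink($target);
```

Because each call only appends, no seek offset is needed, which sidesteps the
fseek() limitation described earlier.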
Conclusions
===========
- It is possible to support large files on 32-bit platforms if PHP was built
with Large File Support (LFS).
- The curl PHP extension is probably the fastest way to determine the size of
files correctly on 32-bit platforms. Users of 32-bit platforms should thus
make sure it is installed.
- Users on 32-bit platforms without Large File Support (LFS) are completely out
of luck. There is no way we can support files larger than 4 GiB. Depending
on the operations used, this limit may even decrease to 2 GiB.
- Using workarounds to get the file size (e.g. curl) on 32-bit platforms
without LFS is pointless as we will not be able to handle these files anyway.
It is however unclear how to reliably detect whether LFS is supported or not.
- The web interface should use chunking (if it does not already) just like the
ownCloud client does, thus not only overcoming limits implied by
upload_max_filesize, but also overcoming limits related to the various
timeouts.
- When converting file sizes to string, they should be run through a formatting
function that removes the effect of the php.ini setting 'precision'.
- If large ZIP downloads are desired, ZIP64 support must be checked and
implemented if not present.
- If large ZIP downloads are not desired, users must be prevented from creating
large ZIP files as they may be broken.
Appendix
========
Known Distributions
-------------------
The following distributions and platforms are known for having LFS enabled:
- Debian Linux 7 Wheezy i386
Tested version: PHP 5.4.4-14+deb7u7 (cli)
- PHP 5.4 packages built by Dotdeb.org for Debian Linux 7 Wheezy
Tested version: PHP 5.4.25-1~dotdeb.1 (cli)
- Raspbian (arm, as used on the Raspberry Pi device)
Tested version: PHP 5.4.4-14+deb7u7 (cli)
The following distributions and platforms are known for not having LFS:
- Ubuntu Linux 13.10 Saucy i386
Tested version: PHP 5.5.3-1ubuntu2.1 (cli)
Reported here: https://bugs.launchpad.net/ubuntu/+source/php5/+bug/1280044
- PHP 5.5 packages built by Dotdeb.org for Debian Linux 7 Wheezy i386
Tested version: PHP 5.5.9-1~dotdeb.1 (cli)
The following distributions and platforms are believed to have LFS enabled:
- Ubuntu Linux Precise 12.04 i386
Based on the extension directory having the +lfs suffix.
See http://packages.ubuntu.com/precise/i386/php5-common/filelist
- Ubuntu Linux Quantal 12.10 i386
Based on the extension directory having the +lfs suffix.
See http://packages.ubuntu.com/quantal/i386/php5-common/filelist
- Ubuntu Linux Raring 13.04 i386
Based on the extension directory having the +lfs suffix.
See http://packages.ubuntu.com/raring/i386/php5-common/filelist
- Debian Linux 6 Squeeze i386
Based on the extension directory having the +lfs suffix.
See https://packages.debian.org/squeeze/i386/php5-common/filelist
The following distributions and platforms are believed to not have LFS:
- Debian Linux 8 Jessie i386
Based on the extension directory not having the +lfs suffix.
See https://packages.debian.org/jessie/i386/php5-common/filelist
Reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=738984
Related ownCloud tickets
------------------------
PR #5365 Add LargeFileHelper / Add CURL filesize workaround / Fix some 32-bit
filesize issues
https://github.com/owncloud/core/pull/5365
This is still work in progress. In the meantime, see
https://github.com/owncloud/core/issues/assigned/bantu?state=open
Related PHP tickets
-------------------
#27792 [PATCH] Functions fail on large files (filesize,is_file,is_dir,readdir)
https://bugs.php.net/bug.php?id=27792
#40726 fseek / ftell do not work correctly for files > 2GB
https://bugs.php.net/bug.php?id=40726
#65944 File uploads over 2GB fail
https://bugs.php.net/bug.php?id=65944
@DeepDiver1975

FYI: looks like this issue is unresolved on Windows as well http://marc.info/?l=php-internals&m=137002754604365&w=2