Google Summer of Code 2022
Basic Information
- Name: Chen Liang-Chun
- Email: featherclc@gmail.com
- GitHub: featherchen
- Project: Support sparse ghosts
- Organization: CRIU
- Mentor: Pavel Tikhomirov
Project Abstract
Currently, CRIU saves a ghost file by issuing many system calls to determine where the data chunks are, which is very expensive, especially for highly sparse files. This project aims to improve how CRIU dumps sparse ghost files.
Task
Large sparse ghost file support
CRIU checked the size of a sparse ghost with `st_size`, which reports the file's logical length rather than its actual on-disk size, so the check could not handle a large sparse ghost file. In this PR, I fixed the issue by replacing `st_size` with `st_blocks * 512`, which gives the actual allocated size of a file.
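The difference matters because a sparse file's logical length can be far larger than the space it occupies on disk. A minimal sketch illustrating the two quantities (hypothetical helper names, not CRIU code):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Logical length of the file, as reported by st_size. */
long long logical_size(const char *path)
{
	struct stat st;
	return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}

/* Space actually allocated on disk: st_blocks counts 512-byte units. */
long long allocated_size(const char *path)
{
	struct stat st;
	return stat(path, &st) == 0 ? (long long)st.st_blocks * 512 : -1;
}

/* Create a sparse file: ftruncate() extends the length without
 * allocating any data blocks. */
int make_sparse(const char *path, off_t len)
{
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);

	if (fd < 0 || ftruncate(fd, len) != 0) {
		if (fd >= 0)
			close(fd);
		return -1;
	}
	return close(fd);
}
```

For a file created this way, `logical_size()` returns the full length while `allocated_size()` stays near zero, which is why the size check has to use the latter.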
Moreover, I created two tests related to large sparse ghost files.
- `ghost_holes_large00` creates a large sparse ghost file with a 1 GiB hole (`pwrite` can handle at most 2 GiB on a 32-bit system) and 8 KiB of data; CRIU should be able to handle this situation.
- `ghost_holes_large01` creates a large sparse ghost file with a 1 GiB hole and 2 MiB of data; since 2 MiB exceeds the default ghost_limit (1 MiB), CRIU should fail on this test.
Improve the copy-file algorithm in `dump_ghost_file`
In this PR, we use `fiemap` to reduce the overhead of repeatedly calling `SEEK_HOLE`/`SEEK_DATA` to discover where the data chunks are.
First, we ran a simple benchmark to check whether `fiemap` is actually faster than `SEEK_HOLE`/`SEEK_DATA`: a program (compare.c) that simulates CRIU's dumping algorithm. Below are the results.
```
$ gcc compare.c -o compare
$ ./compare testfile $((1<<25))
size: 33554432
actual size: 16777216
fiemap time = 2040 us
SEEK time = 7161 us
$ ./compare testfile $((1<<26))
size: 67108864
actual size: 33554432
fiemap time = 4239 us
SEEK time = 14474 us
$ ./compare testfile $((1<<27))
size: 134217728
actual size: 67108864
fiemap time = 8529 us
SEEK time = 30161 us
$ ./compare testfile $((1<<28))
size: 268435456
actual size: 134217728
fiemap time = 17161 us
SEEK time = 58791 us
```
This matches our expectation: across these ordinary file sizes, `fiemap` is consistently about 3 to 4 times faster than `SEEK_HOLE`/`SEEK_DATA`.
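The key reason is that one `FS_IOC_FIEMAP` ioctl returns many extents per call, instead of two `lseek()` calls per chunk. A simplified sketch of such a query (a hypothetical helper, not the actual compare.c; it is single-shot, whereas real code loops when a file has more extents than fit in one call):

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_EXTENTS 128

/* Count a file's mapped extents with a single FS_IOC_FIEMAP ioctl.
 * Returns the extent count, or -1 on error (including filesystems
 * that do not support fiemap). */
int count_extents(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct fiemap *fm = calloc(1, sizeof(*fm) +
				   MAX_EXTENTS * sizeof(struct fiemap_extent));
	if (!fm) {
		close(fd);
		return -1;
	}

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush so extents are allocated */
	fm->fm_extent_count = MAX_EXTENTS;

	int n = ioctl(fd, FS_IOC_FIEMAP, fm) < 0 ? -1
						 : (int)fm->fm_mapped_extents;
	free(fm);
	close(fd);
	return n;
}
```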
Next, we need to deal with `sendfile`'s return value. `sendfile` can return 0 in some normal cases, e.g. when reading in the middle of a partially full block, and it returns -1 when an error occurs.
Here is an example:
```
$ echo test > testfile
$ sync
$ filefrag -v testfile
Filesystem type is: 9123683e
File size of testfile is 5 (1 block of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 4095: 0.. 4095: 4096: last,not_aligned,inline,eof
testfile: 1 extent found
$ touch testcopy
$ xfs_io -c 'sendfile -i testfile 0 4096' testcopy
sent 5/4096 bytes at offset 0
5,000000 bytes, 1 ops; 0.0000 sec (180,845 KiB/sec and 37037,0370 ops/sec)
$ xfs_io -c 'sendfile -i testfile 5 4091' testcopy
sent 0/4091 bytes at offset 5
0,000000 bytes, 0 ops; 0.0000 sec (0,000000 bytes/sec and 0,0000 ops/sec)
```
In the original dumping algorithm, we know the exact size we want to copy, so a return value of 0 can be treated as an error. However, `fiemap` returns page-aligned extents, so the new `fiemap`-based dumping algorithm must distinguish these two cases (return 0 vs. return -1).
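A sketch of how a copy loop could separate the two cases (hypothetical helper, assumed names; a simplification of what the real code has to do):

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to len bytes from in_fd starting at *off into out_fd.
 * Returns the number of bytes copied, or -1 on error. Copying fewer
 * bytes than requested (even 0) is NOT an error here: fiemap hands us
 * page-aligned extents, so the last extent may reach past the real
 * end of the data. */
ssize_t copy_extent(int out_fd, int in_fd, off_t *off, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		ssize_t n = sendfile(out_fd, in_fd, off, len - copied);

		if (n < 0)
			return -1;	/* real I/O error */
		if (n == 0)
			break;		/* EOF inside a partially full block */
		copied += n;
	}
	return (ssize_t)copied;
}
```

With this convention, the 5-byte-file example above yields 5 copied bytes for the first call and 0 for the second, and neither is reported as a failure.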
Next, based on https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/create_inode.c#n519, I developed a new algorithm for dumping chunks via `fiemap` (`copy_file_to_chunks_fiemap`).
Also, I created a new `BOOL_OPT` so users can choose which algorithm to use. Moreover, for filesystems that do not support `fiemap`, CRIU falls back to the original `SEEK_HOLE`/`SEEK_DATA` algorithm.
Furthermore, I added tests that create a ghost file with many holes: within every 8 KiB region there are 4 KiB of data and a 4 KiB hole. This exercises both the original and the new dumping algorithms of CRIU.
Future Work
- zdtm test for fallback: add a test that checks whether CRIU falls back to the original algorithm properly when the filesystem doesn't support `fiemap`.
- Measure the performance of the original vs. the new algorithm: although we expect the new algorithm to perform better in extreme situations (files with many holes), we still need to measure how much improvement has been achieved.
Remark
- During development, Pavel and I found that `fiemap` is extremely slow on `btrfs` for files with many extents. Pavel sent a report to the kernel community, and they developed a patchset that fixes this issue.
- While developing the highly sparse ghost test, we discovered that on some filesystems, such as `xfs`, we cannot easily create highly sparse files the way we can on `ext4` or `btrfs`. Although we can use `fallocate` to forcibly punch holes, we still don't know why `xfs` behaves this way.
Acknowledgment
I'd like to thank my mentor, Pavel Tikhomirov, who was very supportive throughout the whole GSoC period. I ran into many problems during development, and he always provided fast feedback and suitable solutions with great patience.
Besides, I want to thank CRIU’s community and GSoC for giving me the opportunity to be a part of this project. This is a wonderful experience, and I would like to keep contributing to CRIU in the future.