@featherchen
Last active November 6, 2022 17:58
Google Summer of Code 2022

Basic Information

Project Abstract

Currently, CRIU dumps a ghost file by issuing many system calls to determine where its data chunks are. This is very expensive, especially for highly sparse files, so this project aims to improve how CRIU dumps sparse ghost files.

Task

CRIU checked the size of a sparse ghost file with st_size, which reports the logical file length rather than the space the file actually occupies on disk. As a result, the check could not handle a large sparse ghost file, even one containing very little data.

In this PR, I fixed this issue by replacing st_size with st_blocks * 512, which reflects the space actually allocated for the file (st_blocks is counted in 512-byte units).
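A minimal sketch of the corrected check follows. The helper name ghost_too_big() and its limit parameter (standing in for ghost_limit) are hypothetical, not CRIU's code; the point is only that st_blocks counts 512-byte units, so st_blocks * 512 measures allocated space, while st_size also counts holes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not CRIU's code): decide whether a ghost file is too
 * big to dump, based on the space it actually occupies on disk.
 * st_blocks is always counted in 512-byte units, independent of the
 * filesystem block size.
 */
static bool ghost_too_big(const struct stat *st, uint64_t limit)
{
	uint64_t allocated = (uint64_t)st->st_blocks * 512;

	/*
	 * The old check compared st_size against the limit. st_size includes
	 * holes, so a 1GiB sparse file holding only 8KiB of data failed it.
	 */
	return allocated > limit;
}
```

With a 1MiB limit, a file whose st_size is 1GiB but which has only 8KiB allocated passes, while a file with 2MiB allocated is rejected.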

Moreover, I created two tests for large sparse ghost files. ghost_holes_large00 creates a large sparse ghost file with a 1GiB hole (pwrite can handle at most 2GiB on a 32-bit system) and 8KiB of data; CRIU should be able to handle this case. ghost_holes_large01 creates a large sparse ghost file with a 1GiB hole and 2MiB of data; since 2MiB is larger than the default ghost_limit (1MiB), CRIU is expected to fail on this test.
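The file layout used by these tests can be produced with a single pwrite() past the end of an empty file, which leaves everything before the write offset as a hole. The sketch below is a hypothetical helper, make_sparse_file(), not the actual zdtm test code; hole_len and data_len are assumed parameters.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create 'path' as a hole of hole_len bytes followed
 * by data_len bytes of data. Returns 0 on success, -1 on error.
 */
static int make_sparse_file(const char *path, off_t hole_len, size_t data_len)
{
	size_t done = 0;
	char *buf = NULL;
	int ret = -1;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	buf = malloc(data_len);
	if (!buf)
		goto out;
	memset(buf, 'x', data_len);

	/* Writing at offset hole_len leaves [0, hole_len) unallocated. */
	while (done < data_len) {
		ssize_t n = pwrite(fd, buf + done, data_len - done,
				   hole_len + (off_t)done);
		if (n < 0)
			goto out;
		done += (size_t)n;
	}
	ret = 0;
out:
	free(buf);
	close(fd);
	return ret;
}
```

Under these assumptions, ghost_holes_large00 would correspond to make_sparse_file(path, 1 << 30, 8192), and ghost_holes_large01 to the same call with 2MiB of data. On a 32-bit build, off_t only reaches 2GiB unless the program is compiled with _FILE_OFFSET_BITS=64.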

In this PR, we use fiemap to reduce the overhead of repeatedly calling SEEK_HOLE/SEEK_DATA to discover where the data chunks are.
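For comparison, the original approach can be sketched as a SEEK_DATA/SEEK_HOLE walk: every data chunk costs two lseek() calls, so the number of system calls grows with the number of chunks. This is an illustrative sketch, not CRIU's implementation; count_chunks_seek() is a hypothetical name, and it only counts chunks instead of copying them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: walk the file with SEEK_DATA/SEEK_HOLE.
 * Returns the number of data chunks and stores their total length in
 * *data_bytes, or returns -1 on error.
 */
static long count_chunks_seek(int fd, uint64_t *data_bytes)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t data, hole = 0;
	long chunks = 0;

	if (end < 0)
		return -1;
	*data_bytes = 0;

	while (hole < end) {
		/* 1st syscall: find the start of the next data chunk. */
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)	/* only a hole remains */
				break;
			return -1;
		}
		/* 2nd syscall: find where that chunk ends (EOF counts as a hole). */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		chunks++;
		*data_bytes += (uint64_t)(hole - data);
	}
	return chunks;
}
```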

First, we ran a simple test to check whether fiemap is faster than SEEK_HOLE/SEEK_DATA. We wrote a program (compare.c) that simulates CRIU's dumping algorithm. Below are the results.

gcc compare.c -o compare 

./compare testfile $((1<<25))
size: 33554432
actual size: 16777216
fiemap time = 2040 us
SEEK time = 7161 us

./compare testfile $((1<<26))
size: 67108864
actual size: 33554432
fiemap time = 4239 us
SEEK time = 14474 us

./compare testfile $((1<<27))
size: 134217728
actual size: 67108864
fiemap time = 8529 us
SEEK time = 30161 us

./compare testfile $((1<<28))       
size: 268435456
actual size: 134217728
fiemap time = 17161 us
SEEK time = 58791 us

In all four runs, fiemap is about 3.4 to 3.5 times faster than SEEK_HOLE/SEEK_DATA, and both timings roughly double as the file size doubles, which matches our expectation.
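The fiemap side of such a comparison might look like the sketch below. It is not the author's compare.c: count_extents_fiemap() and the batch size EXTENTS_PER_CALL are assumptions. One FS_IOC_FIEMAP ioctl returns up to EXTENTS_PER_CALL extents at once, whereas the SEEK_HOLE/SEEK_DATA walk needs two lseek() calls per chunk, which is one plausible reason for the gap measured above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

#define EXTENTS_PER_CALL 32	/* assumed batch size */

/*
 * Hypothetical sketch: list extents with FS_IOC_FIEMAP in batches.
 * Returns the number of extents and stores their total length in
 * *data_bytes, or returns -1 with errno set (EOPNOTSUPP when the
 * filesystem does not implement fiemap).
 */
static long count_extents_fiemap(int fd, uint64_t *data_bytes)
{
	size_t sz = sizeof(struct fiemap) +
		    EXTENTS_PER_CALL * sizeof(struct fiemap_extent);
	struct fiemap *fm = malloc(sz);
	uint64_t start = 0;
	long extents = 0;
	int last = 0;

	if (!fm)
		return -1;
	*data_bytes = 0;

	while (!last) {
		memset(fm, 0, sz);
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET - start;
		fm->fm_flags = FIEMAP_FLAG_SYNC;	/* sync file data before mapping */
		fm->fm_extent_count = EXTENTS_PER_CALL;

		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
			int err = errno;

			free(fm);
			errno = err;
			return -1;
		}
		if (fm->fm_mapped_extents == 0)
			break;	/* nothing mapped at or after 'start' */

		for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
			const struct fiemap_extent *fe = &fm->fm_extents[i];

			extents++;
			*data_bytes += fe->fe_length;
			if (fe->fe_flags & FIEMAP_EXTENT_LAST)
				last = 1;
			start = fe->fe_logical + fe->fe_length;
		}
	}
	free(fm);
	return extents;
}
```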

Next, we need to handle the return value of sendfile. sendfile returns 0 in some normal cases, e.g. when the requested offset lies past the end of the data inside the last, partially filled block, and it returns -1 when an error occurs.

Here is an example:

echo test > testfile

sync

filefrag -v testfile
Filesystem type is: 9123683e
File size of testfile is 5 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    4095:          0..      4095:   4096:             last,not_aligned,inline,eof
testfile: 1 extent found

touch testcopy

xfs_io -c 'sendfile -i testfile 0 4096' testcopy
sent 5/4096 bytes at offset 0
5,000000 bytes, 1 ops; 0.0000 sec (180,845 KiB/sec and 37037,0370 ops/sec)

xfs_io -c 'sendfile -i testfile 5 4091' testcopy
sent 0/4091 bytes at offset 5
0,000000 bytes, 0 ops; 0.0000 sec (0,000000 bytes/sec and 0,0000 ops/sec)

In the original dumping algorithm, we know the exact size we want to copy, so a return value of 0 can be treated as an error. However, fiemap returns extents rounded up to whole blocks (4096 bytes for the 5-byte file above), so the new dumping algorithm (using fiemap) must distinguish these two situations (return 0 vs. return -1).
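A minimal sketch of that distinction, assuming a hypothetical helper copy_range() that is not CRIU's code: a 0 from sendfile() ends the copy early without failing, and only -1 is reported as an error. Retrying on EINTR is an added assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to 'len' bytes starting at 'off' from in_fd
 * to the current position of out_fd. Returns the number of bytes copied
 * (possibly fewer than 'len' if EOF is reached), or -1 on error.
 */
static ssize_t copy_range(int in_fd, int out_fd, off_t off, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		/* sendfile() advances 'off' by the number of bytes sent. */
		ssize_t n = sendfile(out_fd, in_fd, &off, len - copied);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;	/* a real error */
		}
		if (n == 0)
			break;		/* EOF: normal when the range is block-rounded */
		copied += (size_t)n;
	}
	return (ssize_t)copied;
}
```

Against the 5-byte testfile from the example, copy_range(in, out, 0, 4096) returns 5 and copy_range(in, out, 5, 4091) returns 0, mirroring the xfs_io output.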

Next, based on https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/create_inode.c#n519, I developed a new algorithm that dumps chunks via fiemap (copy_file_to_chunks_fiemap).
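The general shape of a fiemap-driven dump can be sketched as follows: fetch extents in batches, copy only the mapped ranges with sendfile(), and stop early when sendfile() returns 0 because the last extent is rounded up past EOF. This is a hypothetical sketch, not CRIU's copy_file_to_chunks_fiemap() or the e2fsprogs code; it copies into another file at the same offsets rather than into CRIU's image format, and copy_extents_fiemap(), copy_extent() and EXTENTS_PER_CALL are assumed names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

#define EXTENTS_PER_CALL 32	/* assumed batch size */

/* Copy [off, off + len) of 'in' to the same offset of 'out'. */
static int copy_extent(int in, int out, off_t off, uint64_t len)
{
	if (lseek(out, off, SEEK_SET) < 0)
		return -1;
	while (len > 0) {
		ssize_t n = sendfile(out, in, &off, len);

		if (n < 0)
			return -1;	/* real error */
		if (n == 0)
			break;		/* extent is block-rounded, data ended at EOF */
		len -= (uint64_t)n;
	}
	return 0;
}

/* Returns 0 on success, -1 on error (EOPNOTSUPP: caller should fall back). */
static int copy_extents_fiemap(int in, int out)
{
	size_t sz = sizeof(struct fiemap) +
		    EXTENTS_PER_CALL * sizeof(struct fiemap_extent);
	struct fiemap *fm = malloc(sz);
	struct stat st;
	uint64_t start = 0;
	int last = 0, ret = -1, err;

	if (!fm || fstat(in, &st) < 0)
		goto out;
	/* Give 'out' the full logical size so holes (including trailing ones) stay holes. */
	if (ftruncate(out, st.st_size) < 0)
		goto out;

	while (!last) {
		memset(fm, 0, sz);
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET - start;
		fm->fm_flags = FIEMAP_FLAG_SYNC;
		fm->fm_extent_count = EXTENTS_PER_CALL;
		if (ioctl(in, FS_IOC_FIEMAP, fm) < 0)
			goto out;
		if (fm->fm_mapped_extents == 0)
			break;

		for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
			const struct fiemap_extent *fe = &fm->fm_extents[i];

			if (copy_extent(in, out, (off_t)fe->fe_logical, fe->fe_length) < 0)
				goto out;
			if (fe->fe_flags & FIEMAP_EXTENT_LAST)
				last = 1;
			start = fe->fe_logical + fe->fe_length;
		}
	}
	ret = 0;
out:
	err = errno;
	free(fm);
	errno = err;
	return ret;
}
```

If the ioctl fails with EOPNOTSUPP, a caller would switch to the SEEK_HOLE/SEEK_DATA path instead.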

I also added a new BOOL_OPT that lets users choose which algorithm to use. For filesystems that do not support fiemap, CRIU falls back to the original algorithm (SEEK_HOLE/SEEK_DATA).
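One way to decide between the two algorithms is to probe fiemap support once per file. The sketch below relies on the documented fiemap behavior that an fm_extent_count of 0 only counts extents without returning any; fiemap_supported(), choose_dump_method() and the prefer_fiemap flag (standing in for the new BOOL_OPT, whose name is not given here) are hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

enum dump_method { DUMP_SEEK, DUMP_FIEMAP };

/*
 * Hypothetical probe: with fm_extent_count == 0 the kernel only fills in
 * fm_mapped_extents, so this is a cheap way to see whether the filesystem
 * implements FS_IOC_FIEMAP. Any failure (typically EOPNOTSUPP) is treated
 * as "not supported".
 */
static bool fiemap_supported(int fd)
{
	struct fiemap fm;

	memset(&fm, 0, sizeof(fm));
	fm.fm_start = 0;
	fm.fm_length = FIEMAP_MAX_OFFSET;
	fm.fm_extent_count = 0;	/* count extents only */

	return ioctl(fd, FS_IOC_FIEMAP, &fm) == 0;
}

/* 'prefer_fiemap' stands in for the user-facing boolean option. */
static enum dump_method choose_dump_method(int fd, bool prefer_fiemap)
{
	if (prefer_fiemap && fiemap_supported(fd))
		return DUMP_FIEMAP;
	return DUMP_SEEK;	/* original SEEK_HOLE/SEEK_DATA algorithm */
}
```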

Furthermore, I added tests that create a ghost file with many holes, with 4K of data and a 4K hole in every 8K. These tests exercise both the original and the new dumping algorithm in CRIU.
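A file with that layout can be generated as sketched below. This is a hypothetical helper, make_striped_file(), not the actual zdtm test. It assumes the data comes first in each 8K window, and nr_windows is an assumed parameter. A short write is treated as an error for simplicity.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical generator: in each of nr_windows 8KiB windows, write 4KiB
 * of data and skip the next 4KiB, leaving it as a hole.
 * Returns 0 on success, -1 on error.
 */
static int make_striped_file(int fd, unsigned nr_windows)
{
	char data[4096];

	memset(data, 'd', sizeof(data));
	for (unsigned i = 0; i < nr_windows; i++) {
		off_t off = (off_t)i * 8192;

		if (pwrite(fd, data, sizeof(data), off) != (ssize_t)sizeof(data))
			return -1;
	}
	/* Extend the size so the last window also ends with a 4KiB hole. */
	return ftruncate(fd, (off_t)nr_windows * 8192);
}
```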

Future Work

  • zdtm test for fallback: we should add a test that checks whether CRIU properly falls back to the original algorithm when the filesystem doesn't support fiemap.
  • measure performance of the original algorithm vs. the new algorithm: although we expect the new algorithm to perform better in some extreme situations (e.g. a lot of holes), we still need to measure how much improvement it actually achieves.

Remark

  • During development, Pavel and I found that fiemap is extremely slow on btrfs for files with many extents. Pavel sent a report to the kernel community, and they developed a patchset to fix this issue.
  • While developing the test for highly sparse ghost files, we discovered that on some filesystems, such as xfs, we cannot easily create a highly sparse file the way we can on ext4 or btrfs. Although we can use fallocate to forcibly create holes, we still don't know why xfs behaves this way.
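Forcing holes with fallocate() can be sketched as below: the file is first written densely, then every other block is deallocated. punch_alternate() is a hypothetical helper. On Linux, FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE, and filesystems that cannot punch holes fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: deallocate every other 'block'-sized range of an
 * already-written file, so data and holes alternate. The file size is
 * unchanged because of FALLOC_FL_KEEP_SIZE.
 * Returns 0 on success, -1 on error (errno set by fallocate()).
 */
static int punch_alternate(int fd, off_t size, off_t block)
{
	for (off_t off = block; off < size; off += 2 * block) {
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, block) < 0)
			return -1;
	}
	return 0;
}
```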

Acknowledgment

I'd like to thank my mentor, Pavel Tikhomirov, who was very supportive throughout the whole GSoC period. I ran into a lot of problems during development, and he always gave me fast feedback and suitable solutions with great patience.

I also want to thank CRIU’s community and GSoC for giving me the opportunity to be part of this project. It has been a wonderful experience, and I would like to keep contributing to CRIU in the future.
