@jzwinck
Created June 2, 2013 03:29
Inspired by http://linuxnote.net/jianingy/en/linux/a-fast-way-to-remove-huge-number-of-files.html, this program removes all regular files within a directory, using multiple processes to work faster. Timings from my system are in a comment below; feel free to leave your own. To create lots of empty files for testing, run "seq 10000 | xargs touch" in an empty directory.
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* filter for regular files only; d_type is not filled in by every
   filesystem, so entries reported as DT_UNKNOWN are skipped */
static int dirent_select(const struct dirent* ent)
{
    return ent->d_type == DT_REG;
}

/* goes to the directory in argv[1] and removes all regular files within */
int main(int argc, char* argv[])
{
    if (argc != 2) {
        fprintf(stderr, "directory to delete from is required\n");
        return 1;
    }

    int res = chdir(argv[1]);
    if (res) {
        perror("chdir");
        return 1;
    }

    /* make the list of files to delete */
    struct dirent** list;
    int count = scandir(".", &list, dirent_select, NULL);
    if (count < 0) {
        perror("scandir");
        return 1;
    }

    /* fork twice to become four processes total; check each fork right
       away so errno still belongs to the call that failed */
    pid_t pid1 = fork();
    if (pid1 < 0) {
        perror("fork");
        return 1;
    }
    pid_t pid2 = fork();
    if (pid2 < 0) {
        perror("fork");
        return 1;
    }

    /* figure out who is responsible for which files (one case per process) */
    int begin, end;
    if (pid1 == 0 && pid2 == 0) {
        begin = 0;
        end = count / 4;
    } else if (pid2 == 0) {
        begin = count / 4;
        end = count / 2;
    } else if (pid1 == 0) {
        begin = count / 2;
        end = count * 3 / 4;
    } else {
        begin = count * 3 / 4;
        end = count;
    }

    /* now delete the files this process is responsible for */
    int ii;
    for (ii = begin; ii < end; ++ii) {
        res = unlink(list[ii]->d_name);
        if (res) {
            perror("unlink");
            return 1;
        }
    }

    return 0;
}
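
The program above returns from each process as soon as its own slice is done, so the process you launched can exit while its children are still unlinking files. If you want the top-level exit (and anything timed around it) to cover all four slices, one option is to reap children before returning. This is a minimal sketch, not part of the original gist; `reap_children` is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not in the original gist): block until every direct
 * child of the calling process has terminated.
 * Returns 0 if all children exited with status 0 (or there were none),
 * and 1 if any child exited non-zero or was killed by a signal. */
int reap_children(void)
{
    int failed = 0;

    for (;;) {
        int status;
        pid_t pid = wait(&status);

        if (pid < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            break;          /* ECHILD: no children left to wait for */
        }
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed;
}
```

To use it, the final `return 0;` in `main` could become `return reap_children();`. After two forks the original process has two direct children and the first child has one of its own, so each process only needs to wait for its direct children for the whole tree to be covered.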

jzwinck commented Jun 2, 2013

On a 2011 MacBook Air (OS X 10.8.3, 128 GB SSD, 1.6 GHz), clearing a directory of 100k empty files takes 11.3 seconds with rsync --delete vs. 4.3 seconds with this program. The relative gain is smaller with fewer files, but still measurable at 10k files (about 0.64 vs. 0.48 seconds). I also tried 2-way and 8-way parallelism, but found 4-way to be the best fit, at least on this 2x2 (dual-core, hyper-threaded) system.
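
The 4-way result was measured on a machine with four logical CPUs. One way to carry that observation to other machines is to derive the number of fork() rounds from the online CPU count. Treat this as an untested starting heuristic, not something the timings prove. `fork_depth_for_cpus` and `default_fork_depth` are hypothetical names, and `_SC_NPROCESSORS_ONLN` is a widely available extension (Linux, OS X), not something every POSIX system is guaranteed to provide.

```c
#include <unistd.h>

/* Hypothetical helper: largest k such that 2^k <= cpus (0 if cpus < 2).
 * k rounds of fork() yield 2^k processes. */
int fork_depth_for_cpus(long cpus)
{
    int depth = 0;

    while (cpus >= 2) {
        cpus /= 2;
        ++depth;
    }
    return depth;
}

/* Hypothetical helper: fork depth for the machine we are running on.
 * Falls back to a single process if the CPU count cannot be read. */
int default_fork_depth(void)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);

    if (cpus < 1)
        cpus = 1;
    return fork_depth_for_cpus(cpus);
}
```

With four logical CPUs this gives a depth of 2, i.e. the 4-way split used in the gist. Whether it is the best choice elsewhere would need to be measured, since the work is mostly filesystem-bound.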

jzwinck commented Jun 2, 2013

If you want to try 8-way parallelism, here's the conditional block you need (along with an extra pid3 = fork() of course):

int tmp = (pid1 == 0 ? 0 : 4)
        + (pid2 == 0 ? 0 : 2)
        + (pid3 == 0 ? 0 : 1);
begin = count * tmp / 8;
end = count * (tmp + 1) / 8;
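
The same arithmetic generalizes to any number of fork() rounds. The sketch below is an illustration rather than code from the gist: `worker_index` and `slice_bounds` are hypothetical helper names, and the `pids` array is assumed to hold the return value of each fork() in the order the calls were made.

```c
#include <sys/types.h>

/* Hypothetical helper: build this process's index from the fork() results,
 * most significant bit first. A child (pid == 0) contributes a 0 bit and a
 * parent (pid > 0) a 1 bit, matching the 4/2/1 weights used above. */
int worker_index(const pid_t *pids, int rounds)
{
    int index = 0;

    for (int i = 0; i < rounds; ++i)
        index = index * 2 + (pids[i] != 0);
    return index;
}

/* Hypothetical helper: half-open range [*begin, *end) of list entries owned
 * by worker `index` out of `ways`. The multiplication is done in long long
 * so that count * index cannot overflow int. */
void slice_bounds(int count, int index, int ways, int *begin, int *end)
{
    *begin = (int)((long long)count * index / ways);
    *end = (int)((long long)count * (index + 1) / ways);
}
```

Because each slice ends exactly where the next begins, the ranges cover every entry once even when `count` is not a multiple of `ways`.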

@Deleriux1

http://fpaste.org/16330/28909613/

Your program:

$ time ./dothis TestDir/

real    0m36.350s
user    0m0.057s
sys     0m3.831s

My program:

$ time ./killdir TestDir/
Total files: 1000000
Performing delete..
Done

real    0m16.713s
user    0m1.140s
sys     0m6.273s

However, my program only works on Linux.
