EnumerateFiles filtering on multiple extensions -- performance comparison of various methods. From discussion at http://stackoverflow.com/questions/163162/can-you-call-directory-getfiles-with-multiple-filters
public static class LinqPadExtensions {
    /// <summary> Performance check -- how long do X repetitions of a task take?</summary>
    public static TimeSpan Perf(this string reportTitle, Action<int> task, int repetitions = 10000, bool noShow = false) {
        // http://stackoverflow.com/questions/28637/is-datetime-now-the-best-way-to-measure-a-functions-performance
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) {
            task(i);
        }
        sw.Stop();
        if (!noShow) {
            string.Format("{0} ticks elapsed ({1} ms)", sw.Elapsed.Ticks, sw.Elapsed.TotalMilliseconds)
                .Dump(string.Format("{0} ({1}x)", reportTitle, repetitions));
        }
        return sw.Elapsed;
    }
}

.net 3.5 where

IEnumerable (5 items)

  • C:\Temp\IOTest\Input\a text file.txt
  • C:\Temp\IOTest\Input\another readme.MD
  • C:\Temp\IOTest\Input\README.md
  • C:\Temp\IOTest\Input\test text 1.txt
  • C:\Temp\IOTest\Input\test text 2.txt

where

IEnumerable (5 items)

  • C:\Temp\IOTest\Input\a text file.txt
  • C:\Temp\IOTest\Input\another readme.MD
  • C:\Temp\IOTest\Input\README.md
  • C:\Temp\IOTest\Input\test text 1.txt
  • C:\Temp\IOTest\Input\test text 2.txt

split

IEnumerable (5 items)

  • C:\Temp\IOTest\Input\another readme.MD
  • C:\Temp\IOTest\Input\README.md
  • C:\Temp\IOTest\Input\a text file.txt
  • C:\Temp\IOTest\Input\test text 1.txt
  • C:\Temp\IOTest\Input\test text 2.txt

glob

IEnumerable (5 items)

  • C:\Temp\IOTest\Input\another readme.MD
  • C:\Temp\IOTest\Input\README.md
  • C:\Temp\IOTest\Input\a text file.txt
  • C:\Temp\IOTest\Input\test text 1.txt
  • C:\Temp\IOTest\Input\test text 2.txt

Same Results? (.net35 vs 4)

True

Same Results? (where vs split)

True

Same Results? (where vs glob)

True

.net 3.5 where (10000x)

9026333 ticks elapsed (902.6333 ms)

where (10000x)

4360029 ticks elapsed (436.0029 ms)

split (10000x)

33343 ticks elapsed (3.3343 ms)

glob (10000x)

7138 ticks elapsed (0.7138 ms)


where (tolist) (10000x)

9699163 ticks elapsed (969.9163 ms)

split (tolist) (10000x)

13859623 ticks elapsed (1385.9623 ms)

glob (tolist) (10000x)

13677417 ticks elapsed (1367.7417 ms)

void Main()
{
    // @source: http://stackoverflow.com/questions/163162/can-you-call-directory-getfiles-with-multiple-filters
    var path = @"C:\Temp\IOTest\Input";
    var exts = new[] { "md", "txt" };
    var extsAsWildcards = exts.Select(x => "*." + x).ToArray();

    // Show the files each method returns and check that they agree.
    var getlist = getfiles(path, exts).Dump(".net 3.5 where");
    var wherelist = getwhere(path, exts).Dump("where");
    var splitlist = splitwhere(path, exts).Dump("split");
    var globlist = globwhere(path, extsAsWildcards).Dump("glob");
    wherelist.OrderBy(o => o).SequenceEqual(getlist.OrderBy(o => o)).Dump("Same Results? (.net35 vs 4)");
    wherelist.OrderBy(o => o).SequenceEqual(splitlist.OrderBy(o => o)).Dump("Same Results? (where vs split)");
    wherelist.OrderBy(o => o).SequenceEqual(globlist.OrderBy(o => o)).Dump("Same Results? (where vs glob)");

    // These four runs only build the query; the returned IEnumerable is never enumerated.
    ".net 3.5 where".Perf(n => getfiles(path, exts));
    "where".Perf(n => getwhere(path, exts));
    "split".Perf(n => splitwhere(path, exts));
    "glob".Perf(n => globwhere(path, extsAsWildcards));

    // ToList() forces full enumeration, so these runs include the actual directory reads.
    "where (tolist)".Perf(n => getwhere(path, exts).ToList());
    "split (tolist)".Perf(n => splitwhere(path, exts).ToList());
    "glob (tolist)".Perf(n => globwhere(path, extsAsWildcards).ToList());
}
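// .NET 3.5 style: GetFiles reads the whole directory into an array, then filters.
// Note: EndsWith compares against the bare extension text, so "md" also matches a name like "x.cmd".
public IEnumerable<string> getfiles(string path, params string[] exts) {
    return
        Directory
            .GetFiles(path, "*.*")
            .Where(file => exts.Any(x => file.EndsWith(x, StringComparison.OrdinalIgnoreCase)));
}

// .NET 4 style: EnumerateFiles streams the entries, filtered the same way.
public IEnumerable<string> getwhere(string path, params string[] exts) {
    return
        Directory
            .EnumerateFiles(path, "*.*")
            .Where(file => exts.Any(x => file.EndsWith(x, StringComparison.OrdinalIgnoreCase)));
}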
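// One EnumerateFiles call per extension; the "*.ext" patterns are built on every call.
public IEnumerable<string> splitwhere(string path, params string[] exts) {
    return
        exts.Select(x => "*." + x)
            .SelectMany(x =>
                Directory.EnumerateFiles(path, x)
            );
}

// Same as splitwhere, but the caller passes ready-made patterns such as "*.md".
// SelectMany is deferred, so EnumerateFiles is not called until the result is enumerated.
public IEnumerable<string> globwhere(string path, params string[] globs) {
    return
        globs
            .SelectMany(x =>
                Directory.EnumerateFiles(path, x)
            );
}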
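The EndsWith filter in getfiles and getwhere compares against the bare extension text, so "md" would also match a file named "build.cmd". The following is a minimal sketch of one possible alternative, not something from the original gist or its benchmark: it compares Path.GetExtension against a case-insensitive HashSet. The names ExtensionFilter, HasExtension and GetByExtension are hypothetical, and this variant was not timed here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original gist.
public static class ExtensionFilter
{
    // True when the file's extension (including the leading dot) is in the set.
    public static bool HasExtension(string file, ISet<string> dottedExts)
    {
        // Path.GetExtension returns e.g. ".md", or "" when there is no extension.
        return dottedExts.Contains(Path.GetExtension(file));
    }

    // Streams the directory once and keeps files whose extension is wanted.
    // exts are given without the dot ("md", "txt"), as in the gist.
    public static IEnumerable<string> GetByExtension(string path, params string[] exts)
    {
        var wanted = new HashSet<string>(
            exts.Select(x => "." + x),
            StringComparer.OrdinalIgnoreCase);
        return Directory.EnumerateFiles(path).Where(f => HasExtension(f, wanted));
    }
}
```

Like getwhere, GetByExtension returns a deferred sequence, so it would need a ToList() inside Perf to be measured fairly against the "(tolist)" runs.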
@mnemanov commented Nov 22, 2017

The only difference between split and glob is whether "exts.Select(x => "*." + x)" runs inside the test loop or outside of it, so the difference comes from the test setup, not from real-life use (unless you really do loop over the SAME extensions in real life).
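A related point is that the "split" and "glob" runs without ToList() return a deferred SelectMany query, so they time only query construction rather than any directory read. This small sketch illustrates that behaviour without touching the file system. DeferredDemo and FakeEnumerate are hypothetical stand-ins (FakeEnumerate plays the role of Directory.EnumerateFiles) that count how often the source is actually invoked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo, not part of the original gist.
public static class DeferredDemo
{
    public static int SourceCalls;

    // Stand-in for Directory.EnumerateFiles: counts each invocation.
    public static IEnumerable<string> FakeEnumerate(string pattern)
    {
        SourceCalls++;
        return new[] { "a" + pattern.TrimStart('*') };
    }

    // Same shape as globwhere in the gist.
    public static IEnumerable<string> Glob(params string[] globs)
    {
        return globs.SelectMany(g => FakeEnumerate(g));
    }

    // Returns the source-call count before and after forcing enumeration.
    public static int[] CallsBeforeAndAfterToList()
    {
        SourceCalls = 0;
        var query = Glob("*.md", "*.txt");
        int before = SourceCalls;   // query built, selector not yet run
        query.ToList();             // enumeration runs the selector once per glob
        int after = SourceCalls;
        return new[] { before, after };
    }
}
```

CallsBeforeAndAfterToList() returns { 0, 2 }: the source is not called at all until the query is enumerated, which is consistent with the "(tolist)" timings above being far larger than the un-materialized ones.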
