Comparing Parallel.For vs Channels
@CharlieDigital · Last active March 13, 2024 18:15
// Generate a set of 100 records, each with a random wait interval.
using System.Collections.Immutable;
using System.Diagnostics;
using System.Threading.Channels;

var log = (object msg) => Console.WriteLine(msg);

var workload = Enumerable
    .Range(0, 100)
    .Select(i => (Index: i, Delay: Random.Shared.Next(10, 50)))
    .ToImmutableArray();

// Using System.Threading.Channels
await InstrumentedRun("Channel", async () => {
    var channel = Channel.CreateUnbounded<int>();

    // Producer: simulate work with a delay, then publish the ID.
    async Task Run(ChannelWriter<int> writer, int id, int delay) {
        await Task.Delay(delay);
        await writer.WriteAsync(id);
    }

    // Consumer: drain the channel until the writer completes.
    async Task Receive(ChannelReader<int> reader) {
        while (await reader.WaitToReadAsync()) {
            if (reader.TryRead(out var id)) {
                // No work here.
                //log($" Completed {id}");
            }
        }
    }

    var receiveTask = Receive(channel.Reader);

    var processingTasks = workload
        .AsParallel()
        .Select(e => Run(channel.Writer, e.Index, e.Delay));

    await Task
        .WhenAll(processingTasks)
        .ContinueWith(_ => channel.Writer.Complete());

    await receiveTask;
});
// Using Parallel.For with concurrency of 4
await InstrumentedRun("Parallel.For @ 4", () => {
    Parallel.For(0, 100, new ParallelOptions { MaxDegreeOfParallelism = 4 }, (index) => {
        Thread.Sleep(workload[index].Delay);
    });

    return Task.CompletedTask;
});
// Using Parallel.ForEachAsync with concurrency of 4
await InstrumentedRun("Parallel.ForEachAsync @ 4", async () =>
    await Parallel.ForEachAsync(workload, new ParallelOptions { MaxDegreeOfParallelism = 4 }, async (item, cancel) => {
        await Task.Delay(item.Delay, cancel);
    })
);

// Using Parallel.ForEachAsync with concurrency of 40
await InstrumentedRun("Parallel.ForEachAsync @ 40", async () =>
    await Parallel.ForEachAsync(workload, new ParallelOptions { MaxDegreeOfParallelism = 40 }, async (item, cancel) => {
        await Task.Delay(item.Delay, cancel);
    })
);

// Using Parallel.ForEachAsync with concurrency unset
await InstrumentedRun("Parallel.ForEachAsync (Default)", async () =>
    await Parallel.ForEachAsync(workload, async (item, cancel) => {
        await Task.Delay(item.Delay, cancel);
    })
);
/*-----------------------------------------------------------
 * Supporting functions
 ---------------------------------------------------------*/
async Task InstrumentedRun(string name, Func<Task> test) {
    var threadsAtStart = Process.GetCurrentProcess().Threads.Count;
    var timer = new Stopwatch();
    timer.Start();

    await test();

    timer.Stop();

    Console.WriteLine($"[{name}] = {timer.ElapsedMilliseconds}ms");
    Console.WriteLine($" ⮑ {threadsAtStart} threads at start");
    Console.WriteLine($" ⮑ {Process.GetCurrentProcess().Threads.Count} threads at end");
}
/*
YMMV since each run uses a random workload.

[Channel] = 68ms
  ⮑ 8 threads at start
  ⮑ 19 threads at end
[Parallel.For @ 4] = 799ms
  ⮑ 19 threads at start
  ⮑ 19 threads at end
[Parallel.ForEachAsync @ 4] = 754ms
  ⮑ 19 threads at start
  ⮑ 19 threads at end
[Parallel.ForEachAsync @ 40] = 100ms
  ⮑ 19 threads at start
  ⮑ 19 threads at end
[Parallel.ForEachAsync (Default)] = 384ms
  ⮑ 19 threads at start
  ⮑ 19 threads at end
*/
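An aside on the consumer: the WaitToReadAsync/TryRead loop in Receive can also be written with ChannelReader<T>.ReadAllAsync and await foreach. A minimal, equivalent sketch (not part of the original gist) that could be swapped in for Receive:

// Drains the channel until the writer calls Complete(), same as Receive above.
async Task ReceiveAll(ChannelReader<int> reader) {
    await foreach (var id in reader.ReadAllAsync()) {
        // No work here, mirroring the original Receive.
    }
}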
@AliveDevil

Just ran these on an i5-8400 (6 Cores, 6 Threads) with these changes: https://gist.github.com/AliveDevil/62de8d4ccffd5f86980c1db8601973cd
(Configuration: dotnet publish -f net7.0 -c Release -r win-x64 --self-contained -p:PublishNativeAot=true -p:PublishSingleFile=true -p:PublishTrimmed=true)

Results:

> @("Channels", "ParallelFor4", "ParallelForEach4", "ParallelForEach40", "ParallelForEach") | % { .\ParallelForVsChannel.exe $_ }
[Channel] = 204ms
  ⮑ 8 threads at start
  ⮑ 17 threads at end
[Parallel.For @ 4] = 1039ms
  ⮑ 8 threads at start
  ⮑ 15 threads at end
[Parallel.ForEachAsync @ 4] = 1010ms
  ⮑ 8 threads at start
  ⮑ 15 threads at end
[Parallel.ForEachAsync @ 40] = 156ms
  ⮑ 8 threads at start
  ⮑ 16 threads at end
[Parallel.ForEachAsync (Default)] = 695ms
  ⮑ 8 threads at start
  ⮑ 16 threads at end

Parallel.ForEachAsync @ 40 still beats Channels (in the cold-start case), with deterministic results.
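(For context, the linked fork runs one scenario per process invocation by passing the mode name as a command-line argument, which is what keeps each measurement cold. A rough sketch of that kind of selector, not AliveDevil's exact diff; RunChannelScenario and RunParallelFor4Scenario are hypothetical wrappers around the gist's existing blocks:)

// Hypothetical selector: args[0] picks the single scenario to run, so every
// measurement starts from a fresh process with a cold thread pool.
var mode = args.Length > 0 ? args[0] : "Channels";

switch (mode) {
    case "Channels":
        await RunChannelScenario();       // wraps the Channel block above
        break;
    case "ParallelFor4":
        await RunParallelFor4Scenario();  // wraps the Parallel.For @ 4 block
        break;
    // ... remaining modes follow the same pattern.
}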

@CharlieDigital (Author) commented Oct 5, 2023

Very interesting results.

Does the AOT compilation optimize scheduling somehow?

(Also, I'm on a 2021 M1 MacBook Pro (6+2, 16 GB), .NET 7)

@AliveDevil commented Oct 5, 2023

It does. Reran this (this time on my laptop, an i5-7200U, 2 cores, 4 threads) and on Linux.

publish$ for mode in "Channels" "ParallelFor4" "ParallelForEach4" "ParallelForEach40" "ParallelForEach"; do ./ParallelForVsChannel $mode; done
[Channel] = 216ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.For @ 4] = 1026ms
  ⮑  7 threads at start
  ⮑  13 threads at end
[Parallel.ForEachAsync @ 4] = 1026ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.ForEachAsync @ 40] = 161ms
  ⮑  7 threads at start
  ⮑  15 threads at end
[Parallel.ForEachAsync (Default)] = 1026ms
  ⮑  7 threads at start
  ⮑  14 threads at end

vs regular old release build:

linux-x64$ for mode in "Channels" "ParallelFor4" "ParallelForEach4" "ParallelForEach40" "ParallelForEach"; do ./ParallelForVsChannel $mode; done
[Channel] = 139ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.For @ 4] = 964ms
  ⮑  7 threads at start
  ⮑  13 threads at end
[Parallel.ForEachAsync @ 4] = 1021ms
  ⮑  7 threads at start
  ⮑  15 threads at end
[Parallel.ForEachAsync @ 40] = 162ms
  ⮑  7 threads at start
  ⮑  15 threads at end
[Parallel.ForEachAsync (Default)] = 1022ms
  ⮑  7 threads at start
  ⮑  15 threads at end

Ran the publish build, just for fun, on a Debian 12 KVM guest (the host is an E3-1230v3) with 8 cores:
[Channel] = 205ms
  ⮑  7 threads at start
  ⮑  18 threads at end
[Parallel.For @ 4] = 1028ms
  ⮑  7 threads at start
  ⮑  15 threads at end
[Parallel.ForEachAsync @ 4] = 1044ms
  ⮑  7 threads at start
  ⮑  15 threads at end
[Parallel.ForEachAsync @ 40] = 165ms
  ⮑  7 threads at start
  ⮑  18 threads at end
[Parallel.ForEachAsync (Default)] = 537ms
  ⮑  7 threads at start
  ⮑  18 threads at end

@CharlieDigital (Author) commented Oct 5, 2023

With AOT on .NET 8 Preview (.NET 7 does not support it on macOS).

Release + AOT + Single File + Trimmed:

dotnet publish -c Release --self-contained -p:PublishNativeAot=true -p:PublishSingleFile=true -p:PublishTrimmed=true

[Channel] = 114ms
[Parallel.For @ 4] = 759ms
[Parallel.ForEachAsync @ 4] = 736ms
[Parallel.ForEachAsync @ 40] = 98ms
[Parallel.ForEachAsync (Default)] = 375ms

Release only + Single File:

dotnet publish -c Release --self-contained -p:PublishSingleFile=true

[Channel] = 67ms
[Parallel.For @ 4] = 841ms
[Parallel.ForEachAsync @ 4] = 780ms
[Parallel.ForEachAsync @ 40] = 103ms
[Parallel.ForEachAsync (Default)] = 394ms

Release + AOT + Single File:

dotnet publish -c Release --self-contained -p:PublishNativeAot=true -p:PublishSingleFile=true

[Channel] = 71ms
[Parallel.For @ 4] = 857ms
[Parallel.ForEachAsync @ 4] = 811ms
[Parallel.ForEachAsync @ 40] = 104ms
[Parallel.ForEachAsync (Default)] = 404ms

Release + AOT + Trimmed:

dotnet publish -c Release --self-contained -p:PublishNativeAot=true -p:PublishTrimmed=true 

[Channel] = 115ms
[Parallel.For @ 4] = 828ms
[Parallel.ForEachAsync @ 4] = 797ms
[Parallel.ForEachAsync @ 40] = 98ms
[Parallel.ForEachAsync (Default)] = 409ms

It seems that the trimming somehow negatively affects the Channel (rather than the Parallel options being optimized).

@AliveDevil

That is interesting. I verified your findings with:

dotnet publish -c Release -r linux-x64 --self-contained -p:PublishNativeAot=true /p:PublishSingleFile=true
[Channel] = 114ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.For @ 4] = 966ms
  ⮑  7 threads at start
  ⮑  13 threads at end
[Parallel.ForEachAsync @ 4] = 1034ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.ForEachAsync @ 40] = 164ms
  ⮑  7 threads at start
  ⮑  14 threads at end
[Parallel.ForEachAsync (Default)] = 1022ms
  ⮑  7 threads at start
  ⮑  14 threads at end

@CharlieDigital (Author)

@AliveDevil it's the initialization of the Channel instance. Once it's taken out of the instrumented scope, the numbers match up again.

(I don't know the underlying reason why the instantiation of the Channel would be different in this case).
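For reference, a minimal sketch of that change against the original gist (same producer/consumer body, but the channel is created before the timed scope):

// Create the channel outside the instrumented scope so its initialization
// cost is not included in the timing; reuses the gist's workload and
// InstrumentedRun as-is.
var channel = Channel.CreateUnbounded<int>();

await InstrumentedRun("Channel (pre-created)", async () => {
    async Task Run(ChannelWriter<int> writer, int id, int delay) {
        await Task.Delay(delay);
        await writer.WriteAsync(id);
    }

    async Task Receive(ChannelReader<int> reader) {
        while (await reader.WaitToReadAsync()) {
            reader.TryRead(out _); // no work, same as the original
        }
    }

    var receiveTask = Receive(channel.Reader);

    var processingTasks = workload
        .AsParallel()
        .Select(e => Run(channel.Writer, e.Index, e.Delay));

    await Task
        .WhenAll(processingTasks)
        .ContinueWith(_ => channel.Writer.Complete());

    await receiveTask;
});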

@nickgrishaev

Processor: Intel Core i5-13600K (14 cores, 20 logical processors), OS: Windows 11, .NET 7.0.13.
The thread counts don't match the example at all:

[Channel] = 78ms
  ⮑ 8 threads at start
  ⮑ 30 threads at end
[Parallel.For @ 4] = 974ms
  ⮑ 30 threads at start
  ⮑ 30 threads at end
[Parallel.ForEachAsync @ 4] = 919ms
  ⮑ 30 threads at start
  ⮑ 30 threads at end
[Parallel.ForEachAsync @ 40] = 113ms
  ⮑ 30 threads at start
  ⮑ 31 threads at end
[Parallel.ForEachAsync (Default)] = 210ms
  ⮑ 31 threads at start
  ⮑ 31 threads at end

@CharlieDigital (Author)

@nickgrishaev I wouldn't expect it to match, since thread allocation scales with the number of cores, and I'm on a Mac M1.
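(A quick way to see that factor on any machine, not part of the gist: the thread pool's minimum worker thread count defaults to the logical processor count, so the start/end numbers shift with the hardware.)

// Prints the logical core count and the thread pool minimums for comparison
// across machines; the minimums default to the number of logical processors.
ThreadPool.GetMinThreads(out var workerThreads, out var iocpThreads);
Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
Console.WriteLine($"Min worker threads: {workerThreads}, min IOCP threads: {iocpThreads}");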
