Just make Erlang do some light-load work in a bunch of relatively short-lived processes. Calculating Pi to the 80th digit in batches of 10 processes, with a short 8 ms sleep in between to give the schedulers some breathing space, will do.
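The post doesn't show the machin module itself, so here is a minimal sketch of what such a workload could look like. It is an assumption, not the original code: it uses Machin's formula, pi = 4 * (4 * arccot(5) - arccot(239)), which is suggested only by the module name. It computes the digits with plain integer (bignum) arithmetic. The batch count of 1000 in run/0 is hypothetical, picked so that the run lasts roughly ten seconds.

```erlang
-module(machin).
-export([run/0, run/2, pi/1]).

%% HYPOTHETICAL sketch: the original machin module is not shown in the post.
-define(DIGITS, 80).    % compute Pi to 80 decimal digits
-define(PAUSE_MS, 8).   % sleep between batches, in milliseconds

%% Hypothetical default: 1000 batches of 10 processes each.
run() ->
    run(1000, 10).

run(0, _BatchSize) ->
    ok;
run(Batches, BatchSize) when Batches > 0 ->
    Parent = self(),
    %% Spawn a batch of short-lived workers; each computes Pi on its own,
    %% sends the result to the parent and exits.
    Pids = [spawn(fun() -> Parent ! {self(), pi(?DIGITS)} end)
            || _ <- lists:seq(1, BatchSize)],
    %% Wait until every worker in this batch has reported back.
    lists:foreach(fun(Pid) -> receive {Pid, _Pi} -> ok end end, Pids),
    %% Give the schedulers a short breather before the next batch.
    timer:sleep(?PAUSE_MS),
    run(Batches - 1, BatchSize).

%% Returns Pi as an integer: 3 followed by Digits decimal digits.
pi(Digits) ->
    Guard = 10,                      % extra digits to absorb truncation error
    Unity = pow10(Digits + Guard),   % fixed-point representation of 1.0
    %% Machin's formula: pi = 4 * (4 * arccot(5) - arccot(239))
    Pi = 4 * (4 * arccot(5, Unity) - arccot(239, Unity)),
    Pi div pow10(Guard).

%% arccot(X) = 1/X - 1/(3X^3) + 1/(5X^5) - ..., scaled by Unity.
arccot(X, Unity) ->
    arccot_sum(Unity div X, X * X, 1, 1, 0).

%% Power holds Unity div X^N for the current odd N; stop when it hits zero.
arccot_sum(0, _X2, _N, _Sign, Sum) ->
    Sum;
arccot_sum(Power, X2, N, Sign, Sum) ->
    arccot_sum(Power div X2, X2, N + 2, -Sign, Sum + Sign * (Power div N)).

%% Integer power of ten (math:pow/2 would return a float).
pow10(N) -> pow10(N, 1).

pow10(0, Acc) -> Acc;
pow10(N, Acc) when N > 0 -> pow10(N - 1, 10 * Acc).
```

With this sketch, machin:pi(10) returns 31415926535, and machin:run() runs the batched workload and returns ok.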
Run erl with 8 schedulers and scheduler busy waiting switched off entirely.
erl +sbwt none +S 8:8
1> c(machin).
{ok,machin}
2> machin:run().
At the same time, in another terminal, sample Erlang's CPU utilization with pidstat over 1-second intervals. In my case the whole thing runs for about 12 seconds, so I take 12 samples and look at their average.
$ pidstat -p `pidof beam.smp` -u 1 12
Linux 3.16.0-4-amd64 (jessie) 08/27/2015 _x86_64_ (2 CPU)
09:15:29 AM UID PID %usr %system %guest %CPU CPU Command
09:15:30 AM 900 24803 4.00 1.00 0.00 5.00 1 beam.smp
09:15:31 AM 900 24803 26.00 4.00 0.00 30.00 1 beam.smp
09:15:32 AM 900 24803 24.00 4.00 0.00 28.00 1 beam.smp
09:15:33 AM 900 24803 17.00 6.00 0.00 23.00 1 beam.smp
09:15:34 AM 900 24803 21.00 4.00 0.00 25.00 1 beam.smp
09:15:35 AM 900 24803 19.00 2.00 0.00 21.00 1 beam.smp
09:15:36 AM 900 24803 20.00 3.00 0.00 23.00 1 beam.smp
09:15:37 AM 900 24803 21.00 3.00 0.00 24.00 1 beam.smp
09:15:38 AM 900 24803 14.00 10.00 0.00 24.00 1 beam.smp
09:15:39 AM 900 24803 16.00 9.00 0.00 25.00 1 beam.smp
09:15:40 AM 900 24803 1.00 0.00 0.00 1.00 1 beam.smp
09:15:41 AM 900 24803 0.00 1.00 0.00 1.00 1 beam.smp
Average: 900 24803 15.25 3.92 0.00 19.17 - beam.smp
Now do the same, but this time make the schedulers' busy waiting really significant.
erl +sbwt very_long +S 8:8
Erlang/OTP 17 [erts-6.4.1.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V6.4.1.2 (abort with ^G)
1> c(machin).
{ok,machin}
2> machin:run().
$ pidstat -p `pidof beam.smp` -u 1 12
Linux 3.16.0-4-amd64 (jessie) 08/27/2015 _x86_64_ (2 CPU)
09:18:37 AM UID PID %usr %system %guest %CPU CPU Command
09:18:38 AM 900 24845 5.00 4.00 0.00 9.00 1 beam.smp
09:18:39 AM 900 24845 15.00 57.00 0.00 72.00 1 beam.smp
09:18:40 AM 900 24845 13.00 62.00 0.00 75.00 1 beam.smp
09:18:41 AM 900 24845 19.00 57.00 0.00 76.00 1 beam.smp
09:18:42 AM 900 24845 16.00 54.00 0.00 70.00 1 beam.smp
09:18:43 AM 900 24845 16.00 54.00 0.00 70.00 1 beam.smp
09:18:44 AM 900 24845 14.00 56.00 0.00 70.00 1 beam.smp
09:18:45 AM 900 24845 15.00 56.00 0.00 71.00 1 beam.smp
09:18:46 AM 900 24845 20.00 55.00 0.00 75.00 1 beam.smp
09:18:47 AM 900 24845 11.00 61.00 0.00 72.00 1 beam.smp
09:18:48 AM 900 24845 0.99 1.98 0.00 2.97 1 beam.smp
09:18:49 AM 900 24845 0.00 0.00 0.00 0.00 1 beam.smp
Average: 900 24845 12.07 43.13 0.00 55.20 - beam.smp
So in the first case we had an average of 15.25% user CPU utilization (this is the work done) and 3.92% system CPU utilization (this is the schedulers' waiting).
In the second case we got an average of 12.07% user CPU and 43.13% system CPU.
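Putting the two runs side by side, using only the numbers from the two Average lines (user CPU change, system CPU ratio, total CPU ratio):

```latex
\frac{15.25 - 12.07}{15.25} \approx 0.21
\qquad
\frac{43.13}{3.92} \approx 11.0
\qquad
\frac{55.20}{19.17} \approx 2.9
```

That is roughly 21% less user CPU, about eleven times more system CPU, and close to three times the total CPU for the same workload.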
The high system CPU in the second case is purely a manifestation of the Erlang schedulers' busy waiting. We did gain a bit of performance on our main task, though (less user CPU for the same work), so it might be worth having on CPU-bound systems. If there is nothing apart from the Erlang application on the host, that's it: the schedulers' spinning is not going to hurt Erlang's processes. Otherwise, it can push back other processes on the system.
This is the real consequence of Erlang's busy waiting. Another one, naturally, is wasted heat and electricity. And CPU utilization is no longer a good monitoring metric, so using load average instead might be a better idea.
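If you follow the load-average suggestion and want to read it from inside the node, os_mon's cpu_sup can provide it. cpu_sup:avg1/0 reports the 1-minute load average multiplied by 256. The module and function names below (loadavg, avg1/0) are hypothetical. This is a sketch that assumes the os_mon application is available in your OTP installation.

```erlang
-module(loadavg).
-export([avg1/0]).

%% HYPOTHETICAL helper: returns the 1-minute system load average as a float.
avg1() ->
    %% cpu_sup belongs to the os_mon application, which must be running.
    {ok, _Started} = application:ensure_all_started(os_mon),
    case cpu_sup:avg1() of
        Load when is_integer(Load) ->
            Load / 256;   % cpu_sup reports the load average scaled by 256
        Error ->
            Error         % pass through anything unexpected unchanged
    end.
```

From the shell, loadavg:avg1() returns a float such as the value uptime would show for the last minute.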