Last active
January 19, 2019 08:05
Asynchronous junkie
I love Perl 6 asynchronous features. They are so easy to use and can give an instant boost by changing just a few lines of code that I got addicted to them. I became an asynchronous junkie. And finally overdosed. Here is my story...

I was processing a document that was divided into chapters, sub-chapters, sub-sub-chapters and so on. Parsed into a data structure it looked like this:
my %book = (
    '1' => {
        '1.1' => 'Lorem ipsum',
        '1.2' => {
            '1.2.1' => 'Lorem ipsum',
            '1.2.2' => 'Lorem ipsum'
        }
    },
    '2' => {
        '2.1' => {
            '2.1.1' => 'Lorem ipsum'
        }
    }
);
Every chapter required its children to be processed before it could be processed itself. Processing each chapter was also quite time consuming, no matter which level it was on or how many children it had. So I started by writing a recursive function to do it:
sub process (%chapters) {
    for %chapters.kv -> $number, $content {
        note "Chapter $number started";
        &?ROUTINE.($content) if $content ~~ Hash;
        sleep 1; # here the chapter itself is processed
        note "Chapter $number finished";
    }
}

process(%book);
Nothing fancy here, except maybe the &?ROUTINE compile-time variable, which makes recursive code less error prone because there is no need to repeat the subroutine name explicitly. After running it I got the expected DFS (Depth First Search) flow:
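As a simpler illustration (my own example, not from the processing script), &?ROUTINE lets a routine call itself without repeating its name:

```raku
sub factorial (Int $n) {
    return 1 if $n <= 1;
    # &?ROUTINE refers to the routine it lexically appears in - here, factorial
    return $n * &?ROUTINE.($n - 1);
}

say factorial(5); # 120
```

If the routine is later renamed, the recursive call keeps working unchanged.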
$ time perl6 small_sync.pl
Chapter 1 started
Chapter 1.1 started
Chapter 1.1 finished
Chapter 1.2 started
Chapter 1.2.1 started
Chapter 1.2.1 finished
Chapter 1.2.2 started
Chapter 1.2.2 finished
Chapter 1.2 finished
Chapter 1 finished
Chapter 2 started
Chapter 2.1 started
Chapter 2.1.1 started
Chapter 2.1.1 finished
Chapter 2.1 finished
Chapter 2 finished

real    0m8.184s
user    0m0.131s
sys     0m0.019s
It worked perfectly, but it was too slow. Because 1 second was required to process each chapter, it ran in 8 seconds total. So without hesitation I reached for Perl 6 asynchronous goodies to process the chapters in parallel instead of sequentially.
sub process (%chapters) {
    await do for %chapters.kv -> $number, $content {
        start {
            note "Chapter $number started";
            &?ROUTINE.outer.($content) if $content ~~ Hash;
            sleep 1; # here the chapter itself is processed
            note "Chapter $number finished";
        }
    }
}

process(%book);
Now every chapter is processed asynchronously, but each one first waits for its children to be processed, also asynchronously. Note that after wrapping the processing in an await/start construct, &?ROUTINE must now point to the outer scope.
$ time perl6 small_async.pl
Chapter 1 started
Chapter 2 started
Chapter 1.1 started
Chapter 1.2 started
Chapter 2.1 started
Chapter 1.2.1 started
Chapter 2.1.1 started
Chapter 1.2.2 started
Chapter 1.1 finished
Chapter 1.2.1 finished
Chapter 1.2.2 finished
Chapter 2.1.1 finished
Chapter 2.1 finished
Chapter 1.2 finished
Chapter 1 finished
Chapter 2 finished

real    0m3.171s
user    0m0.137s
sys     0m0.020s
Perfect. The time dropped to the expected 3 seconds - it was not possible to go any faster, because the document had 3 nesting levels and each required 1 second to process. Still smiling, I threw a bigger document at my beautiful script - 10 chapters, each with 10 sub-chapters, each with 10 sub-sub-chapters. It started processing, ran for a while... and DEADLOCKED.
Friedrich Nietzsche said that "when you gaze long into an abyss the abyss also gazes into you". The same rule applies to code. After a few minutes my code and I were staring at each other. And I couldn't find why it worked perfectly for small documents but deadlocked at random moments for big ones. Half an hour later I blinked and was defeated by my own code in the staring contest. So it was time for debugging.
I noticed that whenever it deadlocked there was always a constant number of 16 chapters still in progress. And that number looked familiar - the thread pool!
$ perl6 -e 'say start { }'
Promise.new(
    scheduler => ThreadPoolScheduler.new(
        initial_threads => 0,
        max_threads => 16,
        uncaught_handler => Callable
    ),
    status => PromiseStatus::Kept
)
Every asynchronous task that is planned needs a free thread to be executed on. And on my system only 16 concurrent threads are allowed. To analyze what happened, let's use the document from the first example but assume the thread pool is limited to 4 threads. So:
$ perl6 script.pl        # 4 threads available by default
Chapter 1 started        # 3 threads available
Chapter 1.1 started      # 2 threads available
Chapter 2 started        # 1 thread available
Chapter 1.1 finished     # 2 threads available again
Chapter 1.2 started      # 1 thread available
Chapter 1.2.1 started    # 0 threads available
                         # deadlock!
Now the chapter 1 subtree holds three threads and waits for one more, for chapter 1.2.2, to complete everything and start ascending from the recursion. And the chapter 2 subtree holds one thread and waits for one more, for chapter 2.1, to descend into the recursion. As a result, processing reaches a point where at least one more thread is required to proceed, but all threads are taken and none can be returned to the thread pool. The script deadlocks and hangs there forever.
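The pool size itself is tunable - though resizing it only postpones the problem for a deep enough document, it is worth knowing the knobs. A sketch, assuming Rakudo's RAKUDO_MAX_THREADS environment variable and the ThreadPoolScheduler API:

```raku
# From the shell - reproduce the 4-thread analysis above by shrinking the pool:
#     RAKUDO_MAX_THREADS=4 perl6 small_async.pl
# or postpone the deadlock by enlarging it:
#     RAKUDO_MAX_THREADS=64 perl6 big_async.pl

# From code - install a custom scheduler before starting any tasks:
my $*SCHEDULER = ThreadPoolScheduler.new( max_threads => 64 );
```

Because the dynamic variable $*SCHEDULER is consulted by start, any task created afterwards runs on the bigger pool.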
How to solve this problem and maintain parallel processing? There are many ways to do it :)

The key to the solution is to process asynchronously only those chapters that have no unprocessed chapters on a lower level.

Slightly rebuilding the DFS algorithm fixes the problem, in a suboptimal way:
sub process (%chapters) {
    return do for %chapters.kv -> $number, $content {
        await &?ROUTINE.outer.($content) if $content ~~ Hash;
        start {
            note "Chapter $number started";
            sleep 1; # here the chapter itself is processed
            note "Chapter $number finished";
        }
    }
}

await process(%book);
The asynchronous job for a chapter is now started only when all chapters below it are processed:
$ time perl6 small_async_fixed.pl
Chapter 1.1 started
Chapter 1.2.1 started
Chapter 1.2.2 started
-
Chapter 1.1 finished
Chapter 1.2.1 finished
Chapter 1.2.2 finished
Chapter 1.2 started
-
Chapter 1.2 finished
Chapter 1 started
Chapter 2.1.1 started
-
Chapter 1 finished
Chapter 2.1.1 finished
Chapter 2.1 started
-
Chapter 2.1 finished
Chapter 2 started
-
Chapter 2 finished

real    0m5.379s
user    0m1.202s
sys     0m0.164s
Why is it suboptimal? I've marked each second that passed for easier reading. There are 4 chapters that could be processed in parallel in the first second - 1.1, 1.2.1, 1.2.2 and 2.1.1 - because they have no chapters below them. But only 3 are processed, since the script needs to return from the recursion of chapter 1 before it can begin chapter 2. Such a solution will speed processing up only if each chapter has a lot of chapters directly below it.
Switching to a BFS (Breadth First Search) algorithm and executing parallel jobs in the reverse order the chapters were discovered will also be suboptimal. In that case chapters 1.2.1, 1.2.2 and 2.1.1 go first, but processing of the second nesting level cannot start before they are finished.
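One deadlock-free variant that avoids both shortcomings (my own sketch, not necessarily the author's final solution) is to express the dependency directly as promises: combine the children's promises with Promise.allof and schedule the chapter itself with .then, which registers a callback instead of holding a pool thread while waiting. Every leaf - including 1.1 - can then start immediately, regardless of which subtree it belongs to:

```raku
sub process (%chapters --> Promise) {
    return Promise.allof(
        do for %chapters.kv -> $number, $content {
            # promise kept once the whole subtree below is processed
            my $children = $content ~~ Hash ?? process($content) !! Promise.kept;
            # .then schedules this chapter when its children finish,
            # without blocking a pool thread while waiting
            $children.then( {
                note "Chapter $number started";
                sleep 1; # here the chapter itself is processed
                note "Chapter $number finished";
            } );
        }
    );
}

await process(%book);
```

Only the top-level await blocks a thread, so the pool is occupied exclusively by chapters that are actually being processed.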
use v6;

sub process (%chapters) {
    return await do for %chapters.kv -> $number, $content {
        start {
            note "Chapter $number started";
            &?ROUTINE.outer.($content) if $content ~~ Hash;
            sleep 1; # here the chapter itself is processed
            note "Chapter $number finished";
        }
    }
}

my %book;
for 1..10 -> $i {
    for 1..10 -> $j {
        for 1..10 -> $k {
            %book{$i}{$i~'.'~$j}{$i~'.'~$j~'.'~$k} = 'Lorem ipsum';
        }
    }
}

process(%book);