Skip to content

Instantly share code, notes, and snippets.

@fabriciofx
Forked from rlcamp/coroutine.c
Created May 27, 2020 00:43
Show Gist options
  • Save fabriciofx/b5c4fabf640a72ba7629c132d6424223 to your computer and use it in GitHub Desktop.
Save fabriciofx/b5c4fabf640a72ba7629c132d6424223 to your computer and use it in GitHub Desktop.
Coroutines for generator functions, sequential pipelines, state machines, and other uses in C
/*
Copyright 2015-2020 Richard Campbell
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Coroutines for generator functions, sequential pipelines, state machines, and other uses in C
Significant conceptual contribution by Rich Felker and others
Implementation by Richard Campbell
Tested on cortex-m bare metal, avr bare metal, cortex-a7 linux, aarch64 iOS, aarch64 linux, x86_64 darwin, x86_64 linux actual hardware, others (i386, m68k, riscv, ppc64) via qemu
The intended use case is where concurrency, but not parallelism, is required, with a possibly very high rate of switching between coroutines, in a possibly hard-realtime context (such as an audio subsystem).
This implementation relies only on a couple platform-specific inline assembly macros, and is therefore useful in uncooperative environments such as iOS or bare metal. The remainder of the implementation is platform-agnostic C.
Unlike protothreads and other stackless methods, each coroutine has its own fully functional call stack, so local variables in the coroutines preserve their values, multiple instances of the same function may be in flight at once, and coroutines may yield from anywhere within the call stack, not just the top level function.
Unlike pthreads and other kernel-scheduled preemptive multitasking thread implementations, it is extremely cheap to switch between coroutines, and hard-realtime requirements may be met while doing so. Thus, it is possible to have a generator-type coroutine that yields individual audio samples (or small batches of them) to a hard-realtime callback function inside an audio subsystem. It would be possible to implement this API using pthreads, but the cost of switching between coroutines would be many orders of magnitude greater, and it would not be possible to meet hard-realtime deadlines.
This code consists of the coroutine starting and switching primitives, as well as some convenience functions (yield, from) for implementing generator functions, sequential pipelines, and things of that sort. These generator functions may yield a pointer to local storage, a pointer to heap storage (which may or may not be expected to be freed by the yielded-to code), or any nonzero value that can fit within the memory occupied by a void pointer. A coroutine may NOT yield NULL, as this has special meaning to the accepting code (it indicates to a waiting parent that a child function has already returned, or indicates to a waiting child that a parent function wants it to clean up and return). If a coroutine yields a pointer to a local variable, the accepting code may only read through that pointer, not write through it (unless the local variable is marked volatile). See the documentation of setjmp/longjmp for more details, and assume all the same restrictions apply, even though this implementation does not use setjmp/longjmp.
*/
#include "coroutine.h"
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#define CONTEXT_BUF_SIZE_BYTES 24
#define STACK_ALIGNMENT 64
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("rdx") = buf; /* force the argument into this register. this is necessary so that it doesn't choose rbp even though it's not in the clobber list */ \
register void * _func asm("rsi") = func; /* force the argument into this register */ \
asm volatile( \
"lea 1f(%%rip), %%r8\n" /* compute address of end of this block of asm, which will be jumped to when returning to this context */ \
"movq %%r8, 0(%0)\n" /* store it to the beginning of the buffer */ \
"movq %%rsp, 8(%0)\n" /* store the stack pointer */ \
"movq %%rbp, 16(%0)\n" /* store the frame pointer */ \
"movq %0, %%rsp\n" /* switch to the new stack pointer (the new stack is immediately below the context buffer) */ \
"movq %0, %%rdi\n" /* copy the stack pointer, which is also the argument, to the first argument register */ \
"call *%1\n" /* call the springboard fn (never returns, but using call instead of jmp properly aligns the stack) */ \
"1:\n" /* marker for address of end of this block, so we can jump to here */ \
: "+r"(_buf), "+r"(_func) : : "rdi", "rcx", "rax", "rbx", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15", "cc", "fpsr", "flags", "memory"); } while(0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("rdi") = buf; /* force the argument into this register */ \
asm volatile( \
"movq 0(%0), %%rsi\n" /* load the saved instruction pointer */ \
"movq 8(%0), %%rdx\n" /* load the saved stack pointer */ \
"movq 16(%0), %%rcx\n" /* load the saved frame pointer */ \
"lea 1f(%%rip), %%r8\n" /* compute address of end of this block of asm, which will be jumped to when returning to this context */ \
"movq %%r8, 0(%0)\n" /* store it to the beginning of the buffer */ \
"movq %%rsp, 8(%0)\n" /* store the stack pointer */ \
"movq %%rbp, 16(%0)\n" /* store the frame pointer */ \
"movq %%rdx, %%rsp\n" /* switch to the loaded stack pointer */ \
"movq %%rcx, %%rbp\n" /* switch to the loaded frame pointer */ \
"jmp *%%rsi\n" /* jump to the loaded address (the end of this block of asm, but in the buf context) */ \
"1:\n" /* marker for address of end of this block, so we can jump to here */ \
: "+r"(_buf) : : "rsi", "rdx", "rcx", "rax", "rbx", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15", "cc", "fpsr", "flags", "memory"); } while(0)
#elif defined(__aarch64__)
#define CONTEXT_BUF_SIZE_BYTES 32
#define STACK_ALIGNMENT 64
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("x0") = buf; \
register void * _func asm("x1") = func; \
asm volatile( \
"adr x4, 0f\n" \
"mov x5, sp\n" \
"stp x4, x5, [%0]\n" \
"stp x29, x18, [%0, #16]\n" \
"mov x4, %1\n" /* load new pc */ \
"mov x5, %0\n" /* load new sp */ \
"mov sp, x5\n" \
"br x4\n" \
"0:\n" \
: "+r"(_buf), "+r"(_func) : : "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", "x16", "x17", "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26", "x27", "x28", "x30", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "cc", "memory"); } while (0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("x0") = buf; \
asm volatile( \
"ldp x6, x7, [%0]\n" \
"ldp x8, x9, [%0, #16]\n" \
"adr x4, 0f\n" \
"mov x5, sp\n" \
"stp x4, x5, [%0]\n" \
"stp x29, x18, [%0, #16]\n" \
"mov x4, x6\n" \
"mov x5, x7\n" \
"mov x29, x8\n" \
"mov x18, x9\n" \
"mov sp, x5\n" \
"br x4\n" \
"0:\n" \
: "+r"(_buf) : : "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", "x16", "x17", "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26", "x27", "x28", "x30", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "cc", "memory"); } while (0)
#elif defined(__arm__)
#define CONTEXT_BUF_SIZE_BYTES 12
#define STACK_ALIGNMENT 16
#if __thumb__
#define ADD_ONE_TO_R4_IF_THUMB "add r4, 1\n"
#else
#define ADD_ONE_TO_R4_IF_THUMB ""
#endif
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("r0") = buf; \
register void * _func asm("r1") = func; \
asm volatile( \
"adr r4, 0f\n" \
ADD_ONE_TO_R4_IF_THUMB \
"mov r5, sp\n" \
"str r4, [%0]\n" \
"str r5, [%0, #4]\n" \
"str r11, [%0, #8]\n" \
"mov sp, %0\n" \
"mov r6, %1\n" \
"bx r6\n" \
"0:\n" \
: "+r"(_buf), "+r"(_func) : : "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r12", "lr", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15", "cc", "memory"); } while(0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("r0") = buf; \
asm volatile( \
"ldr r6, [%0]\n" \
"ldr r7, [%0, #4]\n" \
"ldr r8, [%0, #8]\n" \
"adr r4, 0f\n" \
ADD_ONE_TO_R4_IF_THUMB \
"mov r5, sp\n" \
"str r4, [%0]\n" \
"str r5, [%0, #4]\n" \
"str r11, [%0, #8]\n" \
"mov r11, r8\n" \
"mov sp, r7\n" \
"bx r6\n" \
"0:\n" \
: "+r"(_buf) : : "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r12", "lr", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15", "cc", "memory"); } while(0)
#elif defined(__i386__)
#define CONTEXT_BUF_SIZE_BYTES 12
#define STACK_ALIGNMENT 16
#define CLOBBER_BASE "edi", "ecx", "eax", "ebx", "cc", "fpsr", "flags", "memory"
#define CLOBBER_MMX "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7"
#define CLOBBER_SSE "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
#if defined(__SSE__)
#define CLOBBER CLOBBER_BASE, CLOBBER_MMX, CLOBBER_SSE
#elif defined(__MMX__)
#define CLOBBER CLOBBER_BASE, CLOBBER_MMX
#else
#define CLOBBER CLOBBER_BASE
#endif
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("edx") = buf; \
register void * _func asm("esi") = func; \
asm volatile( \
"call 0f\n" /* in i386 and other architectures, we need to do some trickery to load the address of a label */ \
"0: pop %%eax\n" \
"add $(1f-0b), %%eax\n" \
"mov %%eax, 0(%0)\n" \
"mov %%esp, 4(%0)\n" \
"mov %%ebp, 8(%0)\n" \
"mov %0, %%esp\n" \
"mov %0, %%edi\n" \
"sub $12, %%esp\n" /* decrement sp by 12 before pushing, to enforce 16 byte alignment */ \
"push %0\n" /* argument is now both on stack and in edi, so either call convention will succeed */ \
"call *%1\n" \
"1:\n" \
: "+r"(_buf), "+r"(_func) : : CLOBBER); } while(0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("edx") = buf; \
asm volatile( \
"mov 0(%0), %%esi\n" \
"mov 4(%0), %%edi\n" \
"mov 8(%0), %%ecx\n" \
"call 0f\n" /* in i386 and other architectures, we need to do some trickery to load the address of a label */ \
"0: pop %%eax\n" \
"add $(1f-0b), %%eax\n" \
"mov %%eax, 0(%0)\n" \
"mov %%esp, 4(%0)\n" \
"mov %%ebp, 8(%0)\n" \
"mov %%edi, %%esp\n" \
"mov %%ecx, %%ebp\n" \
"jmp *%%esi\n" \
"1:\n" \
: "+r"(_buf) : : "esi", CLOBBER); } while(0)
#elif defined(__riscv)
#define CONTEXT_BUF_SIZE_BYTES 32
#define STACK_ALIGNMENT 64
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("x10") = buf; \
register void * _func asm("x11") = func; \
asm volatile( \
"la x6, 1f\n" \
"sd x6, 0(%0)\n" \
"sd x2, 8(%0)\n" \
"sd x8, 16(%0)\n" \
"sd x1, 24(%0)\n" \
"mv x2, %0\n" \
"jr %1\n" \
"1:\n" \
: "+r"(_buf), "+r"(_func) : : "x0", "x1", "x3", "x4", "x5", "x6", "x7", "x9", "x12", "x13", "x14", "x15", "x16", "x17", "x18", "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26", "x27", "x28", "x29", "x30", "x31", "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23", "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", "cc", "memory"); } while (0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("x10") = buf; \
asm volatile( \
"ld x11, 0(%0)\n" \
"ld x12, 8(%0)\n" \
"ld x13, 16(%0)\n" \
"ld x14, 24(%0)\n" \
"la x6, 1f\n" \
"sd x6, 0(%0)\n" \
"sd x2, 8(%0)\n" \
"sd x8, 16(%0)\n" \
"sd x1, 24(%0)\n" \
"mv x2, x12\n" \
"mv x8, x13\n" \
"mv x1, x14\n" \
"jr x11\n" \
"1:\n" \
: "+r"(_buf) : : "x0", "x1", "x3", "x4", "x5", "x6", "x7", "x9", "x11", "x12", "x13", "x14", "x15", "x16", "x17", "x18", "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26", "x27", "x28", "x29", "x30", "x31", "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23", "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", "cc", "memory"); } while (0)
#elif defined(__m68k__)
#define CONTEXT_BUF_SIZE_BYTES 16
#define STACK_ALIGNMENT 16
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("d0") = buf; \
register void * _func asm("d1") = func; \
asm volatile( \
"lea 1f, %%a0\n" \
"move.l %%a0, 0(%0)\n" \
"move.l %%sp, 4(%0)\n" \
"move.l %%a6, 8(%0)\n" \
"move.l %%a5, 12(%0)\n" \
"move.l %0, %%sp\n" \
"subq #4, %%sp\n" /* enforce stack alignment */ \
"move.l %0, (%%sp)\n" \
"jsr (%1)\n" /* this is a call rather than jmp because of stack alignment requirements */ \
"1:\n" \
: "+r"(_buf), "+r"(_func) : : "d2", "d3", "d4", "d5", "d6", "d7", "a0", "a1", "a2", "a3", "a4", "cc", "memory"); } while (0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("d0") = buf; \
asm volatile( \
"move.l 0(%0), %%a1\n" \
"move.l 4(%0), %%a2\n" \
"move.l 8(%0), %%a3\n" \
"move.l 12(%0), %%a4\n" \
"lea 1f, %%a0\n" \
"move.l %%a0, 0(%0)\n" \
"move.l %%sp, 4(%0)\n" \
"move.l %%a6, 8(%0)\n" \
"move.l %%a5, 12(%0)\n" \
"move.l %%a2, %%sp\n" \
"move.l %%a3, %%a6\n" \
"move.l %%a4, %%a5\n" \
"jmp (%%a1)\n" \
"1:\n" \
: "+r"(_buf) : : "d1", "d2", "d3", "d4", "d5", "d6", "d7", "a0", "a1", "a2", "a3", "a4", "cc", "memory"); } while (0)
#elif defined(__AVR__)
#define CONTEXT_BUF_SIZE_BYTES 8
#define STACK_ALIGNMENT 1
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("r30") = buf; /* address of the context struct (placed immediately above the new stack) */ \
register void * _func asm("r22") = func; /* function pointer to the function to run on the new stack */ \
asm volatile( \
"ldi r24, pm_lo8(1f)\n" /* compute address of end of this block of asm, which will be jumped to when returning to this context */ \
"ldi r25, pm_hi8(1f)\n" \
"in r10, __SP_L__\n" /* retrieve the stack pointer */ \
"in r11, __SP_H__\n" \
"in r12, __SREG__\n" /* retrieve sreg */ \
"std z+0, r24\n" /* store jump address to the beginning of the buffer */ \
"std z+1, r25\n" \
"std z+2, r10\n" /* store stack pointer to next slot */ \
"std z+3, r11\n" \
"std z+4, r28\n" /* store frame pointer */ \
"std z+5, r29\n" \
"std z+6, r12\n" /* store sreg */ \
"movw r24, r30\n" /* copy buf address to first argument register pair */ \
"sbiw r30, 1\n" /* subtract 1 from it to get the new stack pointer because avr is postdecrement */ \
"out __SP_L__, r30\n" /* switch to the new stack pointer */ \
"out __SP_H__, r31\n" \
"movw r30, r22\n" /* put the address of the springboard function in the indirect jump address pair */ \
"ijmp\n" /* jump to the springboard */ \
"1:\n" /* marker for address of end of this block, so we can jump back to here */ \
: "+z"(_buf), "+r"(_func) : : "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21", "r24", "r25", "r26", "r27", "r28", "r29", "cc", "memory"); } while(0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("r30") = buf; /* force the argument into this register */ \
asm volatile( \
"ldd r2, z+0\n" /* load the saved instruction pointer */ \
"ldd r3, z+1\n" \
"ldd r4, z+2\n" /* load the saved stack pointer */ \
"ldd r5, z+3\n" \
"ldd r6, z+4\n" /* load the saved frame pointer */ \
"ldd r7, z+5\n" \
"ldd r8, z+6\n" /* load sreg */ \
"ldi r24, pm_lo8(1f)\n" /* compute address of end of this block of asm, which will be jumped to when returning to this context */ \
"ldi r25, pm_hi8(1f)\n" \
"std z+0, r24\n" /* store it to the beginning of the buffer */ \
"std z+1, r25\n" \
"in r10, __SP_L__\n" /* retrieve the stack pointer */ \
"in r11, __SP_H__\n" \
"std z+2, r10\n" /* and store it */ \
"std z+3, r11\n" \
"std z+4, r28\n" /* store the frame pointer */ \
"std z+5, r29\n" \
"in r12, __SREG__\n" /* retrieve sreg */ \
"std z+6, r12\n" /* store sreg */ \
"out __SP_L__, r4\n" /* switch to the loaded stack pointer */ \
"out __SP_H__, r5\n" \
"out __SREG__, r8\n" /* switch to the loaded sreg */ \
"movw r28, r6\n" /* switch to the loaded frame pointer */ \
"movw r30, r2\n" /* jump to the loaded address (the end of this block of asm, but in the buf context) */ \
"ijmp\n" \
"1:\n" /* marker for address of end of this block, so we can jump to here */ \
: "+z"(_buf) : : "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "cc", "memory"); } while(0)
#elif defined(__PPC64__)
#warning this is not working correctly with position independent executables
#define CONTEXT_BUF_SIZE_BYTES 32
#define STACK_ALIGNMENT 64
#define BOOTSTRAP_CONTEXT(buf, func) do { \
register void * _buf asm("r3") = buf; \
register void * _func asm("r4") = func; \
asm volatile( \
"bl 0f\n" /* calculate address of end of this block, so we can jump back to it from the created child */ \
"0: mflr 8\n" \
"addi 8, 8, 1f-0b\n" \
"std 8, 0(%0)\n" /* store it in the first slot of the buffer */ \
"std 1, 8(%0)\n" /* store the parent's stack pointer in the next slot */ \
"std 31, 16(%0)\n" /* store the parent's frame pointer in the next slot */ \
"std 2, 24(%0)\n" /* store the parent's toc pointer in the last slot */ \
"mr 1, %0\n" /* use "buf" as the stack pointer for the created child */ \
"subi 1, 1, 24\n" /* subtract 24 to enforce stack alignment */ \
"mr 3, %0\n" /* put "buf" in the first function call argument */ \
"mtctr %1\n" /* put address of "func" in ctr register */ \
"bctr\n" /* call "func" (this does not return) */ \
"1:\n" \
: "+r"(_buf), "+r"(_func) : : "r0", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "memory", "cc", "xer", "ctr", "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7", "lr"); } while (0)
#define SWAP_CONTEXT(buf) do { \
register void * _buf asm("r3") = buf; \
asm volatile( \
"ld 9, 0(%0)\n" /* retrieve inactive thread's pc from buffer */ \
"ld 10, 8(%0)\n" /* retrieve inactive stack pointer */ \
"ld 11, 16(%0)\n" /* retrieve inactive frame pointer */ \
"ld 12, 24(%0)\n" /* retrieve inactive toc pointer */ \
"bl 0f\n" /* calculate address of end of this block, so we can jump back to it on the next context switch */ \
"0: mflr 8\n" \
"addi 8, 8, 1f-0b\n" \
"std 8, 0(%0)\n" /* store the active thread's pc in the buffer */ \
"std 1, 8(%0)\n" /* store the active thread's stack pointer */ \
"std 31, 16(%0)\n" /* store the active thread's frame pointer */ \
"std 2, 24(%0)\n" /* store the active thread's toc pointer */ \
"mr 1, 10\n" /* switch the stack pointer to the loaded value */ \
"mr 31, 11\n" /* switch the frame pointer to the loaded value */ \
"mr 2, 12\n" /* switch the toc pointer to the loaded value */ \
"mtctr 9\n" /* jump to the stored program counter */ \
"bctr\n" \
"1:\n" \
: "+r"(_buf) : : "r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "memory", "cc", "xer", "ctr", "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7", "lr"); } while (0)
#endif
/* end architecture specific macros */
/* sentinel value */
static void * const not_filled = &(void *){ NULL };
//static void * const not_filled = (void *)&not_filled;
struct channel {
/* holds the pc, sp, fp, and (on some architectures) a fourth value, for the inactive coroutine */
unsigned char context_inactive[CONTEXT_BUF_SIZE_BYTES];
/* the actual function to run. this pointer is nulled when the function has returned (a condition used within from()) */
void (* volatile func)(struct channel *, void *);
/* argument to pass to the above function, also used to pass values between parent and child via yield(). this is set to the special value not_filled when it is logically empty */
void * value;
/* if a dynamic allocator was used, give from() the info it needs to clean up when the child returns */
void * free_after;
void (* free_func)(void *);
};
const size_t sizeof_struct_channel = sizeof(struct channel);
void coroutine_switch(struct channel * channel) {
/* do the actual context swap */
SWAP_CONTEXT(channel->context_inactive);
}
__attribute((noreturn)) static void springboard(struct channel * channel) {
void * arg = channel->value;
channel->value = not_filled;
/* run the main body of the child coroutine */
channel->func(channel, arg);
/* reached when coroutine function finishes. the parent may be watching for this condition */
channel->func = NULL;
/* leave the child context forever */
coroutine_switch(channel);
__builtin_unreachable();
}
struct channel * coroutine_create_given_memory(void (* func)(struct channel *, void *), void * arg, void * block, const size_t blocksize) {
/* position the top of the new stack within the block, respecting alignment requirements */
struct channel * channel = (void *)((char *)block + ((blocksize - sizeof(struct channel)) & ~(STACK_ALIGNMENT - 1)));
/* copy values into the struct. all variables not listed here are zeroed */
*channel = (struct channel) {
.func = func,
.value = arg
};
/* call the bit of asm that fills the inactive context and jumps to the springboard */
BOOTSTRAP_CONTEXT(channel, springboard);
/* control flow returns here via the first coroutine_switch in the child */
return channel;
}
void yield_to(struct channel * channel, void * pointer) {
channel->value = pointer;
coroutine_switch(channel);
}
void * from(struct channel * channel) {
/* when this is called from the parent, it will free the child if the child has exited */
if (channel->func && not_filled == channel->value)
coroutine_switch(channel);
if (!channel->func) {
if (channel->free_func) channel->free_func(channel->free_after);
return NULL;
}
/* take the value */
void * ret = channel->value;
channel->value = not_filled;
return ret;
}
void close_and_join(struct channel * channel) {
/* for parent-to-child data flow, if child is waiting in from(), give it a NULL, which it should react to by cleaning up and returning */
yield_to(channel, NULL);
if (channel->free_func) channel->free_func(channel->free_after);
}
/* convenience function, not used on bare metal, caveats about calling abort() within library code apply */
#if defined (__unix__) || defined (__APPLE__)
#include <stdlib.h>
struct channel * coroutine_create(void (* func)(struct channel *, void *), void * arg) {
/* allocate space for child stack and struct channel */
unsigned char * block = NULL;
if (posix_memalign((void *)&block, STACK_ALIGNMENT, 524288)) abort();
/* do the rest of the coroutine init that does not have to do with memory allocation */
struct channel * channel = coroutine_create_given_memory(func, arg, block, 524288);
channel->free_after = block;
channel->free_func = free;
return channel;
}
#endif
/* see accompanying coroutine.c for full details */
/* start the given function, with the given argument, and return a channel between it and the calling code */
struct channel * coroutine_create(void (* func)(struct channel * parent, void *), void * arg);
/* a generator-type coroutine calls this to pass things back to its parent */
void yield_to(struct channel * channel, void * pointer);
/* for generators, the parent calls this in a loop. when generator has returned, this returns NULL.
example: for (struct thing * thing; (thing = from(parent)); ) { ... do something with thing ... }
*/
void * from(struct channel * channel);
/* if the parent is passing things to the child, it calls this to signal to the child that no more is coming */
void close_and_join(struct channel * channel);
/* internal functionality exposed so that more advanced things can be done */
void coroutine_switch(struct channel * channel);
/* needed for size_t */
#include <stddef.h>
struct channel * coroutine_create_given_memory(void (* func)(struct channel *, void *), void * arg, void * block, const size_t blocksize);
extern const size_t sizeof_struct_channel;
#include "coroutine.h"
#include <stdio.h>
#include <ctype.h>
static char * morsetable[256] = {
[' '] = " ",
['A'] = " - --- ",
['B'] = " --- - - - ",
['C'] = " --- - --- - ",
['D'] = " --- - - ",
['E'] = " - ",
['F'] = " - - --- - ",
['G'] = " --- --- - ",
['H'] = " - - - - ",
['I'] = " - - ",
['J'] = " --- --- --- - ",
['K'] = " --- - --- ",
['L'] = " - --- - - ",
['M'] = " --- --- ",
['N'] = " --- - ",
['O'] = " --- --- --- ",
['P'] = " - --- --- - ",
['Q'] = " --- --- - --- ",
['R'] = " - --- - ",
['S'] = " - - - ",
['T'] = " --- ",
['U'] = " - - --- ",
['V'] = " - - - --- ",
['W'] = " - --- --- ",
['X'] = " --- - - --- ",
['Y'] = " --- - --- --- ",
['Z'] = " --- --- - - ",
['1'] = " - --- --- --- --- ",
['2'] = " - - --- --- --- ",
['3'] = " - - - --- --- ",
['4'] = " - - - - --- ",
['5'] = " - - - - - ",
['6'] = " --- - - - - ",
['7'] = " --- --- - - - ",
['8'] = " --- --- --- - - ",
['9'] = " --- --- --- --- - ",
['0'] = " --- --- --- --- --- ",
['+'] = " - --- - --- - ",
['-'] = " --- - - - - --- ",
['?'] = " - - --- --- - - ",
['/'] = " --- - - --- - ",
['.'] = " - --- - --- - --- ",
[','] = " --- --- - - --- --- ",
['\''] = " --- - - --- - ",
[')'] = " --- - --- --- - --- ",
['('] = " --- - --- --- - ",
[':'] = " --- --- --- - - - ",
};
void morse_generator(struct channel * parent, void * sentence) {
/* this is a simple demonstration of the benefit of a generator function, for producing samples according to logic that requires internal state. if this were written as a callback, the loop structure would have to be "inside out", with additional opportunities for error, and with the loop control state stored in either static/global memory or in a separate data structure */
/* loop over letters within the sentence */
for (const char * letter = sentence; *letter != '\0'; letter++) {
/* get the pixel array for the uppercase version of the given character, or that of whitespace if not defined */
char * pixels = morsetable[toupper(*letter)] ?: morsetable[' '];
/* loop over pixels within the current letter */
for (char * pixel = pixels; *pixel != '\0'; pixel++)
yield_to(parent, pixel);
}
/* generators implicitly yield null when they return, as seen by a parent blocked in from() */
}
int main(const int argc, char ** const argv) {
/* sentence to transmit will be "test" unless another was provided */
char * sentence = argc > 1 ? argv[1] : "test";
/* start a generator function with the given sentence as the argument */
struct channel * child = coroutine_create(morse_generator, sentence);
/* loop over pixels produced by the generator, until from() returns NULL */
for (const char * pixel; (pixel = from(child)); )
/* write each pixel to stdout */
putchar(*pixel);
/* when we exit the above loop, the coroutine has returned, and all resources associated with it have been freed */
/* write a newline */
putchar('\n');
return 0;
}
/*
very simple examples for coroutine.c
cc -Ofast -o cotests cotests.c coroutine.c
*/
/* just because we want to use asprintf in one of the examples */
#define _GNU_SOURCE
#include "coroutine.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* the base case is generator functions, in which the parent starts the child, and the child repeatedly passes things to the parent. the things can be anything that fits in a void pointer. it is safe for the child to yield pointers to its own local variables - they are guaranteed to still be in scope */
void generator(struct channel * parent, void * context) {
printf("%s: spawned from %s\n", __func__, (char *)context);
for (size_t num = 0; num < 4; num++)
yield_to(parent, &num);
printf("%s: no more output is coming\n", __func__);
}
void consumer(void) {
printf("%s: base case: generator pattern\n", __func__);
struct channel * child = coroutine_create(generator, (void *)__func__);
/* loop until accept_from returns null */
for (size_t * nump; (nump = from(child)); )
printf("%s: got %zu from generator\n", __func__, *nump);
printf("%s: ok\n\n", __func__);
}
void nested_generator_c(struct channel * parent, void * arg) {
printf("%s: spawned from %s\n", __func__, (char *)arg);
for (int num = 1; num < 5; num++) {
printf("%s: yielding %d to parent\n", __func__, num);
yield_to(parent, &num);
}
printf("%s: no more output is coming\n", __func__);
}
void nested_generator_b(struct channel * parent, void * arg) {
printf("%s: spawned from %s\n", __func__, (char *)arg);
struct channel * child = coroutine_create(nested_generator_c, (void *)__func__);
int sum = 0;
for (const int * nump; (nump = from(child)); ) {
const int val = *nump;
sum += val;
printf("%s: got %d, yielding cumulative sum %d to parent\n", __func__, val, sum);
yield_to(parent, &sum);
}
printf("%s: ok, no more output is coming\n", __func__);
}
void nested_generator_a(void) {
printf("%s: example of multiple nested generator functions\n", __func__);
struct channel * child = coroutine_create(nested_generator_b, (void *)__func__);
for (const int * nump; (nump = from(child)); ) {
const int sum = *nump;
printf("%s: got %d\n", __func__, sum);
}
printf("%s: ok\n\n", __func__);
}
/* communication in both directions, involving pointers that are allocated in the child and freed in the parent */
void mirror(struct channel * parent, void * context) {
printf("%s: spawned from %s\n", __func__, (char *)context);
/* loop until parent sends NULL */
for (const char * string; (string = from(parent)); ) {
char * reflection;
asprintf(&reflection, "%s with goatee", string);
yield_to(parent, reflection);
}
printf("%s: ok\n", __func__);
}
void two_way_example(void) {
printf("%s: communication in both directions\n", __func__);
void * child = coroutine_create(mirror, (void *)__func__);
char * crew[] = { "kirk", "spock", "mccoy" };
for (size_t icrew = 0; icrew < sizeof(crew) / sizeof(crew[0]); icrew++) {
printf("%s: sending %s to child\n", __func__, crew[icrew]);
yield_to(child, crew[icrew]);
char * reflection = from(child);
printf("%s: got %s back from child\n", __func__, reflection);
free(reflection);
}
printf("%s: no more input is coming\n", __func__);
close_and_join(child);
printf("\n");
}
/* communication in both directions, controlled by child */
void another_mirror(struct channel * parent, void * context) {
printf("%s: spawned from %s\n", __func__, (char *)context);
char * crew[] = { "kirk", "spock", "mccoy" };
for (size_t icrew = 0; icrew < sizeof(crew) / sizeof(crew[0]); icrew++) {
printf("%s: sending %s to parent\n", __func__, crew[icrew]);
yield_to(parent, crew[icrew]);
char * reflection = from(parent);
printf("%s: got %s back from parent\n", __func__, reflection);
free(reflection);
}
printf("%s: done, returning\n", __func__);
}
void another_two_way_example(void) {
printf("%s: communication in both directions, controlled by child\n", __func__);
void * child = coroutine_create(another_mirror, (void *)__func__);
for (char * string; (string = from(child)); ) {
char * reflection;
asprintf(&reflection, "%s with goatee", string);
yield_to(child, reflection);
}
printf("%s: ok\n\n", __func__);
}
/* test generator that doesn't yield anything */
void generator_trivial(struct channel * parent __attribute((unused)), void * context) {
printf("%s: spawned from %s, just returning\n", __func__, (char *)context);
}
void consumer_trivial(void) {
printf("%s: this should not crash\n", __func__);
struct channel * child = coroutine_create(generator_trivial, (void *)__func__);
printf("%s: got here, just created child\n", __func__);
while (from(child));
printf("%s: done\n\n", __func__);
}
/* test generator with a parent that doesn't yield anything */
void child_consumer_trivial(struct channel * parent, void * context) {
printf("%s: spawned from %s\n", __func__, (char *)context);
while (from(parent));
printf("%s: ok\n", __func__);
}
void parent_to_child_trivial(void) {
printf("%s: this should not crash\n", __func__);
struct channel * child = coroutine_create(child_consumer_trivial, (void *)__func__);
printf("%s: no more input is coming\n", __func__);
close_and_join(child);
printf("%s: done\n\n", __func__);
}
/* test generator using the non-malloc interface for stack. warning this might blow up if you are using some instrumented version of fprintf that uses more than 8K of stack space */
void test_child_on_parent_stack(void) {
printf("%s\n", __func__);
fflush(stdout);
char block_unaligned[8192], * block = (char *)(((size_t)block_unaligned + 31) & ~31);
struct channel * child = coroutine_create_given_memory(generator_trivial, (void *)__func__, block, 8192 - 32);
while (from(child));
printf("%s: done\n\n", __func__);
}
/* star network - communication between children via a parent broker */
void star_network_first_child(struct channel * parent, void * context __attribute((unused))) {
yield_to(parent, "message for parent: hello");
yield_to(parent, "message for second child: hi");
printf("%s: done\n", __func__);
}
void star_network_second_child(struct channel * parent, void * context __attribute((unused))) {
for (char * string; (string = from(parent)); )
printf("%s: got message: %s\n", __func__, string);
printf("%s: ok\n", __func__);
}
void star_network(void) {
struct channel * first_child = coroutine_create(star_network_first_child, NULL);
struct channel * second_child = coroutine_create(star_network_second_child, NULL);
for (const char * string; (string = from(first_child)); ) {
printf("%s: from first child: %s\n", __func__, string);
if (strstr(string, "for second child: "))
yield_to(second_child, strchr(string, ':') + 2);
}
printf("%s: ok, telling second child no more input is coming\n", __func__);
close_and_join(second_child);
printf("%s: done\n\n", __func__);
}
/* demo that passes a buffer to a coroutine which fills it and passes it back */
void child_that_modifies_contents_of_pointer(struct channel * parent, void * context) {
const size_t bytes_per_yield = *((size_t *)context);
char letter = 'a';
/* child loops over buffers to fill from parent */
for (char * string; (string = from(parent)); ) {
/* and fills them */
for (size_t ibyte = 0; ibyte < bytes_per_yield; ibyte++) {
string[ibyte] = letter;
letter++;
if (letter > 'z')
letter = 'a';
}
/* and yields them back to parent */
yield_to(parent, string);
}
}
void test_yield_pointer_modify(void) {
size_t bytes_per_yield = 13;
char buffer[14] = { 0 };
struct channel * child = coroutine_create(child_that_modifies_contents_of_pointer, &bytes_per_yield);
for (size_t ipass = 0; ipass < 2; ipass++) {
/* parent yields buffer to child...*/
yield_to(child, buffer);
/* ...which fills it and passes it back, but note that the parent must treat the returned buffer as if it were an unrelated string */
const char * string = from(child);
printf("%s: %s\n", __func__, string);
}
close_and_join(child);
printf("\n");
}
int main(void) {
consumer();
nested_generator_a();
two_way_example();
another_two_way_example();
consumer_trivial();
parent_to_child_trivial();
test_child_on_parent_stack();
star_network();
test_yield_pointer_modify();
return 0;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment