Don't bother trying this. Nowadays it's much simpler to grab an RP2040 and use its PIO state machines to drive the output instead. Or just grab an ESP32-based board, load WLED on it, and get WiFi-connected effects that you can control via its built-in web server.
A while back, I was implementing support for the nRF52840 chipset in FastLED. On this chip, real-time Bluetooth transmissions are handled in software. Since the chip is single-core, Bluetooth traffic can pre-empt any CPU activity at any time. Given the stringent timing requirements of WS2812-style LEDs (measured in nanoseconds), bit-banging the output was not going to be reliable enough.
One possibility I considered was to offload generation of the WS2812 signal entirely to hardware. This gist captures the key aspects of that design, and is dedicated to the public domain.
PPI (Programmable Peripheral Interconnect) is Nordic's system on this chip for configurably connecting an EVENT (e.g., from one peripheral) to a TASK (e.g., on another peripheral). In fact, a single event can trigger two distinct TASKs simultaneously (via the channel's fork). This is especially useful because only 20 programmable PPI channels are available.
The PPI system's response is gated to a 16MHz clock. If triggered asynchronously, there can be up to one clock cycle of delay (62.5ns). If one PPI trigger causes another PPI event, there is exactly one clock cycle of delay.
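As a rough latency budget, each PPI hop can be counted as costing at most one 62.5ns cycle. The sketch below is my own simplification: it ignores any latency inside the peripherals themselves, and the helper names are hypothetical. For example, the CMP1 event at 1000ns triggers COUNT, and the resulting CNT1 compare event then disables CHGy. Those two hops must complete before CMP3 sets the output high again at 1250ns.

```c
#include <stdbool.h>
#include <stdint.h>

/* nRF52840 PPI runs from a 16 MHz clock: 1 cycle = 62.5 ns = 625 tenths of ns. */
#define PPI_CYCLE_NS_X10 625u

/* Hypothetical helper: worst-case added delay, in tenths of a nanosecond,
 * for a chain of `hops` PPI connections, assuming each hop costs at most
 * one 16 MHz cycle (peripheral-internal latency is ignored). */
static uint32_t ppi_chain_delay_ns_x10(uint32_t hops)
{
    return hops * PPI_CYCLE_NS_X10;
}

/* Hypothetical helper: true if a chain of `hops` PPI connections that
 * starts at `start_ns` finishes strictly before `deadline_ns`. */
static bool ppi_chain_fits(uint32_t start_ns, uint32_t hops, uint32_t deadline_ns)
{
    return (uint64_t)start_ns * 10u + ppi_chain_delay_ns_x10(hops)
           < (uint64_t)deadline_ns * 10u;
}
```

With the WS2812 numbers used below, `ppi_chain_fits(1000, 2, 1250)` holds with three cycles to spare.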
GPIOTE allows TASK- and EVENT-based interaction with GPIO pins; in fact, this is how the Arduino core connects interrupts to pins. For a pin configured as an input, GPIOTE can generate an EVENT when the signal rises, falls, or changes. For a pin configured as an output, GPIOTE exposes three TASKs that control the pin's output: SET, CLEAR, and TOGGLE.
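To make the input/output split concrete, here is a host-side sketch that packs a GPIOTE `CONFIG[n]` register value. The field layout is my reading of the nRF52840 Product Specification: MODE in bits 1:0, PSEL in 12:8, PORT in 13, POLARITY in 17:16, OUTINIT in 20. Verify it against the PS or the vendor headers before relying on it. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names; values follow the nRF52840 GPIOTE CONFIG register
 * as I understand it (verify against the Product Specification). */
enum gpiote_mode { GPIOTE_MODE_DISABLED = 0, GPIOTE_MODE_EVENT = 1, GPIOTE_MODE_TASK = 3 };
enum gpiote_polarity {
    GPIOTE_POL_NONE = 0,   /* task mode: OUT task does nothing */
    GPIOTE_POL_LOTOHI = 1, /* event on rising edge / OUT task sets the pin */
    GPIOTE_POL_HITOLO = 2, /* event on falling edge / OUT task clears the pin */
    GPIOTE_POL_TOGGLE = 3  /* event on any change / OUT task toggles the pin */
};

/* Packs MODE [1:0], PSEL [12:8], PORT [13], POLARITY [17:16], OUTINIT [20]. */
static uint32_t gpiote_config(enum gpiote_mode mode, unsigned port, unsigned pin,
                              enum gpiote_polarity pol, unsigned outinit_high)
{
    return ((uint32_t)mode & 0x3u)
         | (((uint32_t)pin & 0x1Fu) << 8)
         | (((uint32_t)port & 0x1u) << 13)
         | (((uint32_t)pol & 0x3u) << 16)
         | (((uint32_t)outinit_high & 0x1u) << 20);
}
```

For example, an input event on the rising edge of P0.03 would be `gpiote_config(GPIOTE_MODE_EVENT, 0, 3, GPIOTE_POL_LOTOHI, 0)`.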
My first idea was to have the SPI peripheral output on a pin that would also generate GPIOTE events on rise/fall. Of course, this did not work in practice. A single GPIO could not be assigned to two peripherals, and even if it could, the SPIS pin was configured as an output, and thus could not generate GPIOTE events.
In the end, the solution was simple, although not particularly elegant: the SPIS output GPIO had to be EXTERNALLY bridged to at least one other GPIO configured as an input. That second GPIO could then be assigned to the GPIOTE peripheral, which would trigger events on rise/fall/etc.
Of course, I initially wanted to use a single extra GPIO. However, a pin can only be assigned to one GPIOTE channel, and that channel cannot be configured with separate events for RISE vs. FALL. I did create a design that swapped between two channel groups (CHG), one for bit=0 and one for bit=1, but any spurious trigger would corrupt all the remaining bits of output. In contrast, separate RISE and FALL events on the SPIS output simplified the event and PPI design, although it required bridging two different pins to the SPIS MISO output pin.
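The robustness argument can be illustrated with a deliberately simplified model. This is my own illustration, not code from the original design, and the function name is hypothetical. In the toggle-based design, the selected channel group is effectively "first bit XOR number of edges seen", so one extra event flips every later bit. With separate RISE/FALL events, each real edge re-asserts the correct state, so a spurious rise only lasts until the next real edge.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: counts how many of `n` bits would be emitted with the
 * wrong width if one spurious GPIOTE event arrives during bit `glitch_at`.
 *  - toggle_design: one pin in Toggle mode swaps channel groups on every event.
 *  - otherwise: separate rise/fall pins set the selection to the new level.
 * The glitch is modelled as a toggle (toggle design) or a rise (rise/fall). */
static size_t corrupted_bits(const uint8_t *bits, size_t n, size_t glitch_at,
                             bool toggle_design)
{
    if (n == 0) {
        return 0;
    }
    uint8_t sel = bits[0]; /* CPU presets the selection from the first bit */
    size_t bad = 0;
    for (size_t p = 0; p < n; p++) {
        if (p > 0 && bits[p] != bits[p - 1]) { /* real MISO edge */
            sel = toggle_design ? (uint8_t)!sel : bits[p];
        }
        if (p == glitch_at) { /* spurious event */
            sel = toggle_design ? (uint8_t)!sel : 1u;
        }
        if (sel != bits[p]) {
            bad++;
        }
    }
    return bad;
}
```

For the bit pattern `0,0,1,1,0,0,1,0` with a glitch during bit 1, the toggle design gets 7 of 8 bits wrong, while the rise/fall design gets only 1 wrong.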
The WaveDrom timing diagram uses a 16MHz clock:
- The PPI system is gated to 16MHz aka 62.5ns (nRF52840 specific)
- The TIMER max clock rate is 16MHz (nRF52840 specific)
- Only TIMER3 and TIMER4 have six CC registers (nRF52840 specific)
- To ensure synchronization, the SPIS peripheral is used, with its clock input externally bridged from a GPIO that the TIMER drives via GPIOTE.
- At the start of a transfer, two steps are taken manually:
  - SPIS CSN is held low for at least 1000ns (1us), ensuring the first bit is present on the output pin.
  - Depending on whether the first bit to transfer is a zero or a one, channel group CHGx is enabled or disabled before the transfer starts.
- SPIS is used to transmit an array of bytes, one bit at a time.
- GPIOTE is used to generate the SPIS clock, to detect SPIS output changes (which enable/disable channel group CHGx), and to control the final output.
- A TIMER whose period matches the total transmission time for a single bit (e.g., 1250ns for WS2812), with four compare registers:
  - CMP0 (D0): when the output pin should fall to output a zero bit (e.g., 250ns for WS2812)
  - CMP1 (D1): when the output pin should fall to output a one bit (e.g., 1000ns for WS2812); also increments the COUNTER
  - CMP2 (SCLK=0): when SCLK should go low
  - CMP3 (SCLK=1): when SCLK should go high (also sets the output high while CHGy is enabled)
- A COUNTER is used with two comparators:
  - CNT1 (last bit): indicates the last bit has been transmitted, and disables CHGy to prevent the output from being set high again.
  - CNT0 (delay): indicates the post-data delay has finished, and disables CHGz to disable ALL the PPI channels used in this process.
  - PREFERABLY, an interrupt would be enabled for CNT0, to indicate transmission has completed.
  - PREFERABLY, an interrupt would be enabled for CNT1, to indicate the first point in time that the data buffer can be reclaimed and prepared for the next transmission (after transmission, but before the delay has completed).
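The compare values above follow from simple arithmetic at 16MHz (62.5ns per tick). The sketch below derives them from nanosecond timings, and sizes the COUNTER compares from the buffer length. The struct and function names are hypothetical, and two choices are my own assumptions: placing CMP2 midway between CMP1 and the end of the period, and passing the post-data reset time in as a parameter. Check your LED's datasheet for the actual reset time.

```c
#include <stdint.h>

/* Hypothetical per-transfer configuration; names are illustrative. */
typedef struct {
    uint32_t cc_d0;        /* TIMER CMP0: output falls for a zero bit (ticks) */
    uint32_t cc_d1;        /* TIMER CMP1: output falls for a one bit; counts a bit */
    uint32_t cc_sclk0;     /* TIMER CMP2: SCLK low (assumed midway CMP1..CMP3) */
    uint32_t cc_sclk1;     /* TIMER CMP3: SCLK and output high, end of bit period */
    uint32_t cnt_last_bit; /* COUNTER CNT1: compare value for the last data bit */
    uint32_t cnt_delay;    /* COUNTER CNT0: compare value once the delay elapsed */
} ws2812_plan;

/* Rounds nanoseconds to the nearest 16 MHz tick (62.5 ns). */
static uint32_t ns_to_ticks(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * 16u + 500u) / 1000u);
}

static ws2812_plan ws2812_make_plan(uint32_t t0h_ns, uint32_t t1h_ns,
                                    uint32_t period_ns, uint32_t nbytes,
                                    uint32_t reset_ns)
{
    ws2812_plan p;
    p.cc_d0 = ns_to_ticks(t0h_ns);
    p.cc_d1 = ns_to_ticks(t1h_ns);
    p.cc_sclk1 = ns_to_ticks(period_ns);
    p.cc_sclk0 = (p.cc_d1 + p.cc_sclk1) / 2u;
    p.cnt_last_bit = nbytes * 8u;
    /* CMP1 keeps counting bit periods after the data ends, so the post-data
     * delay is rounded up to a whole number of bit periods. */
    p.cnt_delay = p.cnt_last_bit + (reset_ns + period_ns - 1u) / period_ns;
    return p;
}
```

With the WS2812 examples (250ns, 1000ns, 1250ns), three LEDs (9 bytes) and a 300us reset, this gives CMP0=4, CMP1=16, CMP2=18, CMP3=20 ticks, CNT1=72 and CNT0=312.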
Pin | Direction | Comment |
---|---|---|
SPIS_CLK | input | As configured in SPIS peripheral registers |
SPIS_MISO | output | As configured in SPIS peripheral registers |
SPIS_CSN | input | Pulling low causes SPIS to prepare the first output bit |
GPIOTE_CLK | output | Externally bridged to SPIS_CLK, controlled via GPIOTE TASK |
GPIOTE_MISO_RISE | input | Externally bridged to SPIS_MISO, GPIOTE event on rising signal |
GPIOTE_MISO_FALL | input | Externally bridged to SPIS_MISO, GPIOTE event on falling signal |
GPIOTE_OUTPUT | output | Final output of WS2812 data signal |
- Channel group X (CHGx) is enabled/disabled based on the data bit; it gates the event that sets the output low early (bit=0).
- Channel group Y (CHGy) is disabled after the final bit is transferred; this disables the event that sets the output high.
- Channel group Z (CHGz) is used to enable all the events at the start, and to disable all the events at the end.
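In hardware, each channel group is a bitmask of PPI channels (the PPI `CHG[n]` registers). `TASKS_CHG[n].EN` and `TASKS_CHG[n].DIS` set or clear the enable bit of every channel in that mask. Because a channel can belong to several groups, disabling any group that contains it turns it off. The sketch below assumes, purely for illustration, that the logical PPI numbers 1-8 from the channel table map one-to-one to hardware channel indices; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: logical PPI channel n from the table == hardware channel n. */
#define PPI_CH(n) (1u << (n))

/* Channel group bitmasks, as would be written to the CHG[n] registers. */
static const uint32_t CHG_X = PPI_CH(4); /* early fall for a zero bit */
static const uint32_t CHG_Y = PPI_CH(1); /* sets output (and SCLK) high */
static const uint32_t CHG_Z = PPI_CH(1) | PPI_CH(2) | PPI_CH(3) | PPI_CH(4)
                            | PPI_CH(5) | PPI_CH(6) | PPI_CH(7) | PPI_CH(8);

/* Models the effect of TASKS_CHG[n].DIS / .EN on the channel-enable bits. */
static uint32_t disable_group(uint32_t chen, uint32_t group) { return chen & ~group; }
static uint32_t enable_group(uint32_t chen, uint32_t group) { return chen | group; }
```

Disabling CHGy after the last bit therefore turns off only channel 1, leaving the clock-low, counter and CHGx-control channels running until CHGz is disabled.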
CHG | PPI | EVENT | TASK | Comment |
---|---|---|---|---|
y z | 1 | CMP3 (SCLK=1) | GPIOTE_OUTPUT.TASKS_SET | Sets output pin high, when CHGy enabled |
y z | 1 (fork) | CMP3 (SCLK=1) | GPIOTE_CLK.TASKS_SET | Sets the SPI clock high |
z | 2 | CMP2 (SCLK=0) | GPIOTE_CLK.TASKS_CLEAR | Sets the SPI clock low (loads next data bit) |
z | 3 | CMP1 (D1) | COUNTERx.TASKS_COUNT | Increments count of total bits transmitted |
z | 3 (fork) | CMP1 (D1) | GPIOTE_OUTPUT.TASKS_CLEAR | Ensures output pin set low at longer of two bit lengths |
x z | 4 | CMP0 (D0) | GPIOTE_OUTPUT.TASKS_CLEAR | Timer sets output pin low, but only when CHGx enabled (data bit was zero) |
z | 5 | GPIOTE_MISO_RISE | PPI.TASKS_CHG[x].DIS | Disables CHGx when data bit is one |
z | 6 | GPIOTE_MISO_FALL | PPI.TASKS_CHG[x].EN | Enables CHGx when data bit is zero |
z | 7 | CNT1 (last bit) | PPI.TASKS_CHG[y].DIS | Disables event that sets output high |
z | 8 | CNT0 (delay) | PPI.TASKS_CHG[z].DIS | ENDS TRANSMISSION |
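To sanity-check the channel table, here is a host-side model that steps through each bit period and records how long the output stays high. This is a simplified simulation I wrote for illustration, not firmware. It assumes CMP2 occurs after CMP1 within each period, that MISO idles low once the data runs out, and that PPI latency is negligible. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16 MHz ticks (62.5 ns) for CMP0 (D0, 250 ns) and CMP1 (D1, 1000 ns). */
enum { TICK_D0 = 4, TICK_D1 = 16 };

/* Host-side model of the PPI channel groups over one transfer.
 * For each bit period, writes how many ticks GPIOTE_OUTPUT is high.
 * high_ticks must hold nbits + delay_bits entries; nbits must be >= 1.
 * Returns the number of bit periods until CHGz is disabled. */
static size_t simulate_transfer(const uint8_t *bits, size_t nbits,
                                size_t delay_bits, uint8_t *high_ticks)
{
    if (nbits == 0) {
        return 0;
    }
    const size_t cnt1 = nbits;              /* CNT1 (last bit) */
    const size_t cnt0 = nbits + delay_bits; /* CNT0 (delay) */

    /* CPU setup: CSN held low puts bits[0] on MISO; CHGx preset from it. */
    uint8_t miso = bits[0];
    bool chg_x = (miso == 0), chg_y = true, chg_z = true;
    size_t counter = 0, period = 0;

    while (chg_z) {
        /* CMP3 (PPI 1, groups y+z) sets the output high. CMP0 (PPI 4, groups
         * x+z) clears it early for a zero bit; otherwise PPI 3's fork at CMP1
         * clears it. */
        uint8_t high = 0;
        if (chg_y) {
            high = chg_x ? TICK_D0 : TICK_D1;
        }
        high_ticks[period++] = high;

        /* CMP1 (PPI 3) increments the bit counter. */
        counter++;
        if (counter == cnt1) chg_y = false; /* PPI 7: no more rising edges */
        if (counter == cnt0) chg_z = false; /* PPI 8: ends transmission */

        if (chg_z) {
            /* CMP2 (PPI 2) drops SCLK and SPIS shifts out the next bit.
             * Assumption: MISO idles low once the data runs out. */
            uint8_t next = (period < nbits) ? bits[period] : 0;
            if (next != miso) {
                /* PPI 6 (fall) enables CHGx; PPI 5 (rise) disables it. */
                chg_x = (next == 0);
                miso = next;
            }
        }
    }
    return period;
}
```

For the bits `1,0,0,1` followed by two delay periods, the model produces high times of 16, 4, 4, 16, 0, 0 ticks (1000ns, 250ns, 250ns, 1000ns, then idle low).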
If it were not for the need to externally bridge a number of GPIOs, I might have pursued this option. However, bridging pins adds a source of user error. Doing it safely (including at boot) would require external components, which moved it out of the practical and into "thought experiment" territory.
As can be seen from the WaveDrom timing diagram, this configuration would provide accurate output. The CPU would only be needed to set up the transfer, hold CSN low for 1us, and enable or disable CHGx according to the first bit to be sent.
A COUNTER interrupt (CNT1) would indicate that the data buffer is ready to be reused, while a second interrupt (CNT0) would indicate that the mandatory post-data-transmission delay has completed.
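To estimate when those two interrupts would fire, here is a small calculation. It rests on my own assumptions: times are measured from the rising edge of the first bit, CMP1 counts each bit period at D1 into that period, and PPI and interrupt latency are ignored. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: time (ns, from the first bit's rising edge) at which
 * a COUNTER compare with value `compare_value` (>= 1) fires, given that
 * CMP1 increments the counter `d1_ns` into each bit period. */
static uint64_t cnt_event_time_ns(uint32_t compare_value, uint32_t period_ns,
                                  uint32_t d1_ns)
{
    /* The Nth count happens during bit period N-1 (zero-based). */
    return (uint64_t)(compare_value - 1u) * period_ns + d1_ns;
}
```

For three LEDs (CNT1=72) with a 240-period delay (CNT0=312), the buffer becomes reusable about 89.75us after the first bit starts, and the transfer completes at about 389.75us.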