@josyb
Last active September 1, 2020 07:12
State Machines and Resets

One-Process, Two-Process and Three-Process State Machines

"a state machine is worth a thousand equations" - Josy Boelen (yes ...)

and their Resets

About State Machines

On internet forums dealing with FPGA programming, I often read the advice to write One-Process state machines only.
And I usually beg to differ. (You shouldn't be surprised ...)
The principal arguments given are that a One-Process state machine description is 'free from latches' and at the same time requires less typing. Now being lazy is a hallmark of a good engineer, but ...
In a lot of cases a One-Process description fits exactly. However in my line of work this is rarely so.
To understand the deeper differences between the One- and Two-Process descriptions we have to go back to Moore and Mealy. The outputs of a Moore machine depend solely on the current state, whereas in a Mealy state machine they depend on both the current state and the current inputs. As such, a Moore machine can be described by a single, purely synchronous process. The Mealy machine always needs a second, combinatorial process to describe some of the outputs. A lot of people claim they write One-Process State Machines, but almost always you will see them decoding the present state, sometimes even the next state, in a second (sometimes multiple) combinatorial process and possibly also in a third (sometimes multiple) synchronous process, thus creating Two- and Three-Process State Machines in disguise. Sometimes these additional processes create conditions to be acted upon by the original One-Process State Machine!
If the design task at hand is complex, things get dispersed and difficult to read / understand / maintain by others. Unfortunately, for any code written more than 3 months ago, oneself belongs to the others :)
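
To make the Moore/Mealy distinction concrete, here is a minimal behavioural sketch in plain Python (deliberately not MyHDL). It detects a rising edge on an input twice: once as a Moore machine and once as a Mealy machine. The function names, the state names and the stimulus are made up for illustration.

```python
# Plain-Python behavioural sketch (no MyHDL): detect a rising edge on x.
# All names below are illustrative, not taken from any library.

def moore_step(state, x):
    """Moore: the output is decoded from the present state only."""
    output = 1 if state == "PULSE" else 0
    if state == "LOW":
        next_state = "PULSE" if x else "LOW"
    else:  # "PULSE" or "HIGH"
        next_state = "HIGH" if x else "LOW"
    return next_state, output


def mealy_step(state, x):
    """Mealy: the output depends on the present state AND the current input."""
    output = 1 if (state == "LOW" and x) else 0
    next_state = "HIGH" if x else "LOW"
    return next_state, output


def run(step, inputs, state="LOW"):
    """Evaluate the machine once per clock cycle and collect its outputs."""
    outputs = []
    for x in inputs:
        state, out = step(state, x)  # 'state' is updated at the clock edge
        outputs.append(out)
    return outputs


stimulus = [0, 1, 1, 0, 1, 0]
moore_out = run(moore_step, stimulus)
mealy_out = run(mealy_step, stimulus)
```

In this sketch the Mealy output reacts in the same cycle as the input, while the Moore output shows up one cycle later and needs an extra state to do so. That extra, input-dependent output logic is what ends up in the second, combinatorial process.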

About Resets

@hgomersall drew my attention back to a blog post by Olof Kindgren: https://olofkindgren.blogspot.com/2017/11/resetting-reset-handling.html where the author documents a clever way to mix signals that need a reset and signals that don't need a reset in a single always @(posedge clk) process. You won't be surprised to hear me say that I am not a fan. I don't do much Verilog (I progressed from AHDL to VHDL and then on to MyHDL), but I sometimes have to read others' work, and I am always surprised that Verilog coders tend to create a lot of always @(*) blocks handling combinatorial code and somewhat fewer always @(posedge clk) processes. So you could easily divide the have-resets and the have-no-resets over two always @(posedge clk) processes; it would be just one or a few processes more ...

Wrapping up

This text is targeted at MyHDL developers. In the beginning MyHDL mimicked the Verilog method of declaring synchronous processes, and required you to handle the resetting of signals very much as in both VHDL and Verilog. Unfortunately the two dominant FPGA vendors differed in their reset approach (among other things ...): one preferred asynchronous resets, the other synchronous. Our BDFL sees MyHDL as a tool to develop vendor-independent code and came up with a (I was going to say clever ...) nice solution: @always_seq(clk.posedge, reset). Together with a new signal type, ResetSignal(val, active, isasync), we can now write vendor-agnostic code, deferring the actual reset implementation until the simulation and conversion phase. It has one drawback: in such an @always_seq(clk.posedge, reset) process all signals are reset, even the signals that don't actually need a reset. The (obvious?) way to handle this is to move the signals that don't need a reset into another @always_seq(clk.posedge, reset=None) process. As we declare reset as None, no resetting code will be generated or simulated. Mission accomplished. The explicit reset=None documents our intention to any future reader.
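
The effect of splitting the registers over a process with a reset and a process with reset=None can be sketched in plain Python. This is only a behavioural model, not MyHDL code: the RegisterGroup class and its clock method are hypothetical names, and the model only covers a reset that is sampled at the clock edge.

```python
class RegisterGroup:
    """Hypothetical stand-in for one clocked process and the registers it drives."""

    def __init__(self, initial, has_reset):
        self.initial = dict(initial)
        self.values = dict(initial)
        self.has_reset = has_reset

    def clock(self, reset_active, updates):
        """Model one rising clock edge."""
        if self.has_reset and reset_active:
            # Every register of a group with a reset is reset, needed or not.
            self.values = dict(self.initial)
        else:
            self.values.update(updates)


# Control state lives in the group with a reset (like reset=Reset) ...
control = RegisterGroup({"state": "EMPTY"}, has_reset=True)
# ... the data path lives in the group without one (like reset=None).
datapath = RegisterGroup({"data": 0}, has_reset=False)

# One normal clock edge.
control.clock(reset_active=False, updates={"state": "TAKEN"})
datapath.clock(reset_active=False, updates={"data": 0x5A})

# One clock edge with the reset asserted.
control.clock(reset_active=True, updates={"state": "QUEUED"})
datapath.clock(reset_active=True, updates={"data": 0xA5})
```

After the second edge the control group is back at its initial value, while the data path simply kept following its inputs: it has no reset logic at all.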

Documenting the following MyHDL code

The Python file 2-Register.py describes what others on the net call a skidbuffer.
In my case it is a block (pun intended) to break the long combinatorial Ready-Valid loop of a long pipelined algorithm. It also helps to avoid deadlock if you are a bit sloppy and not obeying the Ready-Valid rules (as described in, e.g., the AXI-S specification). I describe it as a Three-Process State Machine:

  • the first: a combinatorial process, smcomb. This process combines the present state with the inputs and generates the next state to be registered, along with other combinatorial outputs; this is key in my design work.
  • the second: a synchronous process, smsync. This process does have a reset to properly initialize the present state. Its main task is to register the next state, which then becomes the present state.
  • the third: a synchronous process, smreg. Note that this one is optional, as shown here, in case there is no actual data to be transported. Also note that the clock enables used to register any data are outputs of the first (combinatorial) process.
    You are welcome to peruse the generated VHDL and Verilog code.
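
The three-process split described above can also be tried out without MyHDL. The following plain-Python sketch is a behavioural model under stated assumptions: smcomb mirrors the state names and the ldlq / ldlqw / sellqw enables of the design, while the Comb tuple, the simulate helper and the stimulus patterns are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

EMPTY, TAKEN, QUEUED = "EMPTY", "TAKEN", "QUEUED"

# Everything the combinatorial process produces in one evaluation.
Comb = namedtuple("Comb", "next_state sink_ready source_valid ldlq ldlqw sellqw")


def smcomb(state, sink_valid, source_ready):
    """Process 1 (combinatorial): next state plus the combinatorial outputs."""
    if state == EMPTY:
        if sink_valid:
            return Comb(TAKEN, 1, 0, 1, 0, 0)
        return Comb(EMPTY, 1, 0, 0, 0, 0)
    if state == TAKEN:
        if sink_valid and source_ready:
            return Comb(TAKEN, 1, 1, 1, 0, 0)
        if sink_valid:
            return Comb(QUEUED, 1, 1, 0, 1, 0)
        if source_ready:
            return Comb(EMPTY, 1, 1, 0, 0, 0)
        return Comb(TAKEN, 1, 1, 0, 0, 0)
    # QUEUED: both registers are full, so the sink side is not ready
    if source_ready:
        return Comb(TAKEN, 0, 1, 1, 0, 1)
    return Comb(QUEUED, 0, 1, 0, 0, 0)


def simulate(items, sink_valid_pattern, source_ready_pattern):
    """Clock the model once per pattern entry; return what the consumer received."""
    state = EMPTY        # process 2 register, the only one with a reset value
    lqw = None           # process 3 registers, deliberately left uninitialised
    source_data = None
    pending = list(items)
    received = []
    for offer, source_ready in zip(sink_valid_pattern, source_ready_pattern):
        sink_valid = 1 if (offer and pending) else 0
        sink_data = pending[0] if sink_valid else None
        c = smcomb(state, sink_valid, source_ready)
        # A transfer happens at the clock edge where Valid and Ready are both high.
        if c.source_valid and source_ready:
            received.append(source_data)
        if sink_valid and c.sink_ready:
            pending.pop(0)
        # Process 3: data registers, enabled by the outputs of process 1.
        new_lqw = sink_data if c.ldlqw else lqw
        if c.ldlq:
            source_data = lqw if c.sellqw else sink_data
        lqw = new_lqw
        # Process 2: the next state becomes the present state.
        state = c.next_state
    return received


received = simulate([1, 2, 3, 4, 5], [1] * 12, [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1])
```

With a producer that always offers data and a consumer that stalls now and then, the model delivers the items in their original order, which is the behaviour expected of a two-deep register stage.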
'''
Created on 25 Jan 2016

@author: Josy

Distilled from ST_register.py
'''
from __future__ import print_function

from myhdl import (Signal, block, enum, always_comb, always_seq, instances)

from utilities.rtlprovider.rtlprovider import RtlProvider
from utilities.hdllib import duplicate


class Register(RtlProvider):
    ''' a module to break up the combinatorial chain of Ready/Valid signal
        can also be seen as a Fifo of depth 2
    '''

    def __init__(self, Clk, Reset, Sink, Source=None):
        ''' accepting the Signals, generating omitted ones '''
        self.Clk = Clk
        self.Reset = Reset
        self.Sink = Sink
        self.Source = Source if Source is not None else duplicate(Sink)

    @block
    def rtl(self):
        ''' the logic '''
        # "a state machine is worth a thousand equations" - Josy Boelen (yes ...)
        registerstate = enum('EMPTY', 'TAKEN', 'QUEUED')
        smn, smp = [Signal(registerstate.EMPTY) for _ in range(2)]
        ldlq, ldlqw, sellqw = [Signal(bool(0)) for _ in range(3)]

        @always_comb
        def smcomb():
            ''' Register: combinatorial part of the state Machine '''
            # default values, overridden below where needed
            self.Sink.Ready.next = 0
            self.Source.Valid.next = 0
            ldlq.next = 0
            ldlqw.next = 0
            sellqw.next = 0

            if smp == registerstate.EMPTY:
                self.Sink.Ready.next = 1
                if self.Sink.Valid:
                    smn.next = registerstate.TAKEN
                    ldlq.next = 1
                else:
                    smn.next = registerstate.EMPTY

            elif smp == registerstate.TAKEN:
                self.Sink.Ready.next = 1
                self.Source.Valid.next = 1
                if self.Sink.Valid and self.Source.Ready:
                    smn.next = registerstate.TAKEN
                    ldlq.next = 1
                elif self.Sink.Valid:
                    smn.next = registerstate.QUEUED
                    ldlqw.next = 1
                elif self.Source.Ready:
                    smn.next = registerstate.EMPTY
                else:
                    smn.next = registerstate.TAKEN

            elif smp == registerstate.QUEUED:
                self.Source.Valid.next = 1
                if self.Source.Ready:
                    smn.next = registerstate.TAKEN
                    ldlq.next = 1
                    sellqw.next = 1
                else:
                    smn.next = registerstate.QUEUED

        @always_seq(self.Clk.posedge, reset=self.Reset)
        def smsync():
            ''' Register: registered part of the state Machine '''
            smp.next = smn

        if self.Sink.Data is not None:
            lqw = duplicate(self.Sink.Data)

            @always_seq(self.Clk.posedge, reset=None)
            def smreg():
                ''' Register: registered dataflow part of the state Machine
                    can do without a Reset
                '''
                if ldlqw:
                    lqw.next = self.Sink.Data
                if ldlq:
                    if sellqw:
                        self.Source.Data.next = lqw
                    else:
                        self.Source.Data.next = self.Sink.Data

        return instances()


if __name__ == '__main__':
    from myhdl import (intbv, ResetSignal, instance, StopSimulation, SimulationError)

    from bb.fabric.interfaces.buses import SinkSource
    from utilities.hdlutils import genClk, genReset, delayclks

    @block
    def tb_Register():
        T_OPS = 64
        T_WIDTH_D = 8

        Clk = Signal(bool(0))
        Reset = ResetSignal(0, active=1, isasync=True)
        D = SinkSource(T_WIDTH_D)
        Q = SinkSource(T_WIDTH_D)

        dut = Register(Clk, Reset, D, Q)
        dutrtl = dut.rtl()

        results = []
        # tally
        ClkCount = Signal(intbv(0)[32:])
        tCK = 10
        # testdata
        td = [i + 1 for i in range(T_OPS)]

        @instance
        def clkgen():
            yield genClk(Clk, tCK, ClkCount)

        @instance
        def resetgen():
            yield genReset(Clk, tCK, Reset)

        @instance
        def stimulusin():
            yield D.feed(Clk, tCK, td, MODE='PseudoRandom')
            yield delayclks(Clk, tCK, 32)
            raise StopSimulation

        @instance
        def stimulusout():
            yield Q.backpressure(Clk, tCK, MODE='PseudoRandom')

        @instance
        def resultmonitor():
            idx = 0
            cc = 0  # local clock count
            passed = True
            while True:
                yield Clk.posedge
                if Q.Valid and Q.Ready:
                    results.append(str(Q.Data))
                    if td[idx] != Q.Data:
                        passed = False
                        print("Mismatch at clock %d: received %d, 0x%x, expected %d, 0x%x (index %d)" % (cc, Q.Data, Q.Data, td[idx], td[idx], idx))
                    idx += 1
                cc += 1
                # checked inside the loop so that a mismatch actually ends the simulation
                if not passed:
                    print('Testdata: {}'.format(td))
                    print('Results: {}'.format(results))
                    raise SimulationError("Failure! Failure!")

        return dutrtl, clkgen, resetgen, stimulusin, stimulusout, resultmonitor

    @block
    def top_register(Clk, Reset, D, Q):
        ''' we need a wrapper to convert a class-based RTL block '''
        return Register(Clk, Reset, D, Q).rtl()

    def convert():
        C_WIDTH_D = 12

        Clk = Signal(bool(0))
        Reset = ResetSignal(0, active=1, isasync=True)
        D = SinkSource(C_WIDTH_D)
        Q = SinkSource(C_WIDTH_D)

        dfc = top_register(Clk, Reset, D, Q)
        dfc.convert(hdl='VHDL', name='Register_{}'.format(C_WIDTH_D), std_logic_ports=True)
        dfc.convert(name='Register_{}'.format(C_WIDTH_D), testbench=False)

    dft = tb_Register()
    dft.config_sim(trace=True)
    dft.run_sim()

    convert()

-- File: Register_12.vhd
-- Generated by MyHDL 0.10
-- Date: Fri Feb 14 19:08:29 2020

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use std.textio.all;

use work.pck_myhdl_010.all;

entity Register_12 is
    port(
        Clk : in std_logic;
        Reset : in std_logic;
        D_Data : in std_logic_vector(11 downto 0);
        D_Ready : out std_logic;
        D_Valid : in std_logic;
        Q_Data : out std_logic_vector(11 downto 0);
        Q_Ready : in std_logic;
        Q_Valid : out std_logic
    );
end entity Register_12;
-- we need a wrapper to convert a class-based RTL block

architecture MyHDL of Register_12 is

    type t_enum_registerstate_1 is (
        EMPTY,
        TAKEN,
        QUEUED
    );

    signal ldlq : std_logic;
    signal ldlqw : std_logic;
    signal sellqw : std_logic;
    signal smn : t_enum_registerstate_1;
    signal smp : t_enum_registerstate_1;
    signal lqw : unsigned(11 downto 0);
    signal D_Data_num : unsigned(11 downto 0);
    signal Q_Data_num : unsigned(11 downto 0);

begin

    D_Data_num <= unsigned(D_Data);
    Q_Data <= std_logic_vector(Q_Data_num);

    -- Register: combinatorial part of the state Machine
    smcomb : process(all) is
    begin
        D_Ready <= '0';
        Q_Valid <= '0';
        ldlq <= '0';
        ldlqw <= '0';
        sellqw <= '0';
        case smp is
            when EMPTY =>
                D_Ready <= '1';
                if bool(D_Valid) then
                    smn <= TAKEN;
                    ldlq <= '1';
                else
                    smn <= EMPTY;
                end if;
            when TAKEN =>
                D_Ready <= '1';
                Q_Valid <= '1';
                if (bool(D_Valid) and bool(Q_Ready)) then
                    smn <= TAKEN;
                    ldlq <= '1';
                elsif bool(D_Valid) then
                    smn <= QUEUED;
                    ldlqw <= '1';
                elsif bool(Q_Ready) then
                    smn <= EMPTY;
                else
                    smn <= TAKEN;
                end if;
            when QUEUED =>
                Q_Valid <= '1';
                if bool(Q_Ready) then
                    smn <= TAKEN;
                    ldlq <= '1';
                    sellqw <= '1';
                else
                    smn <= QUEUED;
                end if;
        end case;
    end process smcomb;

    -- Register: registered part of the state Machine
    smsync : process(Clk, Reset) is
    begin
        if (Reset = '1') then
            smp <= EMPTY;
        elsif rising_edge(Clk) then
            smp <= smn;
        end if;
    end process smsync;

    -- Register: registered dataflow part of the state Machine
    -- can do without a Reset
    smreg : process(Clk) is
    begin
        if rising_edge(Clk) then
            if bool(ldlqw) then
                lqw <= D_Data_num;
            end if;
            if bool(ldlq) then
                if bool(sellqw) then
                    Q_Data_num <= lqw;
                else
                    Q_Data_num <= D_Data_num;
                end if;
            end if;
        end if;
    end process smreg;

end architecture MyHDL;

// File: Register_12.v
// Generated by MyHDL 0.10
// Date: Fri Feb 14 19:08:29 2020

`timescale 1ns/10ps

module Register_12 (
    Clk,
    Reset,
    D_Data,
    D_Ready,
    D_Valid,
    Q_Data,
    Q_Ready,
    Q_Valid
);
// we need a wrapper to convert a class-based RTL block

input Clk;
input Reset;
input [11:0] D_Data;
output D_Ready;
reg D_Ready;
input D_Valid;
output [11:0] Q_Data;
reg [11:0] Q_Data;
input Q_Ready;
output Q_Valid;
reg Q_Valid;

reg ldlq;
reg ldlqw;
reg sellqw;
reg [1:0] smn;
reg [1:0] smp;
reg [11:0] lqw;

// Register: combinatorial part of the state Machine
always @(smp, Q_Ready, D_Valid) begin: smcomb
    D_Ready = 0;
    Q_Valid = 0;
    ldlq = 0;
    ldlqw = 0;
    sellqw = 0;
    case (smp)
        2'b00: begin
            D_Ready = 1;
            if (D_Valid) begin
                smn = 2'b01;
                ldlq = 1;
            end
            else begin
                smn = 2'b00;
            end
        end
        2'b01: begin
            D_Ready = 1;
            Q_Valid = 1;
            if ((D_Valid && Q_Ready)) begin
                smn = 2'b01;
                ldlq = 1;
            end
            else if (D_Valid) begin
                smn = 2'b10;
                ldlqw = 1;
            end
            else if (Q_Ready) begin
                smn = 2'b00;
            end
            else begin
                smn = 2'b01;
            end
        end
        2'b10: begin
            Q_Valid = 1;
            if (Q_Ready) begin
                smn = 2'b01;
                ldlq = 1;
                sellqw = 1;
            end
            else begin
                smn = 2'b10;
            end
        end
    endcase
end

// Register: registered part of the state Machine
always @(posedge Clk, posedge Reset) begin: smsync
    if (Reset == 1) begin
        smp <= 2'b00;
    end
    else begin
        smp <= smn;
    end
end

// Register: registered dataflow part of the state Machine
// can do without a Reset
always @(posedge Clk) begin: smreg
    if (ldlqw) begin
        lqw <= D_Data;
    end
    if (ldlq) begin
        if (sellqw) begin
            Q_Data <= lqw;
        end
        else begin
            Q_Data <= D_Data;
        end
    end
end

endmodule

@LarsRlrs

In some designs, every LUT input counts. What would this look like with registered outputs for D_Ready and Q_Valid?

@josyb

josyb commented Sep 1, 2020

This could be rewritten as a one-process state machine with registered D_Ready and Q_Valid. But beware; "what you lose at the swings you gain at the roundabouts" may/will apply. Sadly I don't have the time to make the comparison.
