@brabect1
Last active February 2, 2024 00:40
Shows how to constrain an asynchronous counter. #sta #doc

STA Constraints of Asynchronous Counters

This work is licensed under a Creative Commons Attribution 4.0 International License.

Resources

[1] Why am I Getting UITE-461 Messages and Zero Source Latency?, SolvNet article No. 020373, last modified 12/13/2016, https://solvnet.synopsys.com/retrieve/020373.html

[2] How to Specify Clock Constraints for an Asynchronous Ripple Counter, SolvNet article No. 2583058, last modified 2/22/2017, https://solvnet.synopsys.com/retrieve/2583058.html

Asynchronous Counters

These are counter structures in which the flop of each stage is clocked by the data output of the preceding stage. They are typically used as clock dividers to derive clocks with frequencies scaled down by 2^n.

[Figure: async_cnt]

When used as counters, they have some appealing properties. First, their maximum frequency is dictated only by the first-stage flop and is independent of the counter length. Second, the dynamic power consumption of the whole counter is capped at about twice that of a single flop, because each stage toggles at half the rate of the preceding one.
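The power bound follows from a geometric series. The sketch below sums the relative toggle rates of the stages. It assumes, for illustration only, that every flop dissipates the same energy per toggle, which need not hold for a real library (the function name is hypothetical).

```python
def relative_dynamic_activity(stages: int) -> float:
    """Total toggle rate of the counter relative to its first-stage flop.

    Assumption: stage i toggles at 1/2**i of the first stage's rate and all
    flops cost the same energy per toggle.
    """
    return sum(1.0 / 2**i for i in range(stages))


if __name__ == "__main__":
    for n in (1, 4, 24):
        # The sum approaches, but never reaches, 2.
        print(n, relative_dynamic_activity(n))
```

However long the counter gets, the sum stays below 2, which is where the "twice a single flop" cap comes from.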

The downside is that the propagation delay of the asynchronous structure is generally unknown and may violate the synchronous timing, unless properly checked by STA. Unfortunately this is not as easy as it may seem.

Architectural Choices

How an asynchronous counter is used in the design determines how it must be treated during STA:

  • Asynchronous signals are prevented from propagating into the synchronous domain while the counter is running
    • e.g. by combinational gating on the clock domain boundary, which simplifies the STA checks
    • STA needs to check timing only on the LSB (i.e. the fastest toggling flop) and may ignore timing on the other flops altogether
  • Asynchronous signals are not prevented from entering the synchronous domain
    • there is a risk of timing violations, so a thorough STA is needed

No Constraints

Use this approach only when some mechanism prevents the asynchronous counter outputs from entering the synchronous domain. Timing of the first-stage flop is checked by default (as it is clocked by the synchronous domain clock).

pt_shell> report_clocks
****************************************
Report : clock
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:13:05 2019
****************************************
Attributes:
    p - Propagated clock
    G - Generated  clock
    I - Inactive   clock

Clock          Period   Waveform            Attrs     Sources        Voltage Config
-----------------------------------------------------------------------------------
CLK            50.000   {0 25}              p         {clk}
VCLK           50.000   {0 25}                        {}


# Check timing on the 1st stage flop
pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from CLK -to async_cnt_reg_0
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:15:03 2019
****************************************
Warning: There are 3 invalid end points for constrained paths. (UITE-416)

  Startpoint: async_cnt_reg_0
               (rising edge-triggered flip-flop clocked by CLK)
  Endpoint: async_cnt_reg_0
               (rising edge-triggered flip-flop clocked by CLK)
  Last common pin: clk
  Path Group: CLK
  Path Type: max
  Max Data Paths Derating Factor  : 1.100
  Min Clock Paths Derating Factor : 0.900
  Max Clock Paths Derating Factor : 1.100

  Point                                                   Incr       Path
  ------------------------------------------------------------------------------
  clock CLK (rise edge)                                  0.000      0.000
  clock source latency                                   0.000      0.000
  clk (in)                                               0.000      0.000 r
  async_cnt_reg_0/CK (dffprx05_d)                        0.000      0.000 r
  async_cnt_reg_0/QN (dffprx05_d) <-                     0.718      0.718 f
  async_cnt_reg_0/D (dffprx05_d)                         0.000      0.718 f
  data arrival time                                                 0.718

  clock CLK (rise edge)                                 50.000     50.000
  clock source latency                                   0.000     50.000
  clk (in)                                               0.000     50.000 r
  async_cnt_reg_0/CK (dffprx05_d)                        0.000     50.000 r
  clock reconvergence pessimism                          0.000     50.000
  library setup time                                    -0.259     49.741
  data required time                                               49.741
  ------------------------------------------------------------------------------
  data required time                                               49.741
  data arrival time                                                -0.718
  ------------------------------------------------------------------------------
  slack (MET)                                                      49.023
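The slack in the report above can be reproduced by hand. This is a minimal sketch of the setup check arithmetic, using only numbers copied from the report; the function name is hypothetical and derating, latency and reconvergence pessimism are all zero here, so they are left out.

```python
def setup_slack(capture_edge: float, setup_time: float,
                launch_edge: float, path_delay: float) -> float:
    """Setup slack = data required time - data arrival time."""
    required = capture_edge - setup_time   # 50.000 - 0.259 = 49.741
    arrival = launch_edge + path_delay     # 0.000 + 0.718 = 0.718
    return required - arrival


if __name__ == "__main__":
    # Values taken from the async_cnt_reg_0 report.
    print(f"{setup_slack(50.0, 0.259, 0.0, 0.718):.3f}")
```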

The synchronous clock propagation stops at the first-stage flop, and hence the flops of the other stages remain unconstrained. We can leave it at that since we assume the potential timing violations are mitigated architecturally.

# Check that the 2nd stage flop is indeed unconstrained
pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from CLK -to async_cnt_reg_1
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:16:24 2019
****************************************
Warning: There are 3 invalid end points for constrained paths. (UITE-416)
Warning: There are 2 invalid end points for unconstrained paths. (UITE-416)

  Startpoint: async_cnt_reg_0
               (rising edge-triggered flip-flop clocked by CLK)
  Endpoint: async_cnt_reg_1/CKN
               (internal pin)
  Path Group: (none)
  Path Type: max
  Max Data Paths Derating Factor  : 1.100
  Min Clock Paths Derating Factor : 0.900
  Max Clock Paths Derating Factor : 1.100

  Point                                                   Incr       Path
  ------------------------------------------------------------------------------
  clock CLK (source latency)                             0.000      0.000
  clk (in)                                               0.000      0.000 r
  async_cnt_reg_0/CK (dffprx05_d)                        0.000      0.000 r
  async_cnt_reg_0/Q (dffprx05_d)                         0.539      0.539 f
  async_cnt_reg_1/CKN (dffnrx1_d)                        0.000      0.539 f
  data arrival time                                                 0.539
  ------------------------------------------------------------------------------
  (Path is unconstrained)

pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_1 -to CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:16:49 2019
****************************************
Warning: There are 4 invalid start points. (UITE-416)

  Startpoint: async_cnt_reg_1
               (falling edge-triggered flip-flop)
  Endpoint: capture_reg_reg_1
               (rising edge-triggered flip-flop clocked by CLK)
  Path Group: (none)
  Path Type: max
  Max Data Paths Derating Factor  : 1.100
  Min Clock Paths Derating Factor : 0.900
  Max Clock Paths Derating Factor : 1.100

  Point                                                   Incr       Path
  ------------------------------------------------------------------------------
  async_cnt_reg_1/CKN (dffnrx1_d)                        0.000      0.000 f
  async_cnt_reg_1/Q (dffnrx1_d) <-                       0.762      0.762 f
  g1829__7114/Q (ao222x05_d)                             0.937      1.698 f
  capture_reg_reg_1/D (dffprqx05_d)                      0.000      1.698 f
  data arrival time                                                 1.698
  ------------------------------------------------------------------------------
  (Path is unconstrained)

Realistic Timing Constraints

With no architectural mitigation, we need to perform a thorough STA. The core idea is to define propagated clocks along all stages of the asynchronous counter. We will show that the straightforward approach may run into limitations of the STA tool. The next section shows a trick to work around them.

An asynchronous counter is essentially a clock divider, so we propagate clocks through every stage of the counter as through a divide-by-two.

pt_shell> create_generated_clock [get_pins -of async_cnt_reg_0 -filter direction=~out] \
    -add \
    -divide_by 2 \
    -master_clock CLK \
    -source [get_pins -of async_cnt_reg_0 -filter is_clock_pin==true] \
    -name async_cnt_ck_0; \
set_propagated_clock async_cnt_ck_0

pt_shell> for {set i 1} {$i < 23} {incr i} { \
    create_generated_clock [get_pins -of async_cnt_reg_${i} -filter lib_pin_name==Q] \
        -add \
        -divide_by 2 \
        -master_clock async_cnt_ck_[expr $i - 1] \
        -source [get_pins -of async_cnt_reg_${i} -filter is_clock_pin==true] \
        -preinvert \
        -name async_cnt_ck_${i}; \
    set_propagated_clock async_cnt_ck_${i}; \
}

pt_shell> update_timing -full

This is what the clock definitions look like:

pt_shell> report_clocks
****************************************
Report : clock
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:34:35 2019
****************************************
Attributes:
    p - Propagated clock
    G - Generated  clock
    I - Inactive   clock

Clock            Period   Waveform            Attrs     Sources        Voltage Config
-------------------------------------------------------------------------------------
CLK              50.000   {0 25}              p         {clk}
VCLK             50.000   {0 25}                        {}
async_cnt_ck_0  100.000   {0 50}              p, G      {async_cnt_reg_0/Q async_cnt_reg_0/QN}
async_cnt_ck_1  200.000   {50 150}            p, G      {async_cnt_reg_1/Q}
async_cnt_ck_2  400.000   {150 350}           p, G      {async_cnt_reg_2/Q}
...
async_cnt_ck_10 102400.000
                          {51150 102350}      p, G      {async_cnt_reg_10/Q}
...
async_cnt_ck_22
                419430400.000
                          {2.09715e+08 4.1943e+08}
                                              p, G      {async_cnt_reg_22/Q}

Generated           Master                  Generated                               Master          Waveform
Clock               Source                  Source                                  Clock           Modification
-----------------------------------------------------------------------------------------------------------------
async_cnt_ck_0      async_cnt_reg_0/CK      async_cnt_reg_0/Q async_cnt_reg_0/QN    CLK             div(2)
async_cnt_ck_1      async_cnt_reg_1/CKN     async_cnt_reg_1/Q                       async_cnt_ck_0  div(2)
async_cnt_ck_2      async_cnt_reg_2/CKN     async_cnt_reg_2/Q                       async_cnt_ck_1  div(2)
...
async_cnt_ck_10     async_cnt_reg_10/CKN    async_cnt_reg_10/Q                      async_cnt_ck_9  div(2)
...
async_cnt_ck_22     async_cnt_reg_22/CKN    async_cnt_reg_22/Q                      async_cnt_ck_21 div(2)
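The periods and waveforms in the clock report follow a simple recurrence. The sketch below is a hand model of that recurrence, not of the tool's algorithm. It assumes that each later stage toggles on the falling edge of its master (the effect of -preinvert with -divide_by 2) and starts from async_cnt_ck_0 as reported. The function name is hypothetical.

```python
def divided_waveform(stage: int, base_period: float = 50.0):
    """Return (period, rise, fall) of async_cnt_ck_<stage>.

    Assumption: ck_0 is CLK divided by two; every later stage rises on its
    master's first falling edge and doubles the master's period.
    """
    period, rise, fall = 2 * base_period, 0.0, base_period  # async_cnt_ck_0
    for _ in range(stage):
        # New rise = master's fall; new high time = one master period.
        rise, fall, period = fall, fall + period, 2 * period
    return period, rise, fall


if __name__ == "__main__":
    for s in (0, 1, 2, 10, 22):
        print(s, divided_waveform(s))
```

For stage 10 this yields a period of 102400 with edges at 51150 and 102350, matching the report.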

To check that the tool properly accounts for the delay through all the stages of the divider, inspect some of the later stages. The natural expectation is that the later the stage, the lower the slack:

pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_10 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:47:09 2019
****************************************
Warning: There are 3 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_10/Q
            (clock source 'async_cnt_ck_10')
Endpoint: capture_reg_reg_10
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_10 (fall edge)                   102350.000 102350.000
clock CLK (source latency)                             0.000   102350.000
clk (in)                                               0.000   102350.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539   102350.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762   102351.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755   102352.055 f
...
async_cnt_reg_9/Q (dffnrx1_d) (gclock source)          0.755   102357.341 f
async_cnt_reg_10/Q (dffnrx1_d) (gclock source)         0.755   102358.096 f
async_cnt_reg_10/Q (dffnrx1_d)                         0.000   102358.096 f
g1840__1309/Q (ao222x05_d)                             0.936   102359.033 f
capture_reg_reg_10/D (dffprqx05_d)                     0.000   102359.033 f
data arrival time                                              102359.031

clock CLK (rise edge)                               102400.000 102400.000
clock source latency                                   0.000   102400.000
clk (in)                                               0.000   102400.000 r
capture_reg_reg_10/CK (dffprqx05_d)                    0.000   102400.000 r
clock reconvergence pessimism                          0.000   102400.000
library setup time                                    -0.291   102399.709
data required time                                             102399.711
------------------------------------------------------------------------------
data required time                                             102399.711
data arrival time                                              -102359.031
------------------------------------------------------------------------------
slack (MET)                                                      40.677


pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_11 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:48:49 2019
****************************************
Warning: There are 3 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_11/Q
            (clock source 'async_cnt_ck_11')
Endpoint: capture_reg_reg_11
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_11 (fall edge)                   204750.000 204750.000
clock CLK (source latency)                             0.000   204750.000
clk (in)                                               0.000   204750.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539   204750.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762   204751.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755   204752.055 f
...
async_cnt_reg_10/Q (dffnrx1_d) (gclock source)         0.755   204758.096 f
async_cnt_reg_11/Q (dffnrx1_d) (gclock source)         0.755   204758.851 f
async_cnt_reg_11/Q (dffnrx1_d)                         0.000   204758.851 f
g1818__4547/Q (ao222x05_d)                             0.936   204759.788 f
capture_reg_reg_11/D (dffprqx05_d)                     0.000   204759.788 f
data arrival time                                              204759.781

clock CLK (rise edge)                               204800.000 204800.000
clock source latency                                   0.000   204800.000
clk (in)                                               0.000   204800.000 r
capture_reg_reg_11/CK (dffprqx05_d)                    0.000   204800.000 r
clock reconvergence pessimism                          0.000   204800.000
library setup time                                    -0.291   204799.709
data required time                                             204799.703
------------------------------------------------------------------------------
data required time                                             204799.703
data arrival time                                              -204759.781
------------------------------------------------------------------------------
slack (MET)                                                      39.922
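The two slacks differ by exactly one flop delay (0.755), which is what we expect from a ripple structure. The sketch below rebuilds the slack from the per-stage delays listed in the reports. The constants are copied from the reports; applying the same mux delay and setup time to every bit is an assumption made for illustration.

```python
CLK_PERIOD = 50.0     # launch-to-capture edge separation seen in the reports
SETUP = 0.291         # library setup time of capture_reg_reg_*
FIRST_STAGE = 0.539   # async_cnt_reg_0 CK -> Q
SECOND_STAGE = 0.762  # async_cnt_reg_1 CKN -> Q
LATER_STAGE = 0.755   # CKN -> Q of every further stage
MUX = 0.936           # ao222 delay in front of the capture flop (assumed equal for all bits)


def predicted_slack(stage: int) -> float:
    """Setup slack of the path from async_cnt_reg_<stage> to its capture flop."""
    if stage < 1:
        raise ValueError("model covers stages 1 and above")
    ripple = FIRST_STAGE + SECOND_STAGE + LATER_STAGE * (stage - 1)
    return CLK_PERIOD - SETUP - (ripple + MUX)


if __name__ == "__main__":
    for s in (10, 11):
        print(s, round(predicted_slack(s), 3))
```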

The unfortunate effect is that the clock of the N-th stage flop has a period of 2^N times the period of the base clock. For large dividers, this is likely to exceed the maximum window the STA tool will consider when expanding the base period.

pt_shell> update_timing -full
...
Warning: For computing a common base period for a number of clocks 
         PrimeTime limits the waveform expansion of the smallest 
         period to be no more than 1000 times and the waveform 
         expansion of the largest period to be no more than 101 times. 
         Since the largest period is too large compared to the smallest period, 
         no common base period is possible satisfying these limits, and 
         PrimeTime has taken the largest period as the common base period 
         but still has not expanded the smallest period beyond its limit. 
         In certain situations, this can cause paths between these clocks 
         to be unconstrained.  (PTE-052)
...

In other words, with N > 9 (2^10 = 1024 already exceeds the 1000-fold expansion limit quoted in the warning) we hit the tool's limit and may start getting wrong reports:

pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_15 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:51:18 2019
****************************************
Warning: There are 3 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_15/Q
            (clock source 'async_cnt_ck_15')
Endpoint: capture_reg_reg_15
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_15 (fall edge)                   3276750.000
                                                                3276750.000
clock CLK (source latency)                             0.000   3276750.000
clk (in)                                               0.000   3276750.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539   3276750.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762   3276751.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755   3276752.055 f
async_cnt_reg_3/Q (dffnrx1_d) (gclock source)          0.755   3276752.810 f
...
async_cnt_reg_15/Q (dffnrx1_d) (gclock source)         0.755   3276761.872 f
async_cnt_reg_15/Q (dffnrx1_d)                         0.000   3276761.872 f
g1824__3772/Q (ao222x05_d)                             0.936   3276762.808 f
capture_reg_reg_15/D (dffprqx05_d)                     0.000   3276762.808 f
data arrival time                                              3276762.750

clock CLK (rise edge)                               3250050.000
                                                                3250050.000
clock source latency                                   0.000   3250050.000
clk (in)                                               0.000   3250050.000 r
capture_reg_reg_15/CK (dffprqx05_d)                    0.000   3250050.000 r
clock reconvergence pessimism                          0.000   3250050.000
library setup time                                    -0.291   3250049.709
data required time                                             3250049.750
------------------------------------------------------------------------------
data required time                                             3250049.750
data arrival time                                              -3276762.750
------------------------------------------------------------------------------
slack (VIOLATED)                                               -26713.100
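The report is wrong because the capture edge (3250050) lies before the launch edge (3276750). The sketch below reproduces the reported slack and then recomputes it with the first CLK rising edge after the launch. That this is the intended capture edge is an assumption, based on the stage-10 and stage-11 reports. The data delay is the difference between the arrival time and the launch edge in the report.

```python
import math

CLK_PERIOD = 50.0
SETUP = 0.291
LAUNCH_EDGE = 3276750.0            # async_cnt_ck_15 fall edge
REPORTED_CAPTURE_EDGE = 3250050.0  # CLK rise edge picked by the tool
DATA_DELAY = 12.808                # 3276762.808 - 3276750.000


def slack(capture_edge: float) -> float:
    """Setup slack for a given capture edge."""
    return (capture_edge - SETUP) - (LAUNCH_EDGE + DATA_DELAY)


def next_clk_rise_after(t: float) -> float:
    """First CLK rising edge strictly after time t (CLK rises at multiples of 50)."""
    return (math.floor(t / CLK_PERIOD) + 1) * CLK_PERIOD


if __name__ == "__main__":
    print("reported:", round(slack(REPORTED_CAPTURE_EDGE), 3))
    print("expected:", round(slack(next_clk_rise_after(LAUNCH_EDGE)), 3))
```

Under that assumption the slack would be positive and about 4 x 0.755 lower than for stage 11, in line with the trend of the earlier stages.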

Virtual Timing Constraints

To overcome the limit on expanding the timing check window, we need to resort to a "virtual timing" approach. By the principles of STA, we do not actually need the real clock waveforms. All that counts is a proper definition of the clock edges and accounting for the clock latency.

In other words, if we use -multiply_by 1 instead of -divide_by 2, we keep the maximum clock period bounded and still get the clock latency accounted for.

Note that using -divide_by 1 would not work and we would get UITE-461 errors [1], likely because of the way the tool treats the edge relationship between the -source clock and the generated clock. Since our goal is to keep a one-to-one ratio between the source and generated clocks, it should not matter whether we divide or multiply by one.

pt_shell> create_generated_clock [get_pins -of async_cnt_reg_0 -filter direction=~out] \
    -add \
    -divide_by 2 \
    -master_clock CLK \
    -source [get_pins -of async_cnt_reg_0 -filter is_clock_pin==true] \
    -name async_cnt_ck_0; \
set_propagated_clock async_cnt_ck_0

pt_shell> for {set i 1} {$i < 23} {incr i} { \
    create_generated_clock [get_pins -of async_cnt_reg_${i} -filter lib_pin_name==Q] \
        -add \
        -multiply_by 1 \
        -master_clock async_cnt_ck_[expr $i - 1] \
        -source [get_pins -of async_cnt_reg_${i} -filter is_clock_pin==true] \
        -preinvert \
        -name async_cnt_ck_${i}; \
    set_propagated_clock async_cnt_ck_${i}; \
}

pt_shell> update_timing -full

This is what the clock definitions look like:

pt_shell> report_clocks
****************************************
Report : clock
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 14:59:14 2019
****************************************
Attributes:
    p - Propagated clock
    G - Generated  clock
    I - Inactive   clock

Clock            Period   Waveform            Attrs     Sources        Voltage Config
-------------------------------------------------------------------------------------
CLK              50.000   {0 25}              p         {clk}
VCLK             50.000   {0 25}                        {}
async_cnt_ck_0  100.000   {0 50}              p, G      {async_cnt_reg_0/Q async_cnt_reg_0/QN}
async_cnt_ck_1  100.000   {50 100}            p, G      {async_cnt_reg_1/Q}
async_cnt_ck_2  100.000   {0 50}              p, G      {async_cnt_reg_2/Q}
...
async_cnt_ck_10 100.000   {0 50}              p, G      {async_cnt_reg_10/Q}
...
async_cnt_ck_22 100.000   {0 50}              p, G      {async_cnt_reg_22/Q}



Generated       Master                  Generated                               Master          Waveform
Clock           Source                  Source                                  Clock           Modification
------------------------------------------------------------------------------------------------------------
async_cnt_ck_0  async_cnt_reg_0/CK      async_cnt_reg_0/Q async_cnt_reg_0/QN    CLK             div(2)
async_cnt_ck_1  async_cnt_reg_1/CKN     async_cnt_reg_1/Q                       async_cnt_ck_0  mult(1)
async_cnt_ck_2  async_cnt_reg_2/CKN     async_cnt_reg_2/Q                       async_cnt_ck_1  mult(1)
...
async_cnt_ck_10 async_cnt_reg_10/CKN    async_cnt_reg_10/Q                      async_cnt_ck_9  mult(1)
...
async_cnt_ck_22 async_cnt_reg_22/CKN    async_cnt_reg_22/Q                      async_cnt_ck_21 mult(1)
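With -multiply_by 1 and -preinvert, each stage is just an inverted copy of its master, so the waveform alternates between {0 50} and {50 100} while the period stays at 100. The sketch below models that alternation. It is a hand model of the reported waveforms, not of the tool's internals, and the function name is hypothetical.

```python
def virtual_waveform(stage: int, period: float = 100.0):
    """Return (period, rise, fall) of async_cnt_ck_<stage> under -multiply_by 1."""
    rise, fall = 0.0, period / 2          # async_cnt_ck_0: {0 50}
    for _ in range(stage):
        # Inverted copy: rise where the master falls, fall where it rises next.
        rise, fall = fall, rise + period
        if rise >= period:                # fold back into the first period
            rise -= period
            fall -= period
    return period, rise, fall


if __name__ == "__main__":
    for s in (0, 1, 2, 10, 11, 22):
        print(s, virtual_waveform(s))
```

The period no longer grows with the stage number, which is what keeps the common base period within the tool's expansion limits.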

We can see that the slacks come out the same as in the Realistic Timing Constraints section:

pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_10 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 15:07:42 2019
****************************************
Warning: There are 3 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_10/Q
            (clock source 'async_cnt_ck_10')
Endpoint: capture_reg_reg_10
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_10 (fall edge)                     50.000     50.000
clock CLK (source latency)                             0.000     50.000
clk (in)                                               0.000     50.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539     50.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762     51.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755     52.055 f
...
async_cnt_reg_9/Q (dffnrx1_d) (gclock source)          0.755     57.341 f
async_cnt_reg_10/Q (dffnrx1_d) (gclock source)         0.755     58.096 f
async_cnt_reg_10/Q (dffnrx1_d)                         0.000     58.096 f
g1840__1309/Q (ao222x05_d)                             0.936     59.033 f
capture_reg_reg_10/D (dffprqx05_d)                     0.000     59.033 f
data arrival time                                                59.033

clock CLK (rise edge)                                100.000    100.000
clock source latency                                   0.000    100.000
clk (in)                                               0.000    100.000 r
capture_reg_reg_10/CK (dffprqx05_d)                    0.000    100.000 r
clock reconvergence pessimism                          0.000    100.000
library setup time                                    -0.291     99.709
data required time                                               99.709
------------------------------------------------------------------------------
data required time                                               99.709
data arrival time                                               -59.033
------------------------------------------------------------------------------
slack (MET)                                                      40.677


pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_11 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 15:08:55 2019
****************************************
Warning: There are 3 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_11/Q
            (clock source 'async_cnt_ck_11')
Endpoint: capture_reg_reg_11
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_11 (fall edge)                      0.000      0.000
clock CLK (source latency)                             0.000      0.000
clk (in)                                               0.000      0.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539      0.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762      1.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755      2.055 f
...
async_cnt_reg_10/Q (dffnrx1_d) (gclock source)         0.755      8.096 f
async_cnt_reg_11/Q (dffnrx1_d) (gclock source)         0.755      8.851 f
async_cnt_reg_11/Q (dffnrx1_d)                         0.000      8.851 f
g1818__4547/Q (ao222x05_d)                             0.936      9.788 f
capture_reg_reg_11/D (dffprqx05_d)                     0.000      9.788 f
data arrival time                                                 9.788

clock CLK (rise edge)                                 50.000     50.000
clock source latency                                   0.000     50.000
clk (in)                                               0.000     50.000 r
capture_reg_reg_11/CK (dffprqx05_d)                    0.000     50.000 r
clock reconvergence pessimism                          0.000     50.000
library setup time                                    -0.291     49.709
data required time                                               49.709
------------------------------------------------------------------------------
data required time                                               49.709
data arrival time                                                -9.788
------------------------------------------------------------------------------
slack (MET)                                                      39.922
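The slacks agree because a setup check depends only on the separation between the launch and capture edges, not on their absolute positions. The sketch below compares the edge pairs from the realistic and virtual reports for stages 10 and 11. The edge values are copied from the reports; the dictionary and function names are hypothetical.

```python
# (launch edge, capture edge) as printed in the timing reports
REPORTED_EDGES = {
    ("realistic", 10): (102350.0, 102400.0),
    ("virtual", 10): (50.0, 100.0),
    ("realistic", 11): (204750.0, 204800.0),
    ("virtual", 11): (0.0, 50.0),
}


def edge_separation(kind: str, stage: int) -> float:
    """Time between the launch edge and the capture edge of a reported path."""
    launch, capture = REPORTED_EDGES[(kind, stage)]
    return capture - launch


if __name__ == "__main__":
    for key in REPORTED_EDGES:
        print(key, edge_separation(*key))
```

All four paths have the same 50-unit separation, so the same data delay gives the same slack under either clock definition.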

We can also see proper timing for all the other divider stages, including the MSB:

pt_shell> report_timing -path_type full_clock_expanded -delay_type max -from async_cnt_reg_23 -to CLK -group CLK
****************************************
Report : timing
        -path_type full_clock_expanded
        -delay_type max
        -max_paths 1
        -group CLK
        -sort_by slack
Design : async_cnt
Version: O-2018.06-SP4
Date   : Sat Jun 15 15:10:12 2019
****************************************
Warning: There are 4 invalid start points. (UITE-416)

Startpoint: async_cnt_reg_23
            (falling edge-triggered flip-flop clocked by async_cnt_ck_22)
Endpoint: capture_reg_reg_23
            (rising edge-triggered flip-flop clocked by CLK)
Last common pin: clk
Path Group: CLK
Path Type: max
Max Data Paths Derating Factor  : 1.100
Min Clock Paths Derating Factor : 0.900
Max Clock Paths Derating Factor : 1.100

Point                                                   Incr       Path
------------------------------------------------------------------------------
clock async_cnt_ck_22 (fall edge)                     50.000     50.000
clock CLK (source latency)                             0.000     50.000
clk (in)                                               0.000     50.000 r
async_cnt_reg_0/Q (dffprx05_d) (gclock source)         0.539     50.539 f
async_cnt_reg_1/Q (dffnrx1_d) (gclock source)          0.762     51.300 f
async_cnt_reg_2/Q (dffnrx1_d) (gclock source)          0.755     52.055 f
...
async_cnt_reg_22/Q (dffnrx1_d) (gclock source)         0.755     67.157 f
async_cnt_reg_23/CKN (dffnrx1_d)                       0.000     67.157 f
async_cnt_reg_23/QN (dffnrx1_d) <-                     1.058     68.215 r
g1856/Q (invx4_d)                                      0.766     68.981 f
g1837__8780/Q (ao222x05_d)                             1.203     70.184 f
capture_reg_reg_23/D (dffprqx05_d)                     0.000     70.184 f
data arrival time                                                70.184

clock CLK (rise edge)                                100.000    100.000
clock source latency                                   0.000    100.000
clk (in)                                               0.000    100.000 r
capture_reg_reg_23/CK (dffprqx05_d)                    0.000    100.000 r
clock reconvergence pessimism                          0.000    100.000
library setup time                                    -0.293     99.707
data required time                                               99.707
------------------------------------------------------------------------------
data required time                                               99.707
data arrival time                                               -70.184
------------------------------------------------------------------------------
slack (MET)                                                      29.523
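
As a hand check of the MSB report above, the reported numbers can be recombined as follows. This is only a sketch: the grouping into launch edge, ripple delay and data path is an interpretation of the report lines, not something the tool prints. The ripple term is the arrival at `async_cnt_reg_23/CKN` minus the launching `clk` edge, i.e. 67.157 - 50.000.

```latex
\begin{align*}
t_{\text{arrival}}  &= \underbrace{50.000}_{\text{launch edge}}
                     + \underbrace{17.157}_{\text{ripple, async\_cnt\_reg\_0 to async\_cnt\_reg\_22}}
                     + \underbrace{1.058 + 0.766 + 1.203}_{\text{data path to capture\_reg\_reg\_23/D}}
                     = 70.184\ \text{ns} \\
t_{\text{required}} &= 100.000 - 0.293 = 99.707\ \text{ns} \\
\text{slack}        &= t_{\text{required}} - t_{\text{arrival}} = 99.707 - 70.184 = 29.523\ \text{ns}
\end{align*}
```

Measured from the launching `clk` edge, the path takes about 20.184 ns plus 0.293 ns of setup, roughly 20.48 ns in total, which fits within the 50 ns period constrained for `CLK`.
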
module async_cnt(
    // MSB of the async counter
    output logic msb,
    // latch shift-register
    input logic capture_en,
    input logic shift_en,
    output logic serial_out,
    // clock & reset
    input logic clk,
    input logic rst_n
);

// bit-width of the counter, MSB acts as a saturating overflow flag
localparam int unsigned CNT_WIDTH = 24;

// asynchronous up counter
logic [CNT_WIDTH-1:0] async_cnt;

// Clocks for individual flops of the counter
logic [CNT_WIDTH-1:0] async_cnt_ck;

// async, active low reset of the counter
logic async_cnt_rst_n;

logic [CNT_WIDTH-1:0] capture_reg;

assign serial_out = capture_reg[$high(capture_reg)];
assign msb = async_cnt[$high(async_cnt)];
assign async_cnt_rst_n = rst_n;

assign async_cnt_ck[$high(async_cnt):0] =
    {~async_cnt[$high(async_cnt)-1:0],clk};

// 1st stage flop
always_ff @(posedge async_cnt_ck[0] or negedge async_cnt_rst_n) begin: p_cnt_lsb
    if (!async_cnt_rst_n)
        async_cnt[0] <= 1'b0;
    else
        async_cnt[0] <= ~async_cnt[0];
end: p_cnt_lsb

// Last stage (overflow): When the MSB is in the least byte, it
// cannot be loaded.
always_ff @(posedge async_cnt_ck[$high(async_cnt)] or negedge async_cnt_rst_n) begin: p_cnt_msb
    if (!async_cnt_rst_n)
        async_cnt[$high(async_cnt)] <= 1'b0;
    else
        async_cnt[$high(async_cnt)] <= 1'b1;
end: p_cnt_msb

// all other stage flops
for (genvar gi=1; gi < $high(async_cnt); gi++) begin: g_cnt
    always_ff @(posedge async_cnt_ck[gi] or negedge async_cnt_rst_n) begin: p_cnt
        if (!async_cnt_rst_n)
            async_cnt[gi] <= 1'b0;
        else
            async_cnt[gi] <= !async_cnt[gi];
    end: p_cnt
end: g_cnt

always_ff @(posedge clk or negedge rst_n) begin: p_capture_reg
    if (!rst_n) begin
        capture_reg <= '0;
    end
    else begin
        if (capture_en) capture_reg <= async_cnt;
        else if (shift_en) capture_reg <= {capture_reg[$high(capture_reg)-1:0],1'b0};
    end
end: p_capture_reg

endmodule
create_clock -period 50.0 -name CLK clk
create_clock -period 50.0 -name VCLK
set ins [remove_from_collection [get_ports * -filter {direction == in}] {clk}]
set_input_delay -min 1.0 -clock VCLK ${ins}
set_input_delay -max 10.0 -clock VCLK ${ins}
set outs [get_ports * -filter {direction == out}]
set_output_delay -max 10.0 -clock VCLK ${outs}
set_output_delay -min -5.0 -clock VCLK ${outs}
# Input transitions (min/max range; a second -max would just override the first)
set_input_transition -max 1 [get_ports * -filter {direction == in}]
set_input_transition -min 0.01 [get_ports * -filter {direction == in}]
# Using wide range to cover different cases.
set_load -min 0.01 [all_outputs]
set_load -max 1.0 [all_outputs]
@gkamendje

Hello, thanks for sharing this.
Typically, the output of a counter is compared against predefined thresholds, such as: if (counter_val > MAX_SOMETHING_CYCLES) then do_something.
I do not really understand what you mean by:
"This approach shall be used only when there are prevention mechanism of the asynchronous counter outputs entering the synchronous domain."
Could you please clarify?

Assuming the Realistic Timing approach, I would like to know your opinion on how such asynchronous counter constraints affect Clock Tree Synthesis during placement and routing. From what I can see, the PnR tools will try to balance the related clocks with the generated clocks of the asynchronous counter. This leads to a very deep clock tree with numerous buffers. At high clock frequencies, the power consumed by the clock network tends to offset the gain obtained by the asynchronous counter approach. Have you ever faced a similar issue? If yes, any thoughts on how to mitigate this effect?
G

@brabect1 (Author)

Hi @gkamendje,
Using an async counter in comparisons is generally troublesome, as you are likely to use the comparison result in synchronous logic. Instead, one use case is to count the number of events during a time interval (i.e. while an enable is asserted); in that case, you can use the enable signal to gate the asynchronously toggling bits and keep them from entering the synchronous domain. That is what I meant by a prevention mechanism.

If you needed to feed the asynchronous bits directly to a synchronous domain, your clock period would need to be longer than the propagation delay through the whole counter; that could work for low-power, low-frequency designs, not for something high speed.

As for PnR, I am not an expert, but you would normally exclude the bits of the counter from the other skew groups; very likely each bit would form its own separate skew group. The P&R tool would then not try to balance the counter's bits among themselves, nor against all the other sinks. I have always used async counters for very specific purposes; in most digital designs you want to avoid them as they often bring more trouble than they are worth.

Hope that explained some of your concerns.

@gkamendje

Hi @brabect1,
Thanks for your answer. Indeed, excluding the bits of the asynchronous counter from the other skew groups is the way to go.
