This is kind of a long and rambling post, but I hope it makes some sense.
First off, the data point that prompted this whole ramble. The following graph comes from 100 runs of Yosys and nextpnr on picorv32_large.v from yosys-tests, targeting an iCE40HX8K.
![Fmax distributions for ABC1 vs ABC9 over 100 runs](https://camo.githubusercontent.com/9e4d2cd02e164e5e9b52ce111875e24e4e81736392e5e4b01ecbfa63b91f7fdd/68747470733a2f2f7075752e73682f4639786b722f343236623665373364362e706e67)
Obviously, ABC1 resulting in a higher average Fmax than ABC9 is concerning, given that the point of ABC9 is to improve synthesis quality compared to ABC1.
But this got me wondering: how do we actually quantify that a change produces a better maximum frequency?
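One way to make that question concrete is to treat the per-run Fmax values from each flow as two samples and ask whether the difference between them could plausibly be seed noise. As a sketch (not the methodology used for the graph above), here's a simple two-sided permutation test on the difference of medians; the Fmax numbers below are made-up illustrative values, not real measurements:

```python
import random
import statistics


def permutation_test(a, b, trials=10_000, seed=0):
    """Two-sided permutation test on the difference of medians.

    Returns an estimated p-value: the fraction of random relabelings
    of the pooled samples whose median difference is at least as
    extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    n = len(a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n]) - statistics.median(pooled[n:]))
        if diff >= observed:
            hits += 1
    return hits / trials


# Hypothetical Fmax samples (MHz) from repeated place-and-route runs
# with different seeds; these numbers are invented for illustration.
abc1 = [61.2, 59.8, 60.5, 62.0, 58.9, 60.1, 61.7, 59.4]
abc9 = [58.7, 60.2, 57.9, 59.1, 58.3, 59.8, 57.5, 58.8]

p = permutation_test(abc1, abc9)
print(f"p = {p:.3f}")
```

A small p-value suggests the gap between the two flows is unlikely to be seed noise alone; a large one means the runs don't distinguish them. Medians are used rather than means because a handful of pathological seeds can drag the mean around quite a lot.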