@leigh-johnson
Forked from Zhaoyilunnn/quantum_arch.md
Created October 21, 2024 06:46
quantum_arch

Architecture

Papers

Chips

  • An Energy-Efficient Configurable Lattice Cryptography Processor for the Quantum-Secure Internet of Things. ISSCC-2019

  • A 28nm Bulk-CMOS 4-to-8GHz <2mW Cryogenic Pulse Modulator for Scalable Quantum Computing. ISSCC-2019

  • A Scalable Quantum Magnetometer in 65nm CMOS with Vector-Field Detection Capability. ISSCC-2019

  • A 48GHz 5.6mW Gate-Level-Pipelined Multiplier Using Single-Flux Quantum Logic. ISSCC-2019

  • 19.1 A Scalable Cryo-CMOS 2-to-20GHz Digitally Intensive Controller for 4×32 Frequency Multiplexed Spin Qubits/Transmons in 22nm FinFET Technology for Quantum Computers. ISSCC-2020

  • 19.2 A 110mK 295µW 28nm FDSOI CMOS Quantum Integrated Circuit with a 2.8GHz Excitation and nA Current Sensing of an On-Chip Double Quantum Dot. ISSCC-2020

  • 19.3 A 200dB FoM 4-to-5GHz Cryogenic Oscillator with an Automatic Common-Mode Resonance Calibration for Quantum Computing Applications. ISSCC-2020

  • Cryo-CMOS for Quantum Computing Technology Directions Subcommittee. ISSCC-2021

  • A Fully Integrated Cryo-CMOS SoC for Qubit Control in Quantum Computers Capable of State Manipulation, Readout and High-Speed Gate Pulsing of Spin Qubits in Intel 22nm FFL FinFET Technology. ISSCC-2021

  • A Fully-Integrated 40-nm 5-6.5 GHz Cryo-CMOS System-on-Chip with I/Q Receiver and Frequency Synthesizer for Scalable Multiplexed Readout of Quantum Dots. ISSCC-2021

  • 13.4 A 1GS/s 6-to-8b 0.5mW/Qubit Cryo-CMOS SAR ADC for Quantum Computing in 40nm CMOS. ISSCC-2021

  • Electronics for a Quantum World. ISSCC-2021

  • 26.2 Design Considerations for Superconducting Quantum Systems. ISSCC-2022

  • Beyond-Classical Computing Using Superconducting Quantum Processors. ISSCC-2022

  • A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems. ISSCC-2022

  • Inverse Designed, Densely Integrated Classical and Quantum Photonics. ISSCC-2023

  • A 28-nm Bulk-CMOS IC for Full Control of a Superconducting Quantum Processor Unit-Cell. ISSCC-2023

  • A Calibration-Free 12.8-16.5GHz Cryogenic CMOS VCO with 202dBc/Hz FoM for Classic-Quantum Interface. ISSCC-2023

  • A Scalable Cryo-CMOS Controller for the Wideband Frequency-Multiplexed Control of Spin Qubits and Transmons. Intel, JSSC, 2021.

  • A Fully Integrated Cryo-CMOS SoC for State Manipulation, Readout, and High-Speed Gate Pulsing of Spin Qubits. Intel, JSSC, 2021.

  • CMOS-based cryogenic control of silicon quantum circuits. Nature. 2021.

EDA

  • Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers, arXiv, 2024 (Yiran Chen, Hai Li)
  • Fast Virtual Gate Extraction For Silicon Quantum Dot Devices. DAC. 2024.

Performance

  • Software tools for quantum control: improving quantum computer performance through noise and error suppression. QST, 2021.

Quantum Arch

  • Microarchitectures for Heterogeneous Superconducting Quantum Computers, MICRO-2023
  • Multi-mode Cavity Centric Architectures for Quantum Simulation, 2023
  • GRAPHINE: Enhanced Neutral Atom Quantum Computing using Application-Specific Rydberg Atom Arrangement, SC, 2023

Quantum-Classical Arch

  • Inter-temperature Bandwidth Reduction in Cryogenic QAOA Machines, CAL-2023
  • Hardware for multi-superconducting qubit control and readout, BAQIS, CPB, 2021.

Review

Quantum Computer Architecture: Towards Full-Stack Quantum Accelerators, DATE-2020

A Heterogeneous Quantum Computer Architecture

A heterogeneous quantum computer architecture, CF-2016

QUANTUM SYSTEM STACK

  • A quantum computer will always consist of both quantum and conventional computing components.

FTQC

Readout

  • Scaling Qubit Readout with Hardware Efficient Machine Learning Architectures, ISCA-2023

Security

  • A Quantum Computer Trusted Execution Environment, CAL-2023

Circuit Architecture

Papers

  • Quantum circuit architecture search for variational quantum algorithms, NPJ Quantum
  • QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits, HPCA-2022
  • QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms, ICML-2023
  • TopGen: Topology-Aware Bottom-Up Generator for Variational Quantum Circuits, 2022
  • Quantum Generative Adversarial Networks for learning and loading random distributions, NPJ-Quantum, 2019
  • Approximate amplitude encoding in shallow parameterized quantum circuits and its application to financial market indicators, Physical Review Research, 2022
  • Efficient Data Encoding and Decoding for Quantum Computing, QCE-2022
  • Quantum State Preparation Circuit Optimization Exploiting Don't Cares. ICCAD. 2024

NPJ-Quantum 2021

https://github.com/yuxuan-du/Quantum_architecture_search/

Quantum circuit architecture search for variational quantum algorithms

  1. What is $\mathcal{L}$?
  2. Why does the selected ansatz still need to be retrained?

image

image

TopGen

Why sub-circuits as building blocks

  • Reduce design space. (But no motivation data!)

Method

  1. Generate sub-circuits, i.e., directly use some subgraphs of the corresponding topology.
  2. Combine sub-circuits, but there's no inter-connection between sub-circuits.
  3. Add 2q gates between subcircuits.
  4. Add additional sub-circuits that are equivalent to the identity matrix.

image

Results image

PRR-2022

Contributions

  • Based on NPJ-Quantum-2019, which is limited to the case where the sign of the data components does not matter. This work can encode the sign in addition to the absolute value.
  • Showcase AAE with qSVD and its application in finance.

Control Processor

Papers

  1. An experimental microarchitecture for a superconducting quantum processor, MICRO-2017
  2. XQsim: modeling cross-technology control processors for 10+K qubit quantum computers, ISCA-2022
  3. DigiQ: A Scalable Digital Controller for Quantum Computers Using SFQ Logic
  4. QubiC: An Open-Source FPGA-Based Control and Measurement System for Superconducting Quantum Information Processors, TQE-2021
  5. QIsim: Architecting 10+K Qubit QC Interfaces Toward Quantum Supremacy, ISCA-2023

Background

Refs

  1. Frontiers of Quantum Software: Quantum Control Architecture 1
  2. Frontiers of Quantum Software: Quantum Control Architecture 2

(Superconducting)

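  • Controlled through analog waveforms: well-defined electronic signals with precise envelope, frequency, phase, and timing.

    • For example, an $X$ gate can be implemented with a 20 ns Gaussian pulse of a specific phase, modulated to the frequency of the target qubit.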
    • Single-qubit gates
      • 20 ns Microwave signals
    • Two-qubit gates
      • 40 ns Flux pulse
    • Measurement
      • 0.2~0.3 us Microwave transmission through the feed line
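  • Measurement discrimination in software: the measurement yields an analog waveform, which is sampled, quantized, and then classified as follows:

    • $$ S_q=\int V_a(t) W_q(t)\, dt, \text{ and } M_q= \begin{cases}1 & \text{if } S_q>T_q \\ 0 & \text{otherwise}\end{cases} $$
  • Implementation: via an AWG (arbitrary waveform generator).

    • Problems:
      1. The number of combinations of quantum operations (gates) is very large, and every change requires manually re-uploading the waveform. As the qubit count grows and algorithms become more complex, this approach does not scale.
      2. Operations that are decided dynamically at run time (classical feedback) cannot be supported.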
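The measurement discrimination rule above can be sketched in discrete form. This is a minimal sketch, assuming the integral is approximated by a sum over samples with spacing `dt`; the function name `discriminate`, the argument names, and all sample values are illustrative and not taken from any of the cited systems.

```python
def discriminate(v_a, w_q, t_q, dt=1.0):
    """Return (S_q, M_q) for a sampled trace v_a and weight function w_q."""
    if len(v_a) != len(w_q):
        raise ValueError("trace and weight function must have the same length")
    # Discrete approximation of S_q = integral of V_a(t) * W_q(t) dt
    s_q = sum(v * w for v, w in zip(v_a, w_q)) * dt
    # Threshold decision: M_q = 1 if S_q > T_q, otherwise 0
    m_q = 1 if s_q > t_q else 0
    return s_q, m_q


# Hypothetical sampled values, for illustration only
trace_one = [0.5, 1.0, 1.0, 0.5]
trace_zero = [0.0, 0.25, 0.0, -0.25]
weights = [1.0, 1.0, 1.0, 1.0]
print(discriminate(trace_one, weights, t_q=1.5))   # S_q = 3.0 -> M_q = 1
print(discriminate(trace_zero, weights, t_q=1.5))  # S_q = 0.0 -> M_q = 0
```

In practice the weight function and threshold would come from calibration of each qubit's readout; here they are placeholders.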

image

image

Dilution refrigerator

image

image

image

Control Electronics

  • FPGA-based electronic system for the control and readout of superconducting quantum processors, 2022, Alibaba, Review of Scientific Instruments.

Workflow

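Generating the pulse control signals and delivering them to the qubits can be divided into the following key steps:

  1. Generate the control pulse sequence: the system first uses an arbitrary waveform generator (AWG) to generate the control pulse sequence. These pulse sequences are the basic control signals of quantum computing and are used to execute quantum operations.

  2. Up-convert the pulses: the generated control pulse sequence is then up-converted by a mixer to the operating frequency of the qubit. This step is necessary because qubits are typically operated in the microwave band, whereas the pulse sequences generated by the AWG are at a lower frequency.

  3. Transmit over microwave coaxial cables: the up-converted pulses travel over microwave coaxial cables into the cryogenic environment (the refrigerator) that houses the qubits. These pulses drive the quantum operations on the qubits.

  4. Qubit operation and measurement: the qubits carry out quantum operations by interacting with these microwave pulses. To measure a qubit's state, the system samples and decodes the measurement pulse that has interacted with the superconducting resonator coupled to the qubit. The measurement pulse is also generated by an AWG and up-converted by a mixer.

Throughout the process, the FPGA's real-time digital signal processing provides precise timing control, arbitrary waveform generation, and qubit state discrimination. In addition, the low-latency design ensures that feedback/feedforward control completes within the qubit coherence time, which is essential for maintaining qubit stability and for effective quantum error correction (QEC).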

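Trigger mechanism

In the FPGA-based electronic system, synchronization via trigger signals involves the following aspects:

  1. Multi-level trigger architecture: the system uses a multi-level trigger architecture to achieve precise synchronization in time. The task control module (TCM) decomposes a task into a series of real-time task sequences and sends a trigger sequence to each module. These trigger sequences direct the arbitrary waveform generator (AWG) and data acquisition (DAQ) modules to produce second-level trigger sequences, which carry out control and measurement according to the system trigger and configuration information.

  2. Trigger signals and data transfer: the second-level trigger signals direct the digital-to-analog converters (DACs) to generate gate-sequence waveforms and the analog-to-digital converters (ADCs) to sample the probe signals. The probe signals carry information about the qubit state, which enables control and measurement of the qubits. Precise synchronization of the trigger signals is essential for correct and timely data transfer.

  3. Low-latency design: to implement quantum error correction (QEC), the system design emphasizes low latency. Here the trigger signals are used to detect and correct physical errors in real time. Lower latency means fewer errors caused by decoherence during the waiting time, which improves the efficiency of QEC.

  4. FPGA firmware and storage optimization: in the FPGA firmware, the waveform generation module is the main contributor to latency. When a trigger arrives, waveform data pre-stored in memory (such as block RAM, BRAM) is loaded into the DAC. By optimizing storage usage and the trigger system design, the system can generate the required long waveform sequences more efficiently. For example, a long waveform that represents a gate sequence can be decomposed into pulses that start at given delay times, so only a finite set of gate waveforms and the times at which they start need to be stored.

Through these synchronization mechanisms, the system achieves precise timing control and efficient data transfer, which are essential for accurate operations in quantum computing.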
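The storage optimization in item 4 can be illustrated with a small sketch: instead of storing one long waveform, keep a small pulse library plus a schedule of start times and rebuild the waveform on demand. This is only an illustration of the idea, not the firmware design of the cited system; `render_sequence`, the pulse shapes, and the schedule are hypothetical.

```python
def render_sequence(pulse_library, schedule, total_len):
    """Rebuild a long waveform from (start_sample, gate_name) entries."""
    waveform = [0.0] * total_len
    for start, gate in schedule:
        pulse = pulse_library[gate]
        if start < 0 or start + len(pulse) > total_len:
            raise ValueError(f"pulse {gate!r} at {start} does not fit")
        for k, sample in enumerate(pulse):
            # Samples are summed, so overlapping entries superpose
            waveform[start + k] += sample
    return waveform


# Hypothetical pulse shapes; a real system would hold calibrated envelopes
pulse_library = {"X": [0.0, 1.0, 0.0], "Y": [0.0, -1.0, 0.0]}
schedule = [(0, "X"), (5, "Y"), (9, "X")]
waveform = render_sequence(pulse_library, schedule, total_len=12)

# Compare what has to be stored against the length of the full waveform
stored = sum(len(p) for p in pulse_library.values()) + len(schedule)
print(waveform)
print(f"stored entries: {stored}, full waveform samples: {len(waveform)}")
```

The saving grows with sequence length, since the library size stays fixed while the schedule adds only one entry per gate.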

Hardware of IOP and BAQIS

Basis

  • To control the qubit state, microwave pulses with appropriate frequency, amplitude and phase are needed.
  • Detecting the qubit state is realized by observing the shift in the resonant frequency of a readout resonator interacting with the qubit.
  • The electronic hardware for one readout channel includes a microwave source, two AWGs and a data acquisition board.
  • Usually, one readout transmission line can be used for detecting about ten qubits.
  • One readout channel includes: one microwave source, two AWGs, one data acquisition board.

Design Considerations

  • Scalability: Easy to insert new modules within a chassis; able to connect different chassis to control more qubits.
  • Synchronization:
    • High synchronization between different control channels is required to ensure accurate manipulation of qubit quantum state so as to reduce error accumulation.
    • Intra-Chassis: Different module boards are controlled by the same clock board.
    • Inter-Chassis: Accept external reference clock and trigger signals to ensure sync between chassis.
  • Latency:
    • Data transmission time between master and slave computer
      • DMA between master and slave.
      • Perform data processing on local FPGA.
    • Board control time ⇒ Async processing (throughput not latency)
    • Feedback control ⇒ integrate AWG and ADC on one board, controlled by one FPGA.

image

Hardware Detail

image

  • Readout Board
    • Readout the quantum state of qubits
    • Consists of two analog-to-digital converters (ADCs) for data acquisition and two arbitrary waveform generators (AWGs) for modulation of readout microwave signals
    • The FPGA on the board can simultaneously control the ADCs and AWGs to achieve fast quantum feedback control
  • Control Board
    • Functions as an AWG board
    • Each board has 6 digital-to-analog converter (DAC) chips that can generate 6 independent AWG channels
    • The AWG signals are used to control qubits by modulating amplitude, frequency and phase of microwave signals via IQ mixers
  • Bias Board
    • Provides 16 channels of DC voltage output with 20-bit resolution and ±10V range
    • Used to supply DC bias to change qubit working frequency to find suitable operation and readout points
    • Can also control microwave switches to reduce unwanted microwave leakage to qubits

Horse Ridge

CMOS-based cryogenic control of silicon quantum circuits. Delft QuTech.

  • For superconducting quantum computing, a room-temperature controller is not scalable.
  • However, superconducting qubits operate at about 20 mK, where the power dissipation of the control electronics easily surpasses the typical cooling power of 10 μW available at 20 mK.
  • In contrast, silicon spin qubits can be operated and measured above 1 K, so they are well positioned for overcoming the wiring bottleneck by on-die or on-package co-integration with classical electronics.

QuMA

MICRO-2017

  • What is event driven simulation
  • What output can current arch simulator provide?
    • Gem5: execution time
  • What output do we expect from a quantum system simulator
    • Execution time? Fidelity?
  • Should we target NISQ + VQA?

What is the problem

  • Quantum compilers typically generate instructions that are not directly executable on a quantum processor.
  • A controller is needed to transform these instructions into executable operations on the QPU.

QuMA

Implemented using FPGA

image

Codeword-Based Event Control

Codeword-Triggered Pulse Generation.

  • Pulses for a fixed and small set of quantum operations can be well defined and used after calibration

image

Measurement Discrimination.

  • Hardware-based measurement discrimination units in the analog-digital interface.

Queue-Based Event Timing Control

image

The initial state is shown in the table below. There are four queues in total, linked through event labels.

  1. Timing queue: records the time (Time, Time Label)
  2. Pulse queue: records the pulse type (Pulse, # Qubit)
  3. MPG: measurement pulse generation (Time Label)
  4. MD: measurement discrimination (Write Back Register, Time Label)

image

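After 4000 cycles, event 1 is triggered and broadcast to every queue. The event labels of the timing queue and the pulse queue match, so the event can execute. After execution, the state is as follows.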

image

The same procedure repeats for the following events.
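The label-matching behaviour described above can be sketched in software. This is a simplified model, assuming each timing-queue entry holds the interval (in cycles) since the previous time point and that every event queue fires the entries at its head whose label matches the broadcast label. The queue names, payload strings, and intervals other than the 4000-cycle wait are hypothetical.

```python
from collections import deque


def run_event_queues(timing_queue, event_queues):
    """Replay labelled events; returns a list of (cycle, queue_name, payload)."""
    timing = deque(timing_queue)
    queues = {name: deque(entries) for name, entries in event_queues.items()}
    fired = []
    cycle = 0
    while timing:
        interval, label = timing.popleft()
        cycle += interval  # wait `interval` cycles after the previous time point
        # Broadcast the label: each queue fires the head entries that carry it
        for name, queue in queues.items():
            while queue and queue[0][1] == label:
                payload, _ = queue.popleft()
                fired.append((cycle, name, payload))
    return fired


# Hypothetical contents: (interval in cycles, label) and (payload, label)
timing_queue = [(4000, 1), (4, 2), (4, 3)]
event_queues = {
    "pulse": [("X q0", 1), ("Y q0", 2)],
    "mpg": [("measure q0", 3)],
    "md": [("r7 <- q0", 3)],
}
events = run_event_queues(timing_queue, event_queues)
for event in events:
    print(event)
```

The point of the design is that instruction issue only has to stay ahead of the queues; the exact firing time of each event is fixed by the timing queue alone.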

image

Multilevel Instruction Decoding

quantum gates $\to$ quantum microinstructions (QuMIS) $\to$ micro-operations

image

QuMIS

image

image

Q: Why QuMIS to Seq?

For each predefined micro-operation $uOp_i$, the micro-operation unit stores a sequence $\text{Seq}_i$ of codewords and timing intervals. $$ \text{Seq}_i: ([0, cw_0]; [\Delta{t_1}, cw_1]; [\Delta{t_2}, cw_2]; [\Delta{t_3}, cw_3]; \dots) $$ Example: $Z=X\cdot Y$ ==> $\text{Seq}_Z: ([0, 1]; [4, 4])$.
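A small sketch of how such a sequence could be expanded into timed codewords. It assumes each interval is measured from the previous entry, and that in the $\text{Seq}_Z$ example codeword 1 and codeword 4 are the two constituent pulses; both points are assumptions made for illustration, and `expand_seq` is a hypothetical helper.

```python
def expand_seq(seq, start=0):
    """Turn [(delta_t, codeword), ...] into absolute (time, codeword) pairs."""
    timeline = []
    t = start
    for delta_t, codeword in seq:
        t += delta_t  # interval is taken relative to the previous entry
        timeline.append((t, codeword))
    return timeline


# Seq_Z from the example above; the time unit is left abstract
seq_z = [(0, 1), (4, 4)]
print(expand_seq(seq_z))            # [(0, 1), (4, 4)]
print(expand_seq(seq_z, start=10))  # [(10, 1), (14, 4)]
```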

eQASM

eQASM: An Executable Quantum Instruction Set Architecture, HPCA-2019

Challenge

  • Limited executability of existing low-level quantum assembly languages
  • Comprehensive Abstraction Challenge
  1. Using digital waveforms only as the interface to quantum hardware
  2. QuMIS: Instruction-based waveform generation (Short pulses instead of long waveforms)
  3. QuMIS limitations: timing of executing instructions is decoupled from that of pulse generation by a set of FIFOs. As a consequence, a classical instruction that uses a qubit measurement result may start execution before the expected result is ready, or read another result when there are multiple instructions measuring the same qubit, which can lead to a wrong execution result. Hence, it is a challenge to design a mechanism that can correctly implement runtime feedback to support comprehensive quantum program flow control.
  • Scalability & Flexibility Challenges
  1. When $R_{\text{req}} > R_{\text{allowed}}$, the microarchitecture cannot execute the quantum program correctly.
  2. QuMIS has the following limitations, so designing a QISA with high instruction information density is a scalability challenge:
    • An explicit waiting instruction is required to separate any two consecutive timing points.
    • Each target qubit of a quantum operation occupies a field in the instruction, making the instruction width a limitation for the number of target qubits in a single instruction;
    • Two parallel and different operations cannot be combined into a single instruction.
  3. Although they provide flexibility in directing program flow control at runtime, some QISA designs lack quantum semantics. For example, instructions of QuMIS and the Raytheon BBN APS2 instruction set are low level and tightly bound to the electronic hardware implementation to ensure executability. ==> Executable QISA?

QuAPE

Exploiting Different Levels of Parallelism in the Quantum Control Microarchitecture for Superconducting Qubits, Tencent, MICRO-2021

https://quantum.tencent.com/news/2021/0902_1109

Motivation

  1. Sub-circuits that have no dependencies on each other can be executed in parallel.

Methodology

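To measure how well the control processor exploits quantum-operation-level parallelism, we define the CES (Cycles Each Step) metric: the number of cycles the control processor spends executing one step of the circuit (a circuit step is the part of a quantum circuit that contains all parallel quantum operations at a given point in time). We view CES as made up of four parts: the cycles spent executing quantum instructions (CEQI: Cycles Each Quantum Instruction, the cycles spent on each quantum instruction; QICES: Quantum Instruction Count Each Step, the number of quantum instructions in each step), the cycles spent executing classical instructions, stalls caused by classical control flow, and the time feedback control spends on the control processor.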
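As a rough worked example, CES can be written as a sum of the four parts listed above. The additive form, and the choice to model the quantum part as CEQI × QICES, are assumptions made here for illustration; the function and parameter names are hypothetical.

```python
def cycles_each_step(ceqi, qices, classical_cycles=0, stall_cycles=0,
                     feedback_cycles=0):
    """CES as the sum of its four components (assumed to be additive)."""
    quantum_cycles = ceqi * qices  # cycles spent on quantum instructions
    return quantum_cycles + classical_cycles + stall_cycles + feedback_cycles


# A step with 8 quantum instructions, one cycle each, and nothing else
print(cycles_each_step(ceqi=1, qices=8))  # 8
# Fewer instructions per step, but one classical instruction and a 3-cycle stall
print(cycles_each_step(ceqi=1, qices=2, classical_cycles=1, stall_cycles=3))  # 6
```

Under this reading, lowering CES means reducing either the number of quantum instructions per step or the cycles each one takes, in addition to hiding classical and feedback overhead.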

image

The red dashed box marks the module that controls multiple processing units.

DigiQ

Rapid single flux quantum, wikipedia

DigiQ: A Scalable Digital Controller for Quantum Computers Using SFQ Logic, HPCA-2022

What is the problem

image

(a) Sending separate analog microwave control pulses for each qubit through coaxial cables

  • Massive costs of generating/routing the analog microwave signals
  • Heat dissipation at millikelvin temperatures due to using a large number of high bandwidth coaxial cables

It is essential to build compact controllers as close as possible to quantum chips in order to generate and route the control signals locally and address the scalability problem

Technologies

  1. Cryo-CMOS
  2. Superconducting Single Flux Quantum (SFQ): a promising solution to maximize the scalability of in-fridge controllers due to its unique characteristics such as ultra-high speed and very low power

What they did

  1. Design space exploration of SFQ-based controllers
  2. Co-design the quantum gate decompositions and SFQ-pulse implementation of those decompositions
    • Works within the tight power and area budget of dilution refrigerators at large scales
    • Provides good algorithmic performance.

Insights

  1. Lack of dense memory/logic in SFQ ==> cannot afford to simultaneously send tailored quantum gates to many qubits ==> requires sharing SFQ-based quantum instructions among multiple qubits (SIMD)

Details

Overview

image

SFQ-based universal quantum computation

  1. Design Options

image

  • SFQ_MIMD_naive: Allocate separate SFQ registers for SFQ bitstreams per qubit: expensive in terms of power and area
  • SFQ_MIMD_decomp: Store a universal single-qubit gate set per qubit on the SFQ chip
  • SFQ_SIMD_decomp: Share SFQ bitstreams among a group of qubits
  2. Single-qubit gate decomposition
  • Lack of dense memory in SFQ
  • Need to keep the SFQ logic simple

==> Storing/processing a limited number of SFQ bitstreams

  3. Two-qubit gate design
  • Implemented CZ gate.

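This part describes a current-control model for superconducting qubits: SFQ/DC blocks control the current and thereby act on the state of specific qubits.

  1. Start and stop signals: once an SFQ/DC block receives a start signal, it outputs current until it receives a stop signal. The start and end of a CZ gate therefore correspond to turning the SFQ/DC blocks on and off, respectively.

  2. Targeting a specific qubit pair: to address a specific qubit pair in a multi-qubit system, the current waveform must be applied to both qubits at the same time, so each qubit needs its own current generator.

  3. In-fridge current generation: the only difference from earlier flux-tunable CZ gate implementations is that the current is generated inside the fridge (the cryogenic environment). This is possibly meant to control and maintain the cryogenic environment more effectively, which is critical for superconducting quantum computing.

  4. DigiQ Min and DigiQ Opt: both are named variants of the system, and both use the same two-qubit gate design.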


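Control data flow: control bits sent from the room-temperature environment operate on the cooled superconducting qubits.

  1. Control data: BS sel, 1q sel, and 2q sel are control bits sent from room temperature in every controller cycle over the Ctrl. data cables. They select and control specific qubits and two-qubit gates.

  2. Validity: the Valid cable indicates whether the control data on the data cables is valid. It is an error-detection mechanism that ensures the transmitted data is correct.

  3. Loading SFQ bitstreams: the Load cable is used to load SFQ bitstreams over the data cables. This happens outside program execution (offline). Each SFQ bitstream is at most 300 bits long; the actual length depends on the target gate and the system Hamiltonian.

  4. Starting the controller clock: after the control bits of the first controller cycle have been transferred, a Go signal is sent from room temperature to start the controller clock. The clock is implemented as a counter that increments every SFQ chip cycle and resets every controller cycle.

  5. Buffering and forwarding control bits: at the start of each controller cycle, the control bits already held in one buffer (Buffer#1 in Fig. 5) are transferred to a second buffer (Buffer#2 in Fig. 5) for use by the qubit controller and the SFQ bitstream generator. While the current controller cycle executes, the control bits for the next controller cycle are buffered in the first buffer.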
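The two-buffer handoff in item 5 can be sketched as follows. This is a software analogy only, not the SFQ hardware design; the class name `DoubleBufferedControl` and the control-bit values are hypothetical, while the field names mirror the BS sel, 1q sel, and 2q sel signals mentioned above.

```python
class DoubleBufferedControl:
    """Two-stage buffer: one stage fills while the other is in use."""

    def __init__(self):
        self.buffer1 = None  # control bits arriving for the next cycle
        self.buffer2 = None  # control bits used during the current cycle

    def receive(self, control_bits):
        """Room temperature sends control bits for the upcoming cycle."""
        self.buffer1 = control_bits

    def start_cycle(self):
        """At the start of a controller cycle, move Buffer#1 into Buffer#2."""
        self.buffer2, self.buffer1 = self.buffer1, None
        return self.buffer2


ctrl = DoubleBufferedControl()
ctrl.receive({"bs_sel": 0, "1q_sel": 3, "2q_sel": 0})  # sent before Go
executed = []
upcoming = [{"bs_sel": 1, "1q_sel": 0, "2q_sel": 2}, None]
for next_bits in upcoming:
    executed.append(ctrl.start_cycle())  # Buffer#1 -> Buffer#2
    if next_bits is not None:
        ctrl.receive(next_bits)          # buffered while the cycle runs
print(executed)
```

The handoff lets transmission of the next cycle's control bits overlap with execution of the current cycle.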


Calibration

Calibrating single-qubit gates

  1. Find SFQ bitstreams implementing a desired set of basis gates with high fidelity on qubits with no frequency variation.
  2. Characterize each qubit’s actual oscillation frequency using experimental measurements.
  3. Use the learned bitstreams and measured frequencies to determine the actual basis operations implemented on each qubit by the shared bitstreams.
  4. Compile quantum circuits using the actual single-qubit basis operations determined for each qubit.

XQsim

XQsim: modeling cross-technology control processors for 10+K qubit quantum computers, ISCA-2022

Video

Background

image

image

Surface-code patch: two types of physical qubits (1) data qubits containing state information (2) ancilla qubits used for extracting the error information. Patch’s code distance in the example is 3, determined by the target error rate.

The problem

  • Two critical challenges for designing a large-scale control system
    • Interface between classical hardware and qubits.
      • Today’s superconducting quantum computers use per-qubit coaxial cables to send microwave pulses from the room-temperature QC interface to qubits located in a dilution refrigerator. (This cannot scale due to space limitations and huge thermal loads.)
    • Future fault-tolerant QC has other scaling challenges orthogonal to QC interface
      • Increasing demand for QEC (Quantum Error Correction)
      • Limited cooling capacity of dilution refrigerators
  • Absence of a tool to develop quantum control architecture

Overview

image

Full microarchitecture implementation

QISA

image

Microarchitecture

image

  • Quantum instruction decoder (QID): Instruction decomposition

image

  • Patch decode unit (PDU): Logical qubits $\to$ patch list (encode?)
  • Patch information unit (PIU): Dynamically updates surface-code patches’ information and forwards the information to other hardware units
  • Physical schedule unit (PSU): Schedules corresponding physical-qubit level instructions (i.e., codeword) to the target physical qubits
  • Time control unit (TCU): Takes the codeword array from PSU and sends it to QC interface at the accurate timing.
  • Error decode unit (EDU): Identifies the types and locations of errors that occurred during the ESM.
  • Pauli frame unit (PFU): Tracks the errors with the Pauli frame (PF) registers and provides them to LMU for the virtual error correction
  • Logical measure unit (LMU): derives the logical-qubit level measurements based on the physical qubit measurements

XQSIM

  • XQ-estimator: modeling control processors built with different logic families (RSFQ/CMOS)
  • XQ-simulator
    • Cycle-accurate architecture simulation
    • Output (1) 300K-4K data transfer rate (2) 4K power consumption (3) Instruction bandwidth (4) Error decoding latency

image

Insights

image

  1. Move only PSU and TCU to 4K. (1) PSU and TCU dominate inter-unit data transfer (98.1%) (2) Other hardware units (EDU, PFU) consume much more power

image

  2. Move EDU to 4K (ERSFQ) (With EDU power optimization)

image

Q3DE

Q3DE: A fault-tolerant quantum computer architecture for multi-bit burst errors by cosmic rays - MICRO 2022

IBM - ISSCC - 2022

Challenge

flexibility and cost effectiveness as the qubit count increases.

Solution

high-fidelity quantum operations with direct RF control and intentional aliasing of signals into higher Nyquist zones

Challenge of direct RF

image signal management. ==> A limiting factor for further scaling of direct RF hardware is the size and cost of these filtering circuits to meet signaling requirements.

IBM Control System

  • A central hub oversees execution of the experiment and coordinates real time results to individual qubit controller nodes for dynamic decision making.
  • Centralized control is an attractive model to enable dynamic circuit execution; however, this approach has scaling challenges due to instruction issue rate and data bandwidth to each qubit controller
  • uses a hybrid model with a central hub to manage overall execution and distributed qubit controller nodes.
  • require that all controllers stay precisely synchronized to guarantee consistent execution paths at each node.

What's missing?

=> operation sequence <--> qubit state simulator

  1. Flexible, easy-to-use control architecture simulator for NISQ. (Integration with gem5?)

  2. Quantum control architecture (QCA) for multi-level quantum chip.

  3. QCA design automation?

  4. VQA architecture, move classical optimization part to control processor?

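    • How is the classical part of a VQA computed? Is there a problem with storing large vectors?
    • When running on a real quantum computer, how are the measurement results of each iteration obtained?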
  5. QCA for quantum chip-let (intra quantum communication) / multi processor (intra classical communication)

    • Chiplet / modular quantum chips: how is cross-chip data transfer controlled?
    • Each chip has its own controller, what about having a master controller?

Roadmap

  1. How to analyze current QCA's performance on VQA

EDA

This section covers EDA in two senses:

  1. physical quantum chip design automation
  2. logical quantum circuit design automation

Notes

  1. Design automation for quantum architectures: reversible quantum circuits can easily be tested. Formal verification techniques can be applied to the problem of ascertaining the correctness of either a given network or a compiler implementation.

Questions

  1. Language: from high level description to circuit. What about from circuit to pulse sequences.
  2. Considering all the physical constraints, what would be the fidelity of computation results? Could we simulate it??

Layout Synthesis

Conventional Synthesis: Placement and routing to produce the physical layout of a circuit.

Quantum Synthesis: From QASM to "circuit", program to program, more like compilation.

Question: What will quantum circuit synthesis (the physical entity) be like?

References

  1. Design automation for quantum architectures, DATE-2017

TODO

  • revs
  • Conventional EDA toolchain
  • Current status of quantum EDA toolchain
  • Future Quantum EDA, for whom to use? How to prototype the impact of such a toolchain
  • Can a simulation environment really simulate real quantum computers, i.e., the errors, coherence time budgets, etc.?
  • Is it possible to use an LLM to design quantum circuits, e.g., an encoding circuit for ML?
    • How can we search for an optimal encoding circuit block structure with minimal depth?

MNQC (Multi-Node QC)

Ideas

  • Control architecture
  • Compiler

Scaling Superconducting Quantum Computers with Chiplet Architectures

link

  • Data source: SOTA transmon and Flip-chip Technology

Glossary

yield: The percentage of microchips that function correctly in a wafer.

What is the bottleneck?

  • Cross-Resonance (CR) Error:
    • Qubits coupled through a resonator are driven by a microwave signal to perform multi-qubit operations.
    • Affected by system noise, irregular hardware inconsistency
    • Frequency collision is the dominant source.

image

  • Ideal Frequency Assignment
    • Current design: Heavy-hex lattice where $f_0 < f_1 < f_2$
    • Each qubit’s frequency must be distinguishable from its neighbor
      • Frequency difference too small: impacts selective addressability
      • Too large: impacts entanglement quality.
    • Next nearest neighbor also matters: if $f_j \approx f_k$, this also causes a collision.

image

  • Fabrication Variation: Deviates a qubit's frequency from its ideal value
    • Stochastic: Each chip has unique frequency profile
    • Standard deviation $\sigma_{f}$ (regarding the target frequency) can be high
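The interaction between frequency constraints and fabrication variation can be explored with a small Monte Carlo sketch. All thresholds below (`min_det`, `max_det`, `nnn_det`), the 5-qubit chain, and the frequency values are placeholders chosen for illustration; they are not the collision conditions or parameters used in the paper, which distinguishes more collision types.

```python
import random


def has_collision(freqs, edges, min_det=0.05, max_det=0.25, nnn_det=0.02):
    """Check a frequency assignment (GHz) against illustrative thresholds."""
    neighbors = {q: set() for q in range(len(freqs))}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
        detuning = abs(freqs[a] - freqs[b])
        # Too small hurts selective addressing; too large hurts entanglement
        if detuning < min_det or detuning > max_det:
            return True
    # Next nearest neighbors: two qubits sharing a neighbor must not coincide
    for nbrs in neighbors.values():
        ordered = sorted(nbrs)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                if abs(freqs[ordered[i]] - freqs[ordered[j]]) < nnn_det:
                    return True
    return False


def estimate_yield(ideal, edges, sigma_f, trials=2000, seed=0):
    """Fraction of sampled chips with no collision under Gaussian variation."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        sampled = [rng.gauss(f, sigma_f) for f in ideal]
        if not has_collision(sampled, edges):
            good += 1
    return good / trials


# Hypothetical 5-qubit chain with a repeating three-frequency pattern
ideal = [5.00, 5.10, 5.20, 5.00, 5.10]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
for sigma in (0.0, 0.014, 0.05):
    print(sigma, estimate_yield(ideal, edges, sigma))
```

With zero variation the ideal assignment always passes; the estimate drops as $\sigma_f$ grows, which is the qualitative effect the notes describe.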

What is current effort

  • Post-fabrication laser tuning (ref) enables $\sigma_{f} \approx 0.014\,\text{GHz}$ and improves the yield of QCs with $<100$ qubits by $15\times$.
  • Further improvements are required to maintain yield while increasing the number of qubits manufactured on a monolithic QC.

The underlying technology

image

What is the problem and Why is the problem important

  • Increased Infidelity: Larger devices demonstrate higher two-qubit gate error rates and larger error rate distributions

image

  • Decreased Yield: Greater variation (fabrication variation, qubit heterogeneity) exists as QCs increase in size. A larger device contains more qubit-qubit frequency detunings, which are more likely to fall within or close to frequency collision regions.

The chiplet architecture

image

  • Methodology is too easy to explain....

Evaluation

Yield

image

Fidelity

  • The average infidelity averaged across every qubit pair, $E_{avg}$

image

Questions

  1. What is collision-free yield? Is it the probability that a circuit is executed without frequency collision? No.
  2. Inter-chip gate has lower fidelity, why is the overall yield increased?
  3. The evaluation is limited: what is the influence of compilation? Will it introduce more overhead? In other words, what is the influence of execution speed?
  4. Based on 3, can we evaluate it based on the noise model of chiplet architecture.

Architectures for Multinode Superconducting Quantum Computers

http://arxiv.org/abs/2212.06167

Intro

Motivations

  • Complexity of building large devices
  • Limitations set by the capacity of an individual cryogenic dilution refrigerator.

The physical implementation considered

  1. Cryogenic links (small MNQCs)
  2. Room-temperature microwave-to-optical (M2O) quantum internode links (Large MNQCs)
  3. Three types of operations considered (Local computation, internode, circuit-cutting)

Limitations of M2O MNQCs

Noisier and slower due to: Weakness of the nonlinear conversion process, fiber-to-chip coupling, thermal added noise and other hardware difficulties.

What's required

Guide the development of M2O MNQCs. Need to

  1. Quantify the available performances
  2. Determine how the performances affect algorithm execution performance
  3. Determine how hardware and software should jointly navigate design space tradeoffs.

Research Targets

  1. Compiler improvements
    • Internode operations are likely to remain more expensive and error-prone than local quantum gates, and therefore minimizing the communication overhead incurred during compilation will remain a primary concern.

AutoComm

AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs, MICRO-2022

Summary

  • First compiler work to reduce communication overhead
  • The more operations or quantum information carried per EPR pair, the lower the communication cost.
  • Leverage burst communication: continuous operations between one qubit and one node??? why can operate simultaneously??

Method

  1. Aggregates remote two-qubit gates by gate commutation
  2. Assigns an optimal communication scheme for each burst communication
  3. Perform a block level schedule.

Evaluation

Metric

  1. The number of issued remote communications
  2. The maximum number of remote two-qubit gates executed through one communication
  3. Relative performance (1) Communication reduction (2) Latency reduction.

Chiplet Physical Realization

  • Entanglement across separate silicon dies in a modular superconducting qubit device. NPJ. Quantum. 2021

image

Inter-chip entanglement

  • Inter-chip coupler (capacitive coupling)

MECH

  • MECH: Multi-Entry Communication Highway for Superconducting Quantum Chiplets. ASPLOS. 2024

image

Highway?

Core Idea

  • $C_{2_134}$, $C_{6_578}$ operate concurrently on two segments of GHZ states. Same for $C_{4_1378}$.
  • Other gates execute individually by routing qubits together using SWAPs.

Memory

Papers

  • Systems Architecture for Quantum Random Access Memory - Yongshan Ding 2023
  • Virtualized Logical Qubits: A 2.5D Architecture for Error-Corrected Quantum Computing, Fred Chong
  • On the robustness of bucket brigade quantum RAM, NJP, 2015

Basic

Refs

Papers

Posts

Codes

Key Concepts $$ \begin{aligned} \text{Input: } i &\xrightarrow{\text{Classical RAM}} \text{Output: } x_i \\ \sum_{i=0}^{N-1}\alpha_i |i\rangle_A |0\rangle_B &\xrightarrow{\text{Quantum RAM}} \sum_{i=0}^{N-1} \alpha_i|i\rangle_A |x_i\rangle_B \end{aligned} $$

Where $|\cdot\rangle_A$ ($|\cdot\rangle_B$) is the address (bus) qubit register storing the input (output).
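The two mappings can be compared on a toy example. This is a classical bookkeeping sketch, assuming the joint state is represented as a dictionary from basis labels to amplitudes; it does not simulate any hardware, and `qram_query`, the memory contents, and the amplitudes are illustrative.

```python
def qram_query(address_amplitudes, memory):
    """Map sum_i a_i |i>_A |0>_B to sum_i a_i |i>_A |x_i>_B."""
    out = {}
    for i, amp in address_amplitudes.items():
        # In branch i the bus register now holds x_i; the amplitude is unchanged
        out[(i, memory[i])] = amp
    return out


memory = [3, 1, 4, 1]                     # classical data x_0..x_3
print(memory[2])                          # classical RAM: one address, one value
state = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}  # uniform superposition of addresses
result = qram_query(state, memory)
print(result)
```

Each address branch keeps its amplitude and picks up its own data value, so a single query correlates the whole superposition with the memory contents.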

Conventional RAM

image In the above figure

  • $N$ memory cells are placed at the end of a bifurcation graph.
  • For the $j$ th bit of the address register, if it is $0$, the left path is followed, if it is $1$, the right path is followed.
  • Each of the $N$ possible values of the address register thus indicates a unique route that crosses the whole graph and reaches one of the memory cells.
  • Each address bit controls all the transistors in one of the graph levels: it activates all the transistors in the left paths if it has value 0, or all the transistors in the right paths if it has value 1

The $d$-dimensional RAM consists of $d$ such graphs, each addressing one side of a $N^{1/d}\times N^{1/d} \times \cdots \times N^{1/d}$ array.

Quantum RAM

To query a superposition of memory cells, the address qubits are in general entangled with $O(N)$ switches or quantum gates (or, equivalently, they must control two-body interactions over exponentially large regions of space), i.e., the system is in a state of the form $\sum_{j}\psi_j |j_0j_1\cdots j_{n-1}\rangle_a \otimes |j_0\rangle_{s_0}|j_1\rangle_{s_1}^{\otimes 2} \cdots |j_{n-1}\rangle_{s_{n-1}}^{\otimes 2^{n-1}}$


The challenge:

  • Replace all address bits with qubits.
  • Replace all switches with qubits.
  • Then the $k$-th address qubit needs to control $2^k$ switch qubits through CNOTs, resulting in a gigantic entangled state.
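As a quick count for the fanout picture above (a worked example, with $n = 3$ chosen purely for illustration): the $k$-th address qubit drives the $2^k$ switches of level $k$, so the number of switch qubits entangled with the address register is

$$ \sum_{k=0}^{n-1} 2^k = 2^n - 1 = N - 1, \qquad \text{e.g. } n = 3:\ 1 + 2 + 4 = 7 = 8 - 1. $$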

Quantum random access memory, PRL-2008

Here are the key ideas and equations from the paper:

  • A random access memory (RAM) uses $n$ bits to address $2^n$ memory cells. A quantum RAM (qRAM) uses $n$ qubits to address a superposition of memory cells.

  • Conventional RAM designs require $O(N)$ switches to be thrown to access one of $N$ memory cells. This requires a lot of energy and leads to high error rates in qRAMs.

  • The proposed "bucket brigade" qRAM architecture only requires $O(\log N)$ switches to be thrown. This reduces the energy usage and error rate exponentially.

  • In the bucket brigade architecture, each node of the bifurcation graph that makes up the memory has a "qutrit" - a 3-level quantum system in the states |wait〉, |left〉, and |right〉.

  • The address register and bus are made up of qubits. When a qubit reaches a node, if the qutrit is in |wait〉 it is swapped into |left〉 or |right〉. Otherwise, the qutrit routes the qubit left or right accordingly.

  • After the address register has passed through, a bus qubit follows the path and interacts with the memory cells. It is then passed back out, and the qutrits are reset to |wait〉.

  • This architecture entangles only $O(\log N)$ qutrits, leading to much lower error rates. The fidelity is $O(1 - \epsilon \log N)$ even if a fraction $\epsilon$ of gates are decohered.

  • A proof-of-principle implementation could use trapped atoms or ions for the qutrits and photons for the qubits. Raman pulses could be used to control the timing.

  • While the bucket brigade could significantly improve qRAMs, for standard RAMs the main sources of energy usage are in the memory cells themselves, not the access procedure. So the benefits may be more limited.
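To get a feel for the $O(\log N)$ versus $O(N)$ difference in the summary above, here is a toy estimate. It assumes that every active switch fails independently with probability `eps`, which is a simplification introduced for this sketch rather than the paper's error model; the function name and the numbers are hypothetical.

```python
def toy_fidelity(active_switches, eps):
    """Probability that no active switch errs, assuming independent errors."""
    return (1 - eps) ** active_switches

n, eps = 10, 1e-3              # n address bits, N = 2**n memory cells
N = 2 ** n
bucket = toy_fidelity(n, eps)        # log N active switches
fanout = toy_fidelity(N - 1, eps)    # N - 1 active switches
print(f"bucket brigade: {bucket:.4f}, fanout: {fanout:.4f}")
```

In this toy model $(1-\epsilon)^{\log N} \ge 1 - \epsilon \log N$, which matches the form of the fidelity quoted in the summary, while the fanout value decays with $N$ itself.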

| Study Question | Answer |
| --- | --- |
| What is the key idea of the bucket brigade qRAM architecture? | It reduces the number of switches/gates that need to be actively involved in a memory access from O(N) to O(log N). This reduces energy usage and error rates exponentially. |
| How does the bucket brigade work? | Each node has a 3-level "qutrit" in the state |
| How could this be implemented? | Using trapped atoms/ions for the qutrits and photons for the qubits. Raman pulses could control the timing. |
| What are the main benefits of the bucket brigade? | Exponentially lower energy usage and error rates. Entanglement of only O(log N) systems so much lower decoherence. |
| What are the limitations? | For standard RAMs, most energy usage is in the memory cells themselves, not access. So benefits may be limited. |

Why do we need the $|wait\rangle$ state?

The main reason for introducing the wait state in the qutrits of the bucket-brigade QRAM is to keep the address qubits from activating memory paths that are not being accessed:

  1. The qutrits in QRAM act as quantum switches to route signals based on the state of the address qubits.

  2. If the qutrits only had |0〉 and |1〉 states, each address qubit could fully determine the state of the qutrit.

  3. This would leave every switch, whether or not it lies on the accessed path, in a definite |0〉 or |1〉 routing state.

  4. Thus, unaccessed memory paths would also get activated, causing extra errors and resource consumption.

  5. To prevent this, the wait state is introduced as the initial state of the qutrits.

  6. An address qubit can change a qutrit to an activated |0〉 or |1〉 state only when that qutrit is in the wait state.

  7. Qutrits already activated will not change state again for subsequent address qubits.

  8. This ensures only qutrits on the accessed path are activated at each time step, avoiding unaccessed paths.

  9. After access, qutrits can be reset to the wait state using lasers, ready for the next round of activation.

  10. This mechanism enables precise addressing of quantum memory, reducing resource costs and improving efficiency.

So in summary, the wait state is an important design in Bucket Brigade QRAM to minimize interference on unaccessed paths, enhancing its noise resilience, which is a key advantage over other QRAM architectures.

Bucket-brigade

A bucket brigade or human chain is a method for transporting items where items are passed from one (relatively stationary) person to the next.

image

Bucket-brigade architecture

image

  • A trit in the level wait will change its value according to the value of any incoming bit: if the incoming bit is 0, it takes the value left, while if the incoming bit is 1, it takes the value right.
  • A trit in the level left or right will deviate any incoming signal along the graph according to its value.
  • After all the $\log(N)$ bits of the address register have passed through the graph, a single route of $n = \log(N)$ left-right trit states has been carved through the graph.
  • In addition, every time the bus signal on its way back encounters a trit, the trit is reset to the wait state.
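The classical trit rules above can be sketched as a small simulation. The heap-style node numbering, the function name `bucket_brigade_read`, and the example contents are illustrative assumptions.

```python
WAIT, LEFT, RIGHT = "wait", "left", "right"

def bucket_brigade_read(address_bits, cells):
    """Classical bucket-brigade lookup on a complete binary tree of trits.

    Trits are stored heap-style: node 1 is the root, node k has children
    2k (left) and 2k + 1 (right). This layout is an illustrative choice.
    Returns (value, activated_nodes, trits_after_reset).
    """
    n = len(address_bits)
    assert len(cells) == 2 ** n
    trits = {k: WAIT for k in range(1, 2 ** n)}

    for bit in address_bits:
        node = 1
        # Follow the route carved so far until a waiting trit absorbs the bit.
        while trits[node] != WAIT:
            node = 2 * node + (trits[node] == RIGHT)
        trits[node] = RIGHT if bit else LEFT

    route = [k for k, t in trits.items() if t != WAIT]

    # The bus signal follows the carved route down to a memory cell.
    node = 1
    while node < 2 ** n:
        node = 2 * node + (trits[node] == RIGHT)
    value = cells[node - 2 ** n]

    # On its way back, the bus resets every trit it meets to wait.
    for k in route:
        trits[k] = WAIT
    return value, route, trits

value, route, trits = bucket_brigade_read([1, 0, 1], list("abcdefgh"))
print(value, route)  # f [1, 3, 6]
```

Only $\log N = 3$ of the 7 trits leave the wait state, one per level, in contrast to the conventional design where every switch is set.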

Mathematically, in the quantum realm:

  • Trits replaced by qutrits.
  • When the qubits of the address register are sent through the graph, at each node they encounter a unitary encoding transformation $U$.
  • If the qutrit is initially in the $|wait\rangle$ state, the unitary swaps the state of the qubit into the two $|left\rangle - |right\rangle$ levels of the qutrit (i.e., $U|0\rangle|wait\rangle = |f\rangle|left\rangle$ and $U|1\rangle|wait\rangle = |f\rangle|right\rangle$, where $|f\rangle$ is a fixed fiducial state of the qubit).
  • If the qutrit is not in the $|wait\rangle$ state, then it simply routes the incoming qubit according to its state.
  • Once all the register qubits are sent through the graph, a bus qubit is injected and it reaches the end of the graph along the requested superposition of paths. It then interacts with the memory cells at such locations changing its state according to their information content.

Quantum Random Access Memory For Dummies

Here is a simplified summary of the key points from the Quantum Random Access Memory for Dummies paper:

  • QRAM allows direct access and manipulation of quantum states, enabling faster data storage/retrieval in quantum systems.

  • Unlike classical RAM (bits), QRAM uses qubits in superposition to represent data.

  • Main components of QRAM: input (address) register, output (data) register, memory array (quantum or classical).

  • QRAM accesses memory in superposition, querying multiple locations simultaneously.

  • Supports quantum speedups (exponential in some cases, quadratic for unstructured search) for algorithms such as search, Fourier transform, and quantum simulation.

  • Bucket brigade QRAM routes qubits through a graph, accessing memory with only O(log N) gate interactions. More resilient to noise.

  • Fanout QRAM has qubits controlling exponential number of switches, like classical RAM.

  • Flip-flop QRAM stores data sequentially using quantum circuit. Linear width but exponential depth.

  • Qudit QRAM uses qudits (d>2 states) for higher density but qudits are less stable.

  • PQC QRAM trains a parametric circuit to load/retrieve data. Constant depth, but approximate.

  • Implementations proposed with photons, atoms, superconducting and trapped ion qubits/qudits.

  • Main challenges are scalability, noise resilience, no-cloning issues, qudit instability.

  • Recent progress made in analyzing hardware needs, parallelizing bucket brigade, etc. But scaling remains difficult.

  • QRAM is a promising technology but requires overcoming engineering challenges as quantum hardware matures.

Multi-level

Papers

Qutrits

Asymptotic improvements to quantum circuits via qutrits, ISCA-2019

Key Insights

  • Using the third state as temporary storage (at the cost of a higher gate error rate)
  • Only applies qutrit operations in an intermediary stage: the input and output are still qubits

Background

Conventional implementation of AND: Toffoli gate

image

image

What do they do?

From conventional implementation $O(N)$ depth

The Toffoli gate can be constructed from single qubit T- and Hadamard-gates, and a minimum of six CNOTs.

image

To $O(\log N)$ depth $\to$ Reduced circuit depth, which shortens runtime and increases reliability.

image
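The "third state as temporary storage" trick can be sketched on computational basis states for the two-control case. This is a classical truth-table check, not a quantum simulation, and the function name is an assumption; the larger log-depth construction is not reproduced here.

```python
from itertools import product

def toffoli_via_qutrit(c0, c1, t):
    """Toffoli on computational basis states, using |2> of c1 as scratch."""
    # |1>-controlled +1 (mod 3) on c1: c1 reaches 2 only if c0 = c1 = 1.
    if c0 == 1:
        c1 = (c1 + 1) % 3
    # |2>-controlled X on the target.
    if c1 == 2:
        t ^= 1
    # |1>-controlled -1 (mod 3) restores c1 to its original qubit value.
    if c0 == 1:
        c1 = (c1 - 1) % 3
    return c0, c1, t

# Check all 8 qubit inputs against the Toffoli truth table.
for c0, c1, t in product((0, 1), repeat=3):
    assert toffoli_via_qutrit(c0, c1, t) == (c0, c1, t ^ (c0 & c1))
```

The inputs and outputs are qubit values throughout; the $|2\rangle$ level is occupied only between the first and last steps.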

Physical Implementation

Scalable algorithm simplification using quantum AND logic

image

  • The iSWAP only exchanges $|11\rangle \leftrightarrow |20\rangle$; all other input states remain unchanged.

image

  • For this specific coupling structure, is the SOTA compiler good enough?

Implementation of a Toffoli gate with superconducting circuits

Parallel

EQC

| name | code |
| --- | --- |
| EQC: Ensembled Quantum Computing for Variational Quantum Algorithms, ISCA-2022 | https://github.com/pnnl/eqc |

Key Idea.

  • Dynamically distributes quantum tasks asynchronously across a set of physical devices

Pros

  • Reduced machine-dependent noise
  • Speedups for VQA training

Key technique: the weighting strategy:

  1. They propose a formula to estimate the probability of a correct result, $P_{Correct}$, for each quantum processor, considering factors like circuit depth, number of gate operations, and T1/T2 decay times.

  2. On each client node, $P_{Correct}$ is computed before running the quantum circuit, then linearly normalized to get a weight w, possibly ranging 0.5-1.5.

  3. The weight $w$ is sent to the master node along with the gradient computed by the client.

  4. The master node updates the parameters using the weighted gradient: $\theta_{t+1} = \theta_{t} - w\alpha g_t(\theta_t)$. Higher $w$ means higher credibility of the quantum processor, and its gradient contribution is larger.

  5. This allows dynamically adjusting the contributions of each quantum processor based on its condition, reducing the influence of noisy processors and improving training accuracy.

  6. Experiments show proper weight ranges (e.g. 0.5-1.5) can make the training converge faster and get close to the best quantum processor's accuracy.
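The weighted update in the steps above can be sketched as follows. The $P_{Correct}$ formula itself is not reproduced in these notes, so the values are treated as given inputs. Normalizing across the ensemble, the function names, and the numbers are assumptions made for illustration.

```python
def normalize_weights(p_correct, lo=0.5, hi=1.5):
    """Linearly rescale per-device P_Correct values into [lo, hi]."""
    p_min, p_max = min(p_correct), max(p_correct)
    if p_max == p_min:               # all devices look equally reliable
        return [(lo + hi) / 2] * len(p_correct)
    scale = (hi - lo) / (p_max - p_min)
    return [lo + (p - p_min) * scale for p in p_correct]

def weighted_update(theta, grad, w, alpha=0.1):
    """One master-node step: theta <- theta - w * alpha * grad."""
    return [t - w * alpha * g for t, g in zip(theta, grad)]

weights = normalize_weights([0.62, 0.80, 0.71])   # made-up P_Correct values
theta = [0.0, 1.0]
theta = weighted_update(theta, grad=[1.0, -2.0], w=weights[1])
print(weights, theta)
```

A gradient from the most reliable device (weight near 1.5) moves the parameters three times as far as the same gradient from the least reliable one (weight 0.5).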

In summary, the weighting mechanism provides a simple yet effective way to integrate multiple heterogeneous quantum processors, dynamically adjusting their roles during training to improve the performance of variational quantum algorithms.

Performance Evaluation

Quantum Volume: Quantum volume is a metric that measures the capabilities and error rates of a quantum computer. In 2019, IBM's researchers modified the quantum volume definition to be an exponential of the circuit size, stating that it corresponds to the complexity of simulating the circuit on a classical computer: $$ \log_2 V_Q=\arg \max_{n<N}\{\min[n,d(n)]\} $$

image

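The maximum width of the model circuit (a random square circuit), which equals the number of QV layers. One QV layer consists of a random permutation layer followed by a layer of pair-wise random $SU(4)$ two-qubit gates.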
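A minimal sketch of the two ingredients above: evaluating the quantum volume formula from a table of achievable depths, and drawing the qubit pairing for one QV layer. The sketch reads $\log_2 V_Q$ as the maximised value of $\min[n, d(n)]$, which is an assumption about how to interpret the formula; the function names and the depth values are hypothetical.

```python
import random

def log2_quantum_volume(achievable_depth):
    """achievable_depth: dict {width n: achievable depth d(n)}."""
    return max(min(n, d) for n, d in achievable_depth.items())

def qv_layer_pairs(n_qubits, rng):
    """Random permutation of the qubits, then pair up neighbours.

    Each returned pair would receive one random SU(4) two-qubit gate.
    With an odd number of qubits, one qubit is left idle in the layer.
    """
    perm = list(range(n_qubits))
    rng.shuffle(perm)
    return [(perm[i], perm[i + 1]) for i in range(0, n_qubits - 1, 2)]

depths = {2: 9, 3: 6, 4: 4, 5: 2}   # hypothetical d(n) values
log2_vq = log2_quantum_volume(depths)
print(log2_vq, 2 ** log2_vq)  # 4 16

pairs = qv_layer_pairs(5, random.Random(0))
print(pairs)
```

The square shape of the model circuit shows up in the `min(n, d)` term: a wide but shallow device and a deep but narrow one are both limited by the smaller of the two numbers.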

CLOPS: the number of QV layers that can be executed per second.

Quantum Arch

HetArch

Microarchitectures for Heterogeneous Superconducting Quantum Computers, MICRO-2023

Key Idea: heterogeneity at the purely quantum level. In contrast to homogeneous “sea-of-qubit” architectures, heterogeneous QC architectures differentiate between the various functions that a device may be used for.

Standard Cells

This paper proposes quantum standard cells as the basic modules for quantum computer hardware architecture design.

  1. Register cell: Contains a storage device (e.g. multimode resonator) and a compute device (e.g. transmon) for high capacity storage and input/output management.

  2. Parity Check Cell: Contains two compute devices for reading in data, performing single-qubit and two-qubit gates, and measuring one qubit. Used for parity checks.

  3. Sequential Operations Cell: Contains two Register cells and one compute device with measurement. Optimized for performing many sequential two-qubit gates and parity checks.

  4. Universal Stabilizer Cell: Contains three Register cells and a central compute device with measurement. Used for performing stabilizer code check operations.

These standard cells are designed based on the physical characteristics of quantum devices and optimized for executing basic quantum operations like storage, gating, measurement, etc. These cells can be combined to construct larger modules for implementing subroutines required by quantum algorithms.
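The cell hierarchy described above can be written down as a small data model, which makes the device counts of the composite cells explicit. The class and function names are hypothetical, and the model tracks only storage and compute device counts as stated in the list; it is not taken from the paper's tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str
    storage: int   # storage devices, e.g. multimode resonators
    compute: int   # compute devices, e.g. transmons

REGISTER = Cell("Register", storage=1, compute=1)
PARITY_CHECK = Cell("ParityCheck", storage=0, compute=2)

def compose(name, n_registers, extra_compute):
    """Build a larger cell from Register cells plus extra compute devices."""
    return Cell(
        name,
        storage=n_registers * REGISTER.storage,
        compute=n_registers * REGISTER.compute + extra_compute,
    )

SEQ_OPS = compose("SequentialOperations", n_registers=2, extra_compute=1)
USC = compose("UniversalStabilizer", n_registers=3, extra_compute=1)
print(SEQ_OPS, USC)
```

Treating the Register cell as the reusable unit mirrors how the larger cells are described: each is a fixed number of Register cells around one shared compute device with measurement.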

TODO

  • How is the design space reduced?
  • Are these standard cells general enough to be extended to other applications?

Readout

Security

QC-TEE

image

Key Idea

The attacker cannot access the refrigerator!

Ideas

| No. | Description |
| --- | --- |
| 1 | Chiplet Integration (Can we integrate a QPU in a chiplet?) |