Austin Liu austin362667

@austin362667
austin362667 / LLM_notes.md
Last active February 29, 2024 06:39
A little primer for Ariel

A discussion with a friend about LLM evaluation and some of its benchmark datasets.

1. Evaluating LLMs is hard. So what is model evaluation?

  • Language tasks usually have no absolute right or wrong answer, so we design datasets to probe an LLM's abilities; these benchmark datasets can only serve as a proxy for a model's true capability. Each dataset specializes in its own domain, and the variety is enormous: logic, sentiment, translation, code, math problem solving, commonsense reasoning, and many more.

  • Take the BBH (BIG-Bench Hard) benchmark as an example:

    • Text input: False or not ( True ) and False is
    • Expected model output: False
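Scoring on a benchmark like this usually reduces to exact-match accuracy over (prompt, gold answer) pairs. A minimal sketch, where `ask_model` is a hypothetical stand-in for a real LLM call (here it just evaluates the boolean expression so the sketch runs on its own):

```python
def ask_model(prompt: str) -> str:
    # Toy "model": strip the trailing " is" and evaluate the boolean expression.
    # A real harness would send the prompt to an LLM instead.
    expr = prompt.removesuffix(" is")
    return str(eval(expr))

def exact_match_accuracy(dataset):
    # Fraction of examples where the model's answer exactly matches the gold label
    correct = sum(ask_model(q).strip() == gold for q, gold in dataset)
    return correct / len(dataset)

bbh_sample = [("False or not ( True ) and False is", "False")]
print(exact_match_accuracy(bbh_sample))  # 1.0
```

This is only the exact-match flavor of scoring; many benchmarks instead use multiple-choice log-likelihood or model-graded judging.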

```python
import numpy as np
# Assumed setup: GPT-2 via Hugging Face transformers (the original gist's
# definitions of `enc` and `model` are not shown in this preview).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

enc = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def sim(ta, tb):
    # Similarity of two texts via summed GPT-2 token (wte) + position (wpe) embeddings
    a_t = enc.encode(ta)
    b_t = enc.encode(tb)
    a_te = model.transformer.wte.weight[a_t].detach().numpy()
    b_te = model.transformer.wte.weight[b_t].detach().numpy()
    a_pe = model.transformer.wpe.weight[:len(a_t)].detach().numpy()
    b_pe = model.transformer.wpe.weight[:len(b_t)].detach().numpy()
    x = np.add.reduce(a_te + a_pe)  # sum token vectors into one vector per text
    y = np.add.reduce(b_te + b_pe)
    # The preview cuts off here; cosine similarity is an assumed, natural completion
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```
```yaml
openapi: 3.0.0
info:
  title: WolframAlpha API
  version: 1.0.0
servers:
  - url: https://api.wolframalpha.com/v1
paths:
  /result:
    get:
      summary: Returns a simple answer from WolframAlpha
```
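A quick sketch of how a client would hit the `/result` endpoint described above. `YOUR_APPID` is a placeholder, and the `i`/`appid` query parameters follow WolframAlpha's Short Answers API convention; this just builds the request URL rather than performing the network call:

```python
from urllib.parse import urlencode

BASE = "https://api.wolframalpha.com/v1/result"

def result_url(query: str, appid: str = "YOUR_APPID") -> str:
    # Percent-encode the query and app id into the GET request URL
    return f"{BASE}?{urlencode({'appid': appid, 'i': query})}"

print(result_url("2+2"))
# https://api.wolframalpha.com/v1/result?appid=YOUR_APPID&i=2%2B2
```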
```cpp
// Copyright 2023 Optiver Asia Pacific Pty. Ltd.
//
// This file is part of Ready Trader Go.
//
// Ready Trader Go is free software: you can redistribute it and/or
// modify it under the terms of the GNU Affero General Public License
// as published by the Free Software Foundation, either version 3 of
// the License, or (at your option) any later version.
//
// Ready Trader Go is distributed in the hope that it will be useful,
```
@austin362667
austin362667 / READ_ONCE-and-WRITE_ONCEㄎ.md
Created January 15, 2023 17:37
Why kernel code should use READ_ONCE and WRITE_ONCE for shared memory accesses
@austin362667
austin362667 / parachain_fee_structure.md
Created December 19, 2022 10:20
Polkadot Parachain custom fee structure

Fees are handled by the transaction-payment pallet: the withdraw_fee and correct_and_deposit_fee functions of its CurrencyAdapter withdraw them and hand them over to the OnUnbalanced handler. This handler is an injected trait implementation that the runtime can configure. You can see how this works by analyzing the on_unbalanceds function in the common Polkadot runtime, which looks like this:

```rust
fn on_unbalanceds<B>(mut fees_then_tips: impl Iterator<Item = NegativeImbalance<R>>) {
    if let Some(fees) = fees_then_tips.next() {
        // for fees, 80% to treasury, 20% to author
        let mut split = fees.ration(80, 20);
        if let Some(tips) = fees_then_tips.next() {
            // for tips (if any), 100% to the block author
            tips.merge_into(&mut split.1);
        }
        Treasury::on_unbalanced(split.0);
        Author::on_unbalanced(split.1);
    }
}
```
@austin362667
austin362667 / discussion.md
Last active December 14, 2022 11:21
Random discussion on processor performance (critical path and register file size) and pipelining

Processor Performance

The critical path latencies for the 7 major blocks in a simple processor are given below.

| CPU | IMem  | Add   | Mux   | ALU   | Regs   | DMem  | Control |
|-----|-------|-------|-------|-------|--------|-------|---------|
| a   | 400ps | 100ps | 30ps  | 120ps | 200ps  | 350ps | 100ps   |
| b   | 500ps | 150ps | 100ps | 180ps | 220ps  | 1000ps| 65ps    |

For each part, answer the following questions:

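A sketch of the standard comparison these questions drive at, using the latencies from the table above. Assumption for illustration only: the single-cycle clock period is approximated as the sum of all block latencies (the true critical path depends on the datapath, e.g. the load path), while the pipelined clock period is bounded by the slowest single stage:

```python
blocks = ["IMem", "Add", "Mux", "ALU", "Regs", "DMem", "Control"]
cpu_a = dict(zip(blocks, [400, 100, 30, 120, 200, 350, 100]))   # ps
cpu_b = dict(zip(blocks, [500, 150, 100, 180, 220, 1000, 65]))  # ps

for name, cpu in [("a", cpu_a), ("b", cpu_b)]:
    single_cycle = sum(cpu.values())  # rough upper bound on the cycle time
    pipelined = max(cpu.values())     # slowest stage sets the pipelined clock
    print(name, single_cycle, pipelined)
# a: 1300 vs 400; b: 2215 vs 1000
```

Note how design b's huge Regs latency (1000 ps) dominates the pipelined clock, which is exactly the kind of observation the register-file-size part of the question is after.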
@austin362667
austin362667 / transformer.md
Last active December 13, 2022 10:16
Transformer QA

Large-Scale Pretraining with Transformers

At a higher level: BERT uses the Transformer's encoder, while GPT uses the Transformer's decoder.

  1. In the attention mechanism, what do Q, K, and V mean, and how are they produced?

    Imagine a scene: a white table with a white sheet of paper and a red apple on it.

    Here, the Values are the whole scene, and the Keys are the salient objects your attention is drawn to without trying (e.g., the red apple).
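To make Q/K/V concrete, here is a minimal NumPy sketch of scaled dot-product attention. Q, K, and V are produced by multiplying the same input X by three learned projection matrices (random here, purely for shape):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                  # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # learned projections of X

scores = Q @ K.T / np.sqrt(K.shape[-1])          # how well each query matches each key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
out = weights @ V                                # output = attention-weighted mix of values
print(out.shape)  # (4, 8)
```

In the scene analogy: `scores` measures how strongly each query "looks at" each key (the red apple scores high), and the output blends the Values accordingly.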

@austin362667
austin362667 / NTU_106_HW_Q1.md
Last active December 9, 2022 08:03
ILP Benchmarking + Cache

(screenshot: 2022-12-09 09:14:24)

5-Stage Pipelined MIPS CPU:
  L1: I-Cache: 256 KB, D-Cache: 32 KB
  MEM latency: 200 ps (raised up to 250 ps)
  Base CPI (ideal cache) = 1
  Miss penalty: 200 cycles; I-Cache miss rate: 5%; D-Cache miss rate: 10% (reduced down to 5%)
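With these parameters, the effective CPI follows the usual stall-cycle formula. The problem statement above does not give the fraction of load/store instructions, so the 30% below is purely an illustrative assumption:

```python
def effective_cpi(base_cpi, i_miss, d_miss, miss_penalty, mem_frac):
    # Every instruction fetches from the I-cache; only mem_frac of
    # instructions access the D-cache. Each miss stalls miss_penalty cycles.
    return base_cpi + i_miss * miss_penalty + mem_frac * d_miss * miss_penalty

# base CPI 1, I-miss 5%, D-miss 10%, 200-cycle penalty, assumed 30% mem instructions
print(effective_cpi(1, 0.05, 0.10, 200, 0.30))  # 17.0
```

That is, cache misses inflate the ideal CPI of 1 to 17 here, which is why the question then explores reducing the D-cache miss rate to 5%.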