jskim0406 / env.md
Last active December 12, 2024 05:30
[klue-sts-final2]
import os
from google.colab import drive
drive.mount('/drive')
os.chdir("/drive/MyDrive/KSOE/AI_test/2024/study/")
print(os.getcwd())

requirements.txt

@jskim0406
jskim0406 / eda.py
Last active December 12, 2024 04:03
[klue-sts-final1]
import random
from konlpy.tag import Mecab
from typing import List, Set
class EDA:
"""
A class that augments text data using EDA (Easy Data Augmentation) techniques.
This version excludes the synonym-replacement feature.
Args:
@jskim0406
jskim0406 / eda.py
Last active December 12, 2024 01:12
[module-KLUE-STS]
import random
from konlpy.tag import Mecab
from nltk.corpus import wordnet
from sentence_transformers import SentenceTransformer, util
from typing import List, Set
class EDA:
"""
A class that augments text data using EDA (Easy Data Augmentation) techniques.
@jskim0406
jskim0406 / data_utils.py
Created December 11, 2024 07:27
[refactor-custom-KLUE-STS]
import json
import random
import re
import torch
from torch.utils.data import Dataset
from konlpy.tag import Mecab
from sentence_transformers import SentenceTransformer, util
from string import punctuation
from soynlp.normalizer import repeat_normalize
from pykospacing import Spacing
@jskim0406
jskim0406 / sample_custom.md
Last active December 11, 2024 06:08
[custom-KLUE-STS]

KLUE STS Tutorial Supplement: Defining a Loss Function and Implementing an STS Model with a Pretrained Model

This tutorial integrates the following two major improvements into the existing KLUE STS tutorial.

  1. Defining a loss function for the STS task: define and implement, directly in PyTorch, a loss function suited to the STS task.
  2. Implementing an STS model with a pretrained model: build the STS model on a pretrained model provided by the Hugging Face Transformers library, using low-level PyTorch code for greater flexibility and control.

1. Defining the Loss Function

The STS task is a regression problem: predict the semantic similarity between two sentences. The model must therefore be trained to minimize the difference between the predicted similarity and the actual similarity. For this we will use the Mean Squared Error (MSE) loss function.
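
As a rough illustration of the loss described above, the sketch below computes MSE by hand in plain Python so the arithmetic is visible. The function name `mse_loss` and the sample scores are made up for illustration and are not taken from the original gist; in PyTorch, `torch.nn.MSELoss()` with its default mean reduction computes the same quantity on tensors.

```python
from typing import Sequence


def mse_loss(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of the squared differences between predicted and actual scores."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one score")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical similarity scores for two sentence pairs.
predicted_scores = [1.0, 2.0]
actual_scores = [1.0, 4.0]

# Squared errors are 0.0 and 4.0, so the mean is 2.0.
loss = mse_loss(predicted_scores, actual_scores)
print(loss)
```

Because the errors are squared, a single badly mispredicted pair dominates the loss, which is worth keeping in mind if the labels are noisy.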

@jskim0406
jskim0406 / sample.md
Last active December 11, 2024 05:50
[KLUE-STS]

Tutorial: Training a Korean Semantic Textual Similarity (STS) Model with the KLUE STS Dataset and DeBERTa (Improved Edition)

Overview

This tutorial walks step by step through the whole process: loading the KLUE STS dataset provided by Hugging Face datasets, applying improved data preprocessing, augmenting the data with EDA (Easy Data Augmentation), tokenizing with KoNLPy's Mecab, and training and evaluating Microsoft's DeBERTa model. In the preprocessing stage, URLs, email addresses, phone numbers, and numbers are replaced with special tokens, HTML tags and emoji are removed, and repeated characters are normalized. Py-Hanspell and PyKoSpacing are also used to correct spelling and spacing, which improves data quality. In the augmentation stage, Sentence-BERT measures the semantic similarity between each generated sentence and its original, so that only sentences above a set similarity level are used for augmentation. GPU or CPU use can be selected as an option, which keeps the experiment environment efficient.
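
The preprocessing steps listed in the overview could be sketched roughly as follows, using only the standard `re` module. The token strings (`[URL]`, `[EMAIL]`, `[PHONE]`, `[NUM]`), the function name `preprocess`, and every regular expression here are assumptions for illustration; the original tutorial's actual patterns are not shown in this preview, and the emoji range in particular is only a coarse approximation.

```python
import re

# Hypothetical placeholder tokens (the tutorial's real token names are not shown).
URL_TOKEN, EMAIL_TOKEN, PHONE_TOKEN, NUM_TOKEN = "[URL]", "[EMAIL]", "[PHONE]", "[NUM]"

HTML_TAG = re.compile(r"<[^>]+>")
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # coarse emoji range
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\d{2,3}-\d{3,4}-\d{4}")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
REPEAT = re.compile(r"(.)\1{2,}")  # the same character three or more times in a row
SPACES = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Strip markup and emoji, then replace volatile spans with special tokens."""
    text = HTML_TAG.sub(" ", text)
    text = EMOJI.sub("", text)
    text = URL.sub(URL_TOKEN, text)
    text = EMAIL.sub(EMAIL_TOKEN, text)
    text = PHONE.sub(PHONE_TOKEN, text)    # before NUMBER, so digits stay together
    text = NUMBER.sub(NUM_TOKEN, text)
    text = REPEAT.sub(r"\1\1", text)       # cap repeated characters at two
    return SPACES.sub(" ", text).strip()


sample = (
    "<b>Call</b> 010-1234-5678 or mail help@example.com, "
    "see https://example.com/faq soooo goood 3 times \U0001F3AF"
)
cleaned = preprocess(sample)
print(cleaned)
```

The order of substitutions matters in this sketch: phone numbers are replaced before bare numbers, otherwise each digit group would become its own `[NUM]` token.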

1. Environment Setup

1.1. Installing the Required Libraries

Q

I currently have a Docker image named "product_specialist_241118" on my local machine. What are the CLI commands to upload this Docker image to Google Cloud Artifact Registry (PATH="asia-northeast3-docker.pkg.dev/pjt-dev-hcecbt-chat/ar-dev-hcecbt-chat-specialist")?

Here are the CLI commands for uploading a Docker image to Google Cloud Artifact Registry, in order.

# 1. Configure Google Cloud authentication (only needs to be run once, the first time)
gcloud auth configure-docker asia-northeast3-docker.pkg.dev
@jskim0406
jskim0406 / re.md
Last active May 5, 2023 05:02
[re package]
  1. Meta characters: characters used for a purpose other than their literal meaning
    • The metacharacters below play a special role when used in a regular expression
.
^
$
*
+
?
{ }
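
A small sketch of how the metacharacters listed so far behave in Python's `re` module. The sample string and variable names are invented for illustration and are not part of the original notes.

```python
import re

text = "cat cot coat ct caat"

# .  matches any single character except a newline
dot_matches = re.findall(r"c.t", text)        # ['cat', 'cot']

# *  zero or more of the preceding item
star_matches = re.findall(r"ca*t", text)      # ['cat', 'ct', 'caat']

# +  one or more of the preceding item
plus_matches = re.findall(r"ca+t", text)      # ['cat', 'caat']

# ?  zero or one of the preceding item
optional_matches = re.findall(r"ca?t", text)  # ['cat', 'ct']

# { }  an exact repetition count
count_matches = re.findall(r"ca{2}t", text)   # ['caat']

# ^ and $  anchor the match to the start and end of the string
start_matches = re.findall(r"^cat", text)     # ['cat']
end_matches = re.findall(r"caat$", text)      # ['caat']

for name, found in [
    (".", dot_matches), ("*", star_matches), ("+", plus_matches),
    ("?", optional_matches), ("{ }", count_matches),
    ("^", start_matches), ("$", end_matches),
]:
    print(name, found)
```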
@jskim0406
jskim0406 / shortcut.md
Last active April 25, 2023 00:23
[tmux] short-cut
# Create a new tmux session and switch to it
tmux new-session -s <session-name>
  • Press ctrl+b, release, then type % (shift+5) to split the terminal screen horizontally (panes side by side)
  • Press ctrl+b, release, then type " (shift+') to split the terminal screen vertically (panes stacked top and bottom)
  • To move the cursor between split panes, press ctrl+b, release, then press the arrow key for the direction you want to move
# List the currently running sessions
@jskim0406
jskim0406 / alphayut.md
Created March 15, 2023 10:50
[Algorithms - simulation]
[Problem-solving pattern]
* Break the solution to the problem into small steps.

1. Identify and list "what tasks does this problem require?"
2.