Taein KIM sappho192

👨‍🎓
Graduate student
sappho192 / infer.py
Created February 26, 2024 08:48
DaramGPT inference example
from transformers import AutoTokenizer, AutoModelForCausalLM, GPTJConfig, GPTJForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("d:/MODEL/DaramGPT")
model = GPTJForCausalLM.from_pretrained("d:/MODEL/DaramGPT")
tokens = tokenizer("서울의 날씨는 ")
print(tokenizer.decode(model.generate(**{
"input_ids": torch.tensor([tokens["input_ids"]]),
sappho192 / NOTES.md
Created January 8, 2024 22:16
Notes on training with Marian MT & Tatoeba dataset

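These notes record what I ran into while building a Marian-based Japanese → Korean translation model on Korean-language Windows.

Tatoeba dataset

The dataset I used had various invisible Unicode characters mixed into the text, which broke data preprocessing and training.
So I cleaned up the unnecessary whitespace and the problematic special characters roughly like this:
`tp.source = line.rstrip().replace("\u200B", "").replace("\u2028","").replace("\u2029","")`
In particular, the dataset published by HelsinkiNLP/tatoeba seems to consist of text files edited on Windows, so its line endings differ from those on Linux; run dos2unix once to normalize them.
Otherwise, merging it with other datasets prepared on Linux can cause problems, so take care.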
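
As a rough illustration of the two cleanup steps described above, here is a minimal Python sketch. The function names `clean_line` and `normalize_newlines` are hypothetical and not from the original notes; the second function only approximates the CRLF-to-LF conversion that dos2unix performs, and is not a replacement for running the tool on the files.

```python
# Invisible characters named in the notes: zero-width space (U+200B),
# line separator (U+2028), paragraph separator (U+2029).
# Mapping a code point to None makes str.translate delete it.
INVISIBLE_CHARS = dict.fromkeys(map(ord, "\u200b\u2028\u2029"))


def clean_line(line: str) -> str:
    """Strip trailing whitespace, then delete the invisible characters."""
    return line.rstrip().translate(INVISIBLE_CHARS)


def normalize_newlines(text: str) -> str:
    """Convert Windows CRLF line endings to Unix LF."""
    return text.replace("\r\n", "\n")


if __name__ == "__main__":
    raw = "こんにちは\u200b世界\u2028\r\n"
    print(repr(clean_line(raw)))
    print(repr(normalize_newlines("a\r\nb\n")))
```

Using a single `translate` call instead of chained `replace` calls keeps the list of unwanted characters in one place, which makes it easier to extend if other invisible characters turn up.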

Compilation

sappho192 / infer_onnx.py
Last active January 7, 2024 01:48
Transformers EncoderDecoder language model on Optimum OnnxRuntime
# pip install transformers optimum onnx onnxruntime fugashi unidic-lite
from transformers import BertJapaneseTokenizer,PreTrainedTokenizerFast
from optimum.onnxruntime import ORTModelForSeq2SeqLM
encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "skt/kogpt2-base-v2"
# using local tokenizer
# encoder_model_name = "./src_tokenizer"
# decoder_model_name = "./trg_tokenizer"
sappho192 / Thrower.cs
Created May 8, 2023 05:36
Thrower pattern
// from https://forum.dotnetdev.kr/t/c-10-null-check-7/7069/2
public static class Thrower
{
public static Exception ThrowIfFailedValidation<T>(T target, Func<T, bool> validation)
{
if (validation(target) is false)
{
throw new ValidationFailedException();
}
sappho192 / RunMission.java
Last active May 18, 2023 01:42
MAVSDK-Java example (Upload & Run mission)
/* I modified the official RunMission.java example from MAVSDK-Java (v1.3.1) slightly,
* because the original only uploads and downloads a mission.
* The equivalent C++ SDK example also executes the mission, so I applied similar logic to this Java example.
*/
package io.mavsdk.example;
import io.mavsdk.System;
import io.mavsdk.mission.Mission;
import io.mavsdk.telemetry.Telemetry;
sappho192 / MainWindow.xaml
Created May 20, 2020 09:54
[WPF] WPF Webcam & Video player with OpenCVSharp4
<Window x:Class="WebcamCaptureApp.MainWindow"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:local="clr-namespace:WebcamCaptureApp"
mc:Ignorable="d"
Title="MainWindow" Height="900" Width="1600">
<Grid>
<StackPanel Orientation="Vertical">
sappho192 / Program.cs
Created May 20, 2020 07:35
[C#] Webpage crawler with HtmlAgilityPack
using HtmlAgilityPack;
using System;
using System.Net;
namespace WebCrawlerApp
{
class Program
{
static void Main(string[] args)
{
// Extracts Auto Translate (stock phrase) entries from FFXIV chat data
public static List<byte[]> ExtractAutoTranslate(this byte[] rawMessage)
{
List<byte[]> result = new List<byte[]>();
/*
* \u0002 \u002E \u0004 \u0002 \u00F0 \u00CF \u0003
* \u0002 \u002E \u0003 \u0002 \u00CA \u0003
* \u0002 \u002E \u0005 \u0004 \u00F2 \u0001 \u0095 \u0003
*/
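
The three sample sequences in the comment above suggest a simple framing, which the Python sketch below scans for. The layout is an assumption inferred only from those samples, not from the author's implementation or any official specification: a 0x02 start byte, a 0x2E type byte, a length byte, then that many bytes ending in 0x03. The function name `extract_auto_translate` is hypothetical.

```python
START, KIND, END = 0x02, 0x2E, 0x03  # assumed markers, inferred from the samples


def extract_auto_translate(raw: bytes) -> list[bytes]:
    """Return every byte run that matches the assumed auto-translate framing."""
    found = []
    i = 0
    while i + 2 < len(raw):
        if raw[i] == START and raw[i + 1] == KIND:
            length = raw[i + 2]
            end = i + 3 + length  # index one past the last framed byte
            # Accept the run only if it fits in the buffer and ends with END.
            if length > 0 and end <= len(raw) and raw[end - 1] == END:
                found.append(raw[i:end])
                i = end
                continue
        i += 1
    return found


if __name__ == "__main__":
    first = bytes([0x02, 0x2E, 0x04, 0x02, 0xF0, 0xCF, 0x03])
    second = bytes([0x02, 0x2E, 0x03, 0x02, 0xCA, 0x03])
    message = b"Hi " + first + b"!" + second
    for chunk in extract_auto_translate(message):
        print(chunk.hex(" "))
```

Checking the terminator byte before accepting a run guards against a stray 0x02 0x2E pair inside ordinary text being mistaken for the start of a sequence.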
sappho192 / ffxiv.auto.translate.en.cs
Created August 26, 2019 19:35 — forked from 3735943886/ffxiv.auto.translate.en.cs
Ffxiv auto-translation dictionary
/* Version 4.06a */
using System.Collections.Generic;
namespace Ffxiv
{
static public partial class AutoTranslate
{
static public IReadOnlyDictionary<ulong, string> EnDict = new Dictionary<ulong, string>()
{
/* 【Languages】 */
sappho192 / ArrayStack.h
Created April 7, 2017 15:16
[C++] ArrayStack written in template
#pragma once
#ifndef ARRAYSTACK_H
#define ARRAYSTACK_H
#include "StackEmptyException.h"
template <typename Object>
class ArrayStack
{
public: